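Integrating distributed semantic information is one of the six major challenges facing the Semantic Web, and ontology mapping is the key to semantic integration. Based on Bayesian decision theory, this paper proposes a minimum-risk ontology mapping model, RiMOM (Risk Minimization based Ontology Mapping). RiMOM casts mapping discovery as a risk minimization problem and provides a multi-strategy ontology mapping method. The method not only performs well on 1:1 mappings but also supports n:1 mappings. Experiments on several public data sets show that RiMOM achieves higher precision and recall than comparable methods.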
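The risk-minimization view can be illustrated with a minimal sketch of a Bayes decision rule. This is not RiMOM's actual implementation: the function name `min_risk_mapping`, the candidate labels, the posterior values, and both loss functions are hypothetical and chosen only for illustration. For one source entity, each candidate target (or `NULL` for "no mapping") is treated as an action, and the action with the lowest expected loss under the posterior is selected.

```python
def min_risk_mapping(posterior, loss):
    """Pick the action with the lowest expected loss (Bayes risk).

    posterior: dict mapping each candidate target to the (assumed) probability
               that it is the correct mapping for the source entity.
    loss:      function (chosen, truth) -> cost of choosing `chosen`
               when `truth` is actually correct.
    """
    risks = {
        action: sum(p * loss(action, truth) for truth, p in posterior.items())
        for action in posterior
    }
    best = min(risks, key=risks.get)
    return best, risks


# Hypothetical posterior over targets for one source concept.
posterior = {"Person": 0.6, "Agent": 0.3, "NULL": 0.1}


def zero_one(chosen, truth):
    # Every wrong decision costs the same.
    return 0.0 if chosen == truth else 1.0


def cautious(chosen, truth):
    # Assumed asymmetric costs: asserting a wrong mapping (5.0) is worse
    # than missing a mapping by answering NULL (1.0).
    if chosen == truth:
        return 0.0
    return 1.0 if chosen == "NULL" else 5.0


best_01, risks_01 = min_risk_mapping(posterior, zero_one)
best_c, risks_c = min_risk_mapping(posterior, cautious)
print(best_01, best_c)
```

Under zero-one loss the rule reduces to picking the most probable candidate. With the asymmetric loss, the same posterior leads to abstaining, which shows how the loss function, not only the similarity estimate, shapes the decision.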
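The media produce a large volume of news reports every day, so an automated analysis method is needed to present news to users in a more clearly organized form. Most existing work partitions news into flat topics. A topic, however, is not simply a collection of news stories but consists of a series of interrelated events. Because the events within a topic are often very similar to one another, event detection within a topic has poor precision. To overcome these problems, this paper proposes an event detection and relationship discovery method based on event term committees: the core terms of each event are mined first and then used for event detection and relationship discovery. Experimental results on two data sets from the Linguistic Data Consortium (LDC) show that the proposed method significantly improves on existing methods.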
Keyword extraction is an important research topic in information retrieval. This paper gives a specification of keywords in Chinese news documents, based on an analysis of the linguistic characteristics of such documents, and then proposes a new multi-strategy keyword extraction method based on tf/idf. The approach selects uni-, bi- and tri-grams as candidate keywords and defines features according to their morphological characteristics and context information. The paper also proposes several strategies to repair incomplete words produced by word segmentation and to find potential unknown keywords in news documents. Experimental results show that the proposed method significantly outperforms the baseline method. The method was also applied to retrospective event detection, where experimental results show that it significantly improves both accuracy and efficiency.
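The candidate-selection step can be sketched as plain tf-idf scoring over uni-, bi- and tri-gram candidates. This is a simplified illustration only: it omits the paper's morphological and context features and its word-repair strategies, it assumes documents are already segmented into tokens, and the names `ngrams` and `tfidf_keywords` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=3):
    """Yield every contiguous n-gram of length 1..max_n as a string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            # Joined with a space for readability; for Chinese tokens one
            # would likely join with an empty string instead.
            yield " ".join(tokens[i:i + n])


def tfidf_keywords(docs, doc_index, top_k=5):
    """Rank the n-gram candidates of one document by tf * idf."""
    counts = [Counter(ngrams(doc)) for doc in docs]
    # Document frequency: in how many documents each candidate occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    tf = counts[doc_index]
    total = sum(tf.values())
    scores = {
        gram: (cnt / total) * math.log(len(docs) / df[gram])
        for gram, cnt in tf.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy, pre-segmented documents (illustrative data, not from the paper).
docs = [
    ["stock", "market", "falls"],
    ["stock", "market", "rises"],
    ["team", "wins", "final"],
]
for gram, score in tfidf_keywords(docs, 0, top_k=3):
    print(gram, round(score, 3))
```

Candidates that also appear in the second document ("stock", "market", "stock market") receive a lower idf and rank below the candidates unique to the first document.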