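Dong Wei (董微), Yang Daiqing (杨代庆). Academic Resource Integration Oriented Truth Discovery Algorithm [J]. 情报工程 (Technology Intelligence Engineering), 2017, 3(1): 066-071 |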
Academic Resource Integration Oriented Truth Discovery Algorithm (面向学术资源集成的真值发现算法) |
|
DOI:10.3772/j.issn.2095-915X.2017.01.007 |
Keywords: resource construction, metadata integration, conflicting data, truth discovery |
Funding: NSTL special fund project: Open Academic Resources Construction (2016XM16) |
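Author | Affiliation |
Dong Wei | Institute of Scientific and Technical Information of China |
Yang Daiqing | Institute of Scientific and Technical Information of China |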
|
Abstract: |
When building a multi-channel metadata resource construction system, large numbers of metadata conflicts often arise: the same attribute of the same object is described in several different ways, which makes the metadata difficult to organize and expose. The guiding principle in this paper is fidelity to the original document. The value found in the original document is preferred as the single true value, so the data conflict problem is treated as a single-truth conflict problem. Because every data provider has to process the data itself, the data sources are treated as mutually independent. On this basis, the paper proposes a truth discovery algorithm for academic resource integration. The algorithm is based on the Bayesian method and takes correlated attributes into account. Experiments show that the algorithm maintains accuracy while greatly reducing manual working time. |
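The abstract does not give the algorithm's formulas, so the following is only a minimal sketch of a generic single-truth Bayesian vote under the two assumptions the abstract states (exactly one true value per attribute, mutually independent sources). It is not the authors' implementation and it omits their handling of correlated attributes. The function names, the per-source `accuracy` values, the `n_false` parameter (the assumed number of equally likely wrong values) and the sample data are all hypothetical. For simplicity, the posterior is normalized only over the values that were actually claimed.

```python
import math
from collections import defaultdict


def truth_posteriors(claims, accuracy, n_false=10):
    """Posterior probability of each claimed value being the single true value.

    claims:   dict mapping source name -> value that source reports
    accuracy: dict mapping source name -> assumed probability (strictly
              between 0 and 1) that the source reports the true value
    n_false:  assumed number of distinct, equally likely wrong values
    """
    # With independent sources, each source that supports a value adds
    # ln(n_false * a / (1 - a)) to that value's log-score.
    scores = defaultdict(float)
    for source, value in claims.items():
        a = accuracy[source]
        scores[value] += math.log(n_false * a / (1.0 - a))

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    weights = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def discover_truth(claims, accuracy, n_false=10):
    """Return the claimed value with the highest posterior probability."""
    posteriors = truth_posteriors(claims, accuracy, n_false)
    return max(posteriors, key=posteriors.get)


# Hypothetical conflict: three sources disagree on a publication year.
claims = {"publisher": "2017", "aggregator_a": "2016", "aggregator_b": "2016"}
accuracy = {"publisher": 0.95, "aggregator_a": 0.5, "aggregator_b": 0.5}

print(discover_truth(claims, accuracy))  # prints 2017
```

In this made-up example the single high-accuracy source outweighs two agreeing low-accuracy sources, which mirrors the abstract's preference for the original document's value over a simple majority vote.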