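Li Shanqing (李善青). A Data Model of Integration and Representation for Similar Scientific Projects Detection[J]. Technology Intelligence Engineering (情报工程), 2017, 3(5): 53-59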
A Data Model of Integration and Representation for Similar Scientific Projects Detection
DOI: 10.3772/j.issn.2095-915X.2017.05.007
Keywords: data integration, project representation model, similar scientific project detection, Hadoop architecture
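Funding: This work was supported by the National Natural Science Foundation of China project "Research on the Application of Big Data Mining in Scientific and Technological Project Duplicate Detection" (Grant No. 71303223).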
Abstract:
Project proposals are often difficult to obtain during duplicate detection. Information on a project's outputs, which is closely related to its research content, can describe that content indirectly and so compensate for the missing proposal, which makes integrating such outputs a valuable research topic. This paper proposes a data model that integrates the outputs of research projects. The model gathers a project's technical reports, academic papers, and S&T achievement records, and extracts their keywords, titles, and abstracts to represent the project's research content precisely. It also introduces auxiliary information, such as the principal investigator and the undertaking organization, and uses it to reinforce the similarity calculation. The model thereby provides objective data support and lays the foundation for similar project detection.
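The abstract describes the model only at the level of its inputs (reports, papers, achievement records; their keywords, titles, and abstracts) and its use of principal-investigator and organization information. The Python sketch below illustrates one possible shape for such an integration step and a reinforced similarity score. It is not the author's method: the record classes, the field weights (3/2/1), the cosine measure over weighted term counts, and the additive PI/organization bonuses are all hypothetical choices made for illustration, and the Hadoop-based processing named in the keywords is not modeled.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical field weights: keywords count more than titles, titles more than abstracts.
FIELD_WEIGHTS = {"keywords": 3.0, "title": 2.0, "abstract": 1.0}


@dataclass
class Output:
    """One output record (report, paper, or achievement) linked to a project."""
    project_id: str
    kind: str
    title: str
    abstract: str
    keywords: list[str]


@dataclass
class ProjectProfile:
    """Integrated description of one project, built from all of its outputs."""
    project_id: str
    pi: str
    organization: str
    terms: Counter = field(default_factory=Counter)


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer; Chinese text would need a word segmenter instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_profiles(projects: dict[str, tuple[str, str]],
                   outputs: list[Output]) -> dict[str, ProjectProfile]:
    """Merge weighted terms from every output into its project's profile."""
    profiles = {pid: ProjectProfile(pid, pi, org)
                for pid, (pi, org) in projects.items()}
    for out in outputs:
        prof = profiles.get(out.project_id)
        if prof is None:
            continue  # output not linked to a known project
        for kw in out.keywords:  # keywords are kept as whole phrases
            prof.terms[kw.lower()] += FIELD_WEIGHTS["keywords"]
        for tok in tokenize(out.title):
            prof.terms[tok] += FIELD_WEIGHTS["title"]
        for tok in tokenize(out.abstract):
            prof.terms[tok] += FIELD_WEIGHTS["abstract"]
    return profiles


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two weighted term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def project_similarity(p: ProjectProfile, q: ProjectProfile,
                       pi_bonus: float = 0.1, org_bonus: float = 0.05) -> float:
    """Content similarity, reinforced by a shared PI or organization (hypothetical bonuses).

    Bonuses apply only when the contents already overlap, so unrelated projects
    led by the same PI are not flagged; the result is capped at 1.0.
    """
    score = cosine(p.terms, q.terms)
    if score > 0 and p.pi == q.pi:
        score += pi_bonus
    if score > 0 and p.organization == q.organization:
        score += org_bonus
    return min(score, 1.0)


# Usage with made-up projects and outputs.
projects = {
    "P1": ("Zhang San", "Institute A"),
    "P2": ("Zhang San", "Institute A"),
    "P3": ("Li Si", "University B"),
}
outputs = [
    Output("P1", "paper", "Hadoop based data integration",
           "integrating project outputs", ["data integration"]),
    Output("P2", "report", "Data integration on Hadoop",
           "integrating research outputs", ["data integration"]),
    Output("P3", "paper", "Protein folding simulation",
           "molecular dynamics study", ["protein folding"]),
]
profiles = build_profiles(projects, outputs)
s12 = project_similarity(profiles["P1"], profiles["P2"])
s13 = project_similarity(profiles["P1"], profiles["P3"])
print(f"{s12:.3f} {s13:.3f}")  # prints: 0.971 0.000
```

Here P1 and P2 share most weighted terms (cosine 23/28) and the same PI and organization, so their score is pushed close to 1, while P3 has no term overlap and scores 0. If the integration step were distributed, grouping outputs by `project_id` would be the natural key for a MapReduce-style shuffle; this sketch keeps everything in memory for clarity.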