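Article Abstract
Xue Huanhuan, Zhao Ruixue, Kou Yuantao, Xian Guojian. Construction and Implementation of Automatic Identification and Extraction Model for Agricultural Chinese Journals [J]. Technology Intelligence Engineering (情报工程), 2019, 5(6): 046-056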
Construction and Implementation of Automatic Identification and Extraction Model for Agricultural Chinese Journals
  
DOI:10.3772/j.issn.2095-915X.2019.06.004
Keywords: information extraction; conditional random field; GROBID; agricultural journal paper information
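Funding: Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences, "Construction and Application of a Semantic Knowledge Discovery System" (CAAS-ASTIP-2016-AII).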
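Author  Affiliation
Xue Huanhuan  Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Zhao Ruixue  Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Kou Yuantao  Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Xian Guojian  Agricultural Information Institute, Chinese Academy of Agricultural Sciences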
Abstract views: 1995
Full-text downloads: 1475
Abstract:
      Agricultural research has produced a large body of Chinese-language journal papers, and making efficient use of their text requires identifying and extracting the information they contain. After surveying existing methods and tools for identifying and extracting paper information, we chose the conditional random field (CRF) algorithm and the GROBID tool for this task. We construct a cascade model for identifying and extracting information from agricultural Chinese journal papers, and implement and apply it through a pipeline of data acquisition, text preprocessing, feature selection, sequence labeling, feature template design, and model training and evaluation. Experimental results show that the model performs well at extracting paper header information and citation information, but still falls short in identifying section headings and paragraph information.
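      To make the feature selection and sequence labeling steps more concrete, the sketch below shows how a Chinese citation string might be tokenized and turned into per-token features of the kind a CRF toolkit consumes. It is only an illustration under stated assumptions: the tokenization rule, the feature names, and the helpers `tokenize` and `token_features` are hypothetical and are not the feature set used by the authors or by GROBID.

```python
import re
import unicodedata

# Hypothetical tokenizer: each CJK character is its own token, runs of
# ASCII letters or digits are kept together, and every other
# non-whitespace character (punctuation) is a single token.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+|\d+|[^\sA-Za-z\d\u4e00-\u9fff]")


def tokenize(text):
    """Split a citation or header line into coarse tokens."""
    return TOKEN_RE.findall(text)


def token_features(tokens, i):
    """Build an illustrative feature dict for the token at position i.

    The features (script, digit, punctuation, year shape, relative
    position, neighbouring tokens) are assumptions chosen for the
    example, not the published feature template.
    """
    tok = tokens[i]
    return {
        "token": tok,
        "is_cjk": "\u4e00" <= tok[0] <= "\u9fff",
        "is_digit": tok.isdigit(),
        "is_punct": all(unicodedata.category(c).startswith("P") for c in tok),
        "looks_like_year": bool(re.fullmatch(r"(19|20)\d{2}", tok)),
        "rel_pos": round(i / max(len(tokens) - 1, 1), 2),
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


if __name__ == "__main__":
    line = "赵瑞雪.农业期刊[J].情报工程,2019,5(6):046-056"
    toks = tokenize(line)
    for i in range(len(toks)):
        print(token_features(toks, i))
```

      Each feature dict would be paired with a label such as author, title, journal, or year during annotation, and the labeled sequences would then be passed to a CRF trainer. The neighbouring-token features play the role that a feature template plays in template-driven CRF tools.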