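Li Xiaole, Wang Yuzhuo, Zhang Chengzhi. Evaluation of Method Entities for a Special Task [J]. 情报工程 (Technology Intelligence Engineering), 2021, 7(4): 013-026 |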
Evaluation of Method Entities for a Special Task (Chinese title: 针对特定任务的方法实体评估研究) |
|
DOI:10.3772/j.issn.2095-915X.2021.04.002 |
Keywords: named entity recognition; impact of method entity; full-text content analysis |
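Funding: Open Fund of the Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, project "Extraction, Linking, and Evolution Analysis of Fine-Grained Knowledge Entities in Rich-Media Digital Publishing Content" (ZD2020/09-04). |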
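Author | Affiliation |
Li Xiaole | 1. Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094 |
Wang Yuzhuo | 1. Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094 |
Zhang Chengzhi | 1. Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094; 2. Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038 |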
|
Abstract: |
[Objective/Significance] Research methods play an important role in the development of science. Collecting and analyzing the method entities of a specific discipline can help scholars better understand the research methods of that field and find methods suited to their own research. Prior work has addressed method extraction and evaluation, but knowledge entity extraction and evaluation have not yet been carried out for a specific task. [Methods/Process] Taking the named entity recognition (NER) task as an example, this article collects relevant papers from the ACL Anthology website and uses content analysis to annotate the method entities used by the authors. We annotated 904 distinct method entities in 426 academic papers and evaluated their impact along two dimensions: number of uses and years of use. [Results/Conclusions] Conditional random field is the most influential algorithm for the NER task, and neural network algorithms have developed rapidly over the past five years; scholars tend to use algorithms rather than ready-made tools for entity recognition; for data selection, classic datasets remain scholars' first choice; F-measure, precision, and recall are the most influential evaluation metrics. The annotation results can help scholars better understand the task and improve research efficiency, and the entity evaluation results can serve as a reference for beginners choosing specific research methods. |
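The abstract names two dimensions for evaluating method entities, number of uses and years of use, but does not give a scoring formula. The following is a minimal sketch of how those two dimensions might be tallied from annotations. The record layout, the `summarize_usage` helper, and the sample rows are all hypothetical and are not taken from the study's data or method.

```python
from collections import defaultdict

# Hypothetical annotations: (paper_id, publication_year, method_entity).
# Illustrative rows only; they are not taken from the study's data.
annotations = [
    ("P1", 2005, "CRF"),
    ("P2", 2010, "CRF"),
    ("P3", 2018, "CRF"),
    ("P3", 2018, "BiLSTM"),
    ("P4", 2019, "BiLSTM"),
    ("P4", 2019, "BiLSTM"),  # repeated mention within the same paper
]


def summarize_usage(rows):
    """Tally, per entity, the number of distinct papers using it
    and the first and last year in which it was used."""
    papers = defaultdict(set)  # entity -> set of paper ids
    years = defaultdict(set)   # entity -> set of publication years
    for paper_id, year, entity in rows:
        papers[entity].add(paper_id)
        years[entity].add(year)
    return {
        entity: {
            "papers": len(papers[entity]),
            "first_year": min(years[entity]),
            "last_year": max(years[entity]),
        }
        for entity in papers
    }


summary = summarize_usage(annotations)
# List entities from most to least used.
for entity, stats in sorted(summary.items(), key=lambda kv: -kv[1]["papers"]):
    print(entity, stats)
```

Counting distinct papers rather than raw mentions is one assumption among several possible ones; it keeps a single paper that mentions an entity many times from dominating the tally.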