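Gu Wei, Tian Xin. Research on Organic Matter Named Entity Identification Method Based on Conditional Random Field and Text Proofreading[J]. 情报工程 (Technology Intelligence Engineering), 2018, 4(5): 064-072 |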
Research on Organic Matter Named Entity Identification Method Based on Conditional Random Field and Text Proofreading |
|
DOI:10.3772/j.issn.2095-915X.2018.05.006 |
Keywords: organic matter identification; named entity; conditional random field; text proofreading |
Funding: |
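Author | Affiliation |
Gu Wei (谷威) | Patent Office, China National Intellectual Property Administration (国家知识产权局专利局) |
Tian Xin (田欣) | China Patent Information Center (中国专利信息中心) |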
|
Abstract views: 2487 |
Full-text downloads: 1462 |
Abstract: |
Organic matter named entity identification is a key step in patent text mining and machine translation in fields such as biomedicine: patent mining and translation can be completed accurately and effectively only when organic matter named entities are correctly identified. Starting from the structural characteristics of organic matter named entities, this paper studies the workflow, methods, and features of organic matter named entity identification, and combines the CRF algorithm with text proofreading to identify organic matter named entities automatically, achieving relatively high precision and recall. Future work will continue to improve the approach with multi-strategy identification methods such as templates combined with CRF. |