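| Lai Xin, Zhang Hengyan, Tang Kai, Liang Yidan. A Corpus Construction Method for Entity Recognition in the Field of Aeronautical Information [J]. 情报工程, 2025, 11(4): 015-025 |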
| A Corpus Construction Method for Entity Recognition in the Field of Aeronautical Information |
| |
| DOI: |
| Keywords: Corpus Construction; Aeronautical Information; Entity Labeling System; Named Entity Recognition |
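| Funding: Graduate Research and Innovation Fund of Civil Aviation Flight University of China, "Research on Key Technologies for Corpus Construction in the Aeronautical Information Domain" (24CAFUC10190); Central Universities university-level key project, "Research on Feature Extraction and Event Correlation of Aeronautical Information in a Big Data Environment" (ZJ2023-003); Natural Science Foundation of Sichuan Province, "Key Technologies for Four-Dimensional Trajectory Planning Based on Heterogeneous Low-Altitude Grids" (2023NSFSC0903). |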
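| Author | Affiliation |
| Lai Xin | Civil Aviation Flight University of China, Chengdu 641400 |
| Zhang Hengyan | Civil Aviation Flight University of China, Chengdu 641400 |
| Tang Kai | Civil Aviation Flight University of China, Chengdu 641400 |
| Liang Yidan | Civil Aviation Flight University of China, Chengdu 641400 |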
|
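| Abstract (translated from the Chinese): |
| [Purpose/Significance] Aeronautical information services are an important source of aviation operational information and cover a large body of domain knowledge and specialized vocabulary. No public corpus currently exists for the aeronautical information domain, so research on corpus construction techniques for entity recognition supports the digital transformation of aeronautical information. [Methods/Process] Starting from compiled aeronautical information materials and following the AIXM definitions of aeronautical entities and their attribute relationships, an entity annotation scheme and annotation guidelines for the aeronautical information domain are established on the basis of a domain ontology framework. Batch entity annotation and consistency checks are then carried out, yielding an aeronautical information corpus for entity recognition tasks. For the entity recognition task, a BiLSTM-CRF model incorporating a domain abbreviation dictionary is used to assess the quality of the self-built corpus. [Limitations] The current study mostly uses aerodrome-related aeronautical information data, which to some extent limits further optimization of the entity recognition model and expansion of the corpus. [Results/Conclusions] After the abbreviation dictionary is incorporated, the model's precision, recall, and F-value improve by 4.18%, 2.70%, and 3.42%, respectively. The experimental results show that the method improves named entity recognition performance on the aeronautical information corpus and supports subsequent entity recognition for aeronautical information and large-scale corpus construction. |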
| English abstract: |
| [Objective/Significance] Aeronautical Information Service is an important source of aviation operation information, covering a large amount of knowledge and professional vocabulary in related fields. No public corpus currently exists in the field of aeronautical information, and research on corpus construction technology for entity recognition supports the digital transformation of aeronautical information. [Methods/Processes] Based on aeronautical information data, this paper establishes an entity labeling system and formulates a labeling specification for the field of aeronautical information, referring to the Aeronautical Information Exchange Model (AIXM) definitions and its classification of aviation elements. After text preprocessing, bulk entity annotation and consistency checking were carried out to form an Aeronautical Information Service domain corpus for entity recognition tasks. For the entity recognition task, a BiLSTM-CRF model augmented with a domain abbreviation dictionary was used to test the quality of the self-built corpus. [Limitations] The current study mostly uses aerodrome aeronautical information data, which to some extent limits further improvement of the entity recognition model and expansion of the corpus scale. [Results/Conclusions] After integration of the abbreviation dictionary, the model's precision, recall, and F-value increased by 4.18%, 2.70%, and 3.42%, respectively. The experimental results show that this method can improve the recognition of named entities in the aeronautical information corpus and provide support for subsequent entity recognition in Aeronautical Information Service and for large-scale corpus construction. |
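
The abstract does not say how the abbreviation dictionary is fed into the BiLSTM-CRF model. One common approach, shown here only as a minimal sketch and not as the authors' implementation, is to derive a BIO-style dictionary feature per token by forward maximum matching and pass it to the model alongside the token embeddings. The lexicon entries, the entity type names, and the function `dictionary_features` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Hypothetical mini-lexicon: abbreviation -> entity type (illustrative only,
# not taken from the paper's dictionary or annotation scheme).
ABBREVIATIONS: Dict[str, str] = {
    "RWY": "RUNWAY",
    "TWY": "TAXIWAY",
    "ILS": "NAVAID",
    "VOR/DME": "NAVAID",
}


def dictionary_features(
    tokens: List[str], lexicon: Dict[str, str], max_len: int = 3
) -> List[str]:
    """Tag each token with a BIO-style dictionary feature.

    Uses forward maximum matching: at each position the longest span of up to
    `max_len` tokens whose concatenation is in the lexicon wins.
    """
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = "".join(tokens[i:i + n])
            if candidate in lexicon:
                label = lexicon[candidate]
                feats[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    feats[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats


tokens = ["RWY", "18", "ILS", "VOR", "/", "DME"]
features = dictionary_features(tokens, ABBREVIATIONS)
```

In such a setup, each feature string would typically be mapped to a small embedding and concatenated with the word or character embedding before the BiLSTM layer; whether the paper does this is not stated in the abstract.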