Entity Relation Extraction in the Government Domain Based on a Pre-trained Model

Ge Shiqi (葛世奇); Sun Xin (孙新); Kou Huanjin (寇桓锦); Yuan Yan (袁燕)

Article Abstract

Ge Shiqi, Sun Xin, Kou Huanjin, Yuan Yan. Entity Relation Extraction in the Government Domain Based on a Pre-trained Model [J]. 情报工程, 2022, 8(4): 003-013

Relation Extraction Based on Pre-trained Model for E-government

DOI: 10.3772/j.issn.2095-915X.2022.04.001

Keywords: relation extraction; deep learning; BERT model; classification pooling

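Funding: Open Fund Project of the Intelligence Engineering Laboratory, Institute of Scientific and Technical Information of China, "Research on Entity Relation Extraction for Government Documents".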

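Author	Affiliation
Ge Shiqi (葛世奇)	1. School of Computer Science, Beijing Institute of Technology, Beijing 100081
Sun Xin (孙新)	1. School of Computer Science, Beijing Institute of Technology, Beijing 100081; 2. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing 100081
Kou Huanjin (寇桓锦)	1. School of Computer Science, Beijing Institute of Technology, Beijing 100081
Yuan Yan (袁燕)	1. School of Computer Science, Beijing Institute of Technology, Beijing 100081; 2. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing 100081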

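Chinese abstract (English translation):

[Purpose/Significance] The massive volume of e-government information resources makes information far easier to obtain, but it also makes it harder for people to acquire information and knowledge effectively. Relation extraction is the core task of information extraction, and relation extraction in the government domain has a far-reaching impact on downstream intelligent services for that domain, such as intelligent retrieval, intelligent question answering, and government document generation. However, government-domain text contains a large number of long entities, so traditional relation extraction models perform unsatisfactorily on government-domain datasets. [Method/Process] This paper proposes CPRE-BERT, a Chinese relation extraction model suited to the government domain. First, a pre-trained BERT model serves as the encoder, which to some extent alleviates the problems of polysemy, fuzzy boundaries, and insufficient domain relation data. Second, classification pooling processes short entities and long entities separately, preserving as much entity information as possible. Finally, comparative experiments on a public Chinese person-relation dataset and a government-domain dataset verify the effectiveness of the model. [Results/Conclusions] The experimental results show that, on the government-domain dataset, the proposed method improves relation extraction accuracy and F1 over the baseline model by 2.3% and 2.2%, respectively.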

English abstract:

[Objective/Significance] Massive e-government information resources bring great convenience, but they also make it harder for people to access information and knowledge effectively. Relation extraction is the core task of information extraction and plays an important role in intelligent services such as intelligent retrieval and intelligent question answering. Because e-government text contains a large number of long entities, classical relation extraction models perform unsatisfactorily on government-domain datasets. [Methods/Processes] To address this issue, we propose a novel relation extraction model. First, it employs a pre-trained BERT model as the encoder to mitigate the problems of ambiguity, fuzzy boundaries, and insufficient annotated data. Second, it uses classification pooling to process short entities and long entities separately, which retains more entity information. Finally, experimental results on a public dataset and a government-domain dataset demonstrate the effectiveness of the model. [Results/Conclusions] Compared with the baseline model, the accuracy and F1-score of the proposed model on the government-domain dataset improve by 2.3% and 2.2%, respectively.
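The abstract says that classification pooling handles short and long entities separately, but it does not say which pooling operation applies to which class or where the length boundary lies. The sketch below only illustrates the general idea of length-dependent entity pooling over encoder outputs. The function names, the `long_threshold` value, and the choice of mean pooling for short entities and max pooling for long ones are all assumptions made for illustration, not the configuration of CPRE-BERT.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average each dimension across the token vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Take the maximum of each dimension across the token vectors."""
    return [max(column) for column in zip(*vectors)]


def pool_entity(
    token_vectors: Sequence[Vector],
    start: int,
    end: int,
    long_threshold: int = 4,  # hypothetical boundary between short and long
) -> Vector:
    """Pool the token vectors of the entity span [start, end).

    Assumption for illustration only: spans with at least
    `long_threshold` tokens count as long and are max-pooled;
    shorter spans are mean-pooled.
    """
    span = token_vectors[start:end]
    if not span:
        raise ValueError("entity span is empty")
    if len(span) >= long_threshold:
        return max_pool(span)
    return mean_pool(span)


# Toy 2-dimensional "encoder outputs" for a 5-token sentence.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [7.0, 8.0], [0.0, 9.0]]
short_entity = pool_entity(tokens, 0, 2)  # 2 tokens, mean-pooled
long_entity = pool_entity(tokens, 0, 5)   # 5 tokens, max-pooled
```

In a real pipeline the token vectors would come from the BERT encoder, and the pooled head and tail entity vectors would feed a relation classifier. Both pooling paths return a vector of the same dimension, so the classifier input has a fixed size regardless of entity length.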
