Article Abstract
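Ding Long, Wen Wen, Lin Qiang. Domain Entity Recognition Based on Pre-trained BERT Character Embedding [J]. 情报工程 (Technology Intelligence Engineering), 2019, 5(6): 065-074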
Domain Entity Recognition Based on Pre-trained BERT Character Embedding
  
DOI:10.3772/j.issn.2095-915X.2019.06.006
Keywords: electronic medical records; named entity recognition; EMR-BERT; character embedding; Bi-LSTM; CRF
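Funding: Outstanding Youth Project of the Hunan Provincial Department of Education (18B279); Hunan Provincial Philosophy and Social Science Project (16YBA323); Hunan Provincial "Postgraduate Research Innovation" Project (CX20190737); Postgraduate Teaching Reform Project of the University of South China (2016JG029).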
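Author  Affiliation
Ding Long  School of Computer Science, University of South China
Wen Wen  School of Computer Science, University of South China
Lin Qiang  School of Computer Science, University of South China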
Abstract:
      With the development of medical informatization, more and more medical information is recorded digitally, and this information contains a wealth of medical knowledge. How to extract and use massive amounts of medical text effectively has become a major challenge for medical informatization. To address the shortage of annotated medical text and the fuzzy boundaries of medical entities, this paper proposes a character embedding language representation model pre-trained on a large body of medical literature. The BERT model is pre-trained on a large amount of medical literature to obtain the EMR-BERT model. EMR-BERT then produces character embedding vectors for the training text, the vectors are fed into a Bi-LSTM model, and a CRF model produces the final output. Multiple sets of comparison experiments show that the EMR-BERT+BiLSTM+CRF model outperforms current mainstream models. The model can therefore effectively address the problems of insufficient annotated data and fuzzy entity boundaries in named entity recognition on electronic medical records.
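
The final CRF step of the pipeline described above can be illustrated with a small decoding sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the simplified `O`/`B`/`I` tag set, the hand-set transition penalties (a trained CRF would learn them), and the toy per-character scores that stand in for Bi-LSTM outputs. The point is only to show how sequence-level decoding can repair an entity boundary that per-character tagging gets wrong.

```python
# Illustrative sketch only: toy scores and hand-set transitions, not the paper's model.
from typing import Dict, List

TAGS = ["O", "B", "I"]  # simplified BIO scheme with one entity type (assumption)
NEG = -1e4              # large penalty that effectively forbids a transition

# TRANSITIONS[prev][cur]: an "I" tag may not directly follow an "O" tag.
TRANSITIONS = {
    "O": {"O": 0.0, "B": 0.0, "I": NEG},
    "B": {"O": 0.0, "B": 0.0, "I": 0.0},
    "I": {"O": 0.0, "B": 0.0, "I": 0.0},
}
START = {"O": 0.0, "B": 0.0, "I": NEG}  # a sequence may not start with "I"


def greedy(emissions: List[Dict[str, float]]) -> List[str]:
    """Pick the best tag for each character independently (no CRF)."""
    return [max(TAGS, key=lambda t: e[t]) for e in emissions]


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring tag sequence under emissions plus transitions."""
    if not emissions:
        return []
    # score[t]: best score of any path that ends in tag t at the current position
    score = {t: START[t] + emissions[0][t] for t in TAGS}
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in TAGS:
            best_prev = max(TAGS, key=lambda p: score[p] + TRANSITIONS[p][cur])
            new_score[cur] = score[best_prev] + TRANSITIONS[best_prev][cur] + emit[cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(TAGS, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical per-character scores for a four-character sentence
# whose first two characters form one entity.
EMISSIONS = [
    {"O": 1.0, "B": 0.9, "I": 0.2},
    {"O": 0.1, "B": 0.3, "I": 2.0},
    {"O": 1.5, "B": 0.1, "I": 0.2},
    {"O": 1.5, "B": 0.1, "I": 0.1},
]

if __name__ == "__main__":
    print("greedy :", greedy(EMISSIONS))
    print("viterbi:", viterbi(EMISSIONS))
```

With these toy scores, per-character tagging yields `O I O O`, which has an entity with no beginning, while Viterbi decoding yields `B I O O`, a well-formed two-character entity. In a full system the emission scores would come from the Bi-LSTM over EMR-BERT character embeddings and the transition scores would be learned jointly.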