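Zeng Jiangfeng, Pang Yujing, Gao Pengyu, Feng Changyang. Research on Named Entity Recognition and Application of Traditional Chinese Medicine Ancient Literature Based on Lattice LSTM [J]. Technology Intelligence Engineering, 2023, 9(5): 112-122 |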
Research on Named Entity Recognition and Application of Traditional Chinese Medicine Ancient Literature Based on Lattice LSTM |
|
DOI:10.3772/j.issn.2095-915X.2023.05.009 |
Keywords: Lattice LSTM; ancient Chinese medicine literature; named entity recognition; knowledge graph |
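Funding: Youth Fund for Humanities and Social Sciences Research of the Ministry of Education, "Research on Social Media Disinformation Identification Models and Governance Strategies Driven by Contextual Big Data" (21YJC870002); Wuhan Knowledge Innovation Special Project, Dawn Plan (Shuguang) project, "Research on Social Media Fake News Detection Driven by Multi-Source Knowledge" (2022010801020287); Open Fund of the Key Laboratory of Rich-Media Digital Publishing Content Organization and Knowledge Service, "Research on the Topic Evolution and Development Trend Prediction of Frontier Technologies for Integrated Publishing". |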
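Author | Affiliation |
Zeng Jiangfeng | 1. School of Information Management, Central China Normal University, Wuhan, Hubei 430079 |
Pang Yujing | 2. School of Management and Economics, Beijing Institute of Technology, Beijing 100081 |
Gao Pengyu | 1. School of Information Management, Central China Normal University, Wuhan, Hubei 430079 |
Feng Changyang | 1. School of Information Management, Central China Normal University, Wuhan, Hubei 430079; 3. Key Laboratory of Rich-Media Digital Publishing Content Organization and Knowledge Service, Beijing 100038 |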
|
Abstract: |
[Objective/Significance] This study aims to further improve the accuracy of named entity recognition in ancient traditional Chinese medicine (TCM) literature, so that information technology can assist modern TCM practitioners in medical diagnosis and clinical decision-making and promote the inheritance and innovation of TCM. [Methods/Processes] The paper proposes a Lattice LSTM model that integrates character and lexical information for named entity recognition in ancient TCM literature, and uses it to extract five entity types from the "Treatise on Febrile Diseases" (《伤寒论》): disease, syndrome, prescription, symptom, and medicinal material. Relationships between the extracted entities are then extracted manually, and Neo4j is used to build a TCM knowledge graph. Finally, taking COVID-19 as an example, related retrieval is performed on the graph. [Results/Conclusions] Experimental results show that Lattice LSTM achieves the best performance in recognizing TCM terms, with an F1 score of 95.66%, which is 1.68% higher than that of the mainstream BiLSTM-CRF model, so it can be used for entity recognition in ancient TCM literature. The constructed TCM knowledge graph also verifies the practical value of the main model. |
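To illustrate what "integrating character and lexical information" can mean in a lattice model, the following minimal Python sketch enumerates the lexicon words that match a character sequence and groups them by the character position where they end. In the general Lattice LSTM formulation, word cells are merged into the character cell at that end position. The abstract does not describe the authors' lexicon or implementation, so the function names, the toy lexicon, and the toy sentence here are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def match_lattice_words(
    chars: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for every lexicon word of 2..max_len
    characters found in `chars`; `end` is an inclusive index."""
    matches = []
    for start in range(len(chars)):
        for length in range(2, max_len + 1):
            end = start + length
            if end > len(chars):
                break
            word = chars[start:end]
            if word in lexicon:
                matches.append((start, end - 1, word))
    return matches


def words_ending_at(
    matches: List[Tuple[int, int, str]], n: int
) -> Dict[int, List[Tuple[int, str]]]:
    """Group matched words by end position, one entry per character index."""
    by_end: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(n)}
    for start, end, word in matches:
        by_end[end].append((start, word))
    return by_end


# Hypothetical toy lexicon and sentence, for illustration only.
lexicon = {"太阳", "太阳病", "桂枝", "桂枝汤", "头痛"}
sentence = "太阳病头痛桂枝汤主之"

matches = match_lattice_words(sentence, lexicon)
by_end = words_ending_at(matches, len(sentence))
for start, end, word in matches:
    print(start, end, word)
```

Overlapping candidates such as 桂枝 and 桂枝汤 are both kept. The lattice leaves it to the model to weigh them instead of committing to one segmentation up front.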