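Wei Xiangfeng, Zhang Quan, Yuan Yi. Research on Extracting the Relationship between Technology and Publishing Industry Chain [J]. Technology Intelligence Engineering (情报工程), 2024, 10(6): 014-027 |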
Research on Extracting the Relationship between Technology and Publishing Industry Chain |
|
DOI:10.3772/j.issn.2095-915X.2024.06.002 |
Keywords: Publishing Industry; Industry Chain; Relationship Extraction; Relationship Template; Semi-Supervised Deep Learning |
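Funding: 2023 Open Fund of the Key Laboratory of Rich Media Digital Publishing Content Organization and Knowledge Service, project "Research on Constructing an Industrial Technological Spectrum Based on Pre-trained Models" (ZD2023-11/03). |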
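Author | Affiliation |
Wei Xiangfeng | 1. Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; 2. Key Laboratory of Rich Media Digital Publishing Content Organization and Knowledge Service, Beijing 100038 |
Zhang Quan | Institute of Acoustics, Chinese Academy of Sciences |
Yuan Yi | Institute of Acoustics, Chinese Academy of Sciences |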
|
Abstract:
[Objective/Significance] The relationship between technologies and the segments of the publishing industry chain is important for constructing the technological spectrum of the publishing industry and for monitoring the industry. [Methods/Processes] This article designs the industry chain segments for both traditional and digital publishing, and designs the industrial technological spectrum along dimensions including business segments, industry terms, technical terms, participating entities, and products and services. After the entities of the publishing industry technological spectrum are obtained, relationship templates between entities are acquired with a syntactic dependency parsing tool. A relationship extraction model based on relationship template quality is then implemented using the Mean Teacher deep learning training framework and a BiGRU+Attention neural network encoder. Finally, a semi-supervised deep learning method with partially manually annotated data is used to label the relationship templates by class and to train the relationship classification model. [Limitations] Future work is still needed on improving the accuracy of identifying relationship types in relationship templates and on improving model performance by refining the deep learning framework. [Results/Conclusions] Experiments show that the relationship extraction model achieves 66% accuracy on real corpus text, and ablation experiments show that grading templates by quality improves accuracy by 1%. |
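The abstract names the Mean Teacher training framework without giving implementation details. The following is a minimal sketch of the general Mean Teacher idea only, not the authors' code: a teacher model whose weights are an exponential moving average (EMA) of the student's weights, and a training loss that adds a consistency term between student and teacher predictions to the supervised loss on labeled examples. The function names, the flat list representation of weights, and the values of `alpha` and `weight` are illustrative assumptions.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student weights by EMA.

    `teacher` and `student` are flat lists of floats (a simplifying
    assumption; real models hold tensors per layer).
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between two predicted class distributions."""
    diffs = [(p - q) ** 2 for p, q in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)


def total_loss(supervised_loss, student_probs, teacher_probs, weight=1.0):
    """Supervised loss on labeled data plus the weighted consistency term."""
    return supervised_loss + weight * consistency_loss(student_probs, teacher_probs)


# One illustrative step with made-up numbers.
teacher_weights = ema_update([0.0, 1.0], [1.0, 1.0], alpha=0.9)
loss = total_loss(0.5, [0.7, 0.3], [0.6, 0.4], weight=2.0)
```

In a semi-supervised setting such as the one the abstract describes, the consistency term can be computed on unlabeled examples as well, which is what lets partially annotated data contribute to training.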