Article Abstract
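Xue Zhao, Liu Qianxiang, Wu Changquan, Li Kang, Chen Yonghai. Research on Automatic Indexing Method of Chinese Library Classification of Scientific and Technological Achievements Based on BERT Model [J]. Technology Intelligence Engineering (情报工程), 2025, 11(5): 003-012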
Research on Automatic Indexing Method of Chinese Library Classification of Scientific and Technological Achievements Based on BERT Model
(Original Chinese title: 基于BERT模型的科技成果中图分类自动标引方法研究)
  
DOI:
Keywords: Scientific and Technological Achievements; Automatic Indexing; Deep Learning; BERT Model; Decoding Strategy
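Funding:
Author  Affiliation
Xue Zhao (薛钊)  China National Chemical Information Center Co., Ltd., Beijing 100029
Liu Qianxiang (刘千祥)  China National Chemical Information Center Co., Ltd., Beijing 100029
Wu Changquan (吴昌权)  China National Chemical Information Center Co., Ltd., Beijing 100029
Li Kang (李亢)  China National Chemical Information Center Co., Ltd., Beijing 100029
Chen Yonghai (陈永海)  China National Chemical Information Center Co., Ltd., Beijing 100029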
Abstract:
      [Objective/Significance] Deep learning pre-trained language models (PLMs) far outperform traditional natural language processing techniques in classifying scientific and technological literature. Registration data for scientific and technological achievements, however, differ markedly from such literature: the introduction of an achievement covers its project source, background, applications, awards, and other aspects, which makes it considerably harder for PLMs to predict its Chinese Library Classification (CLC) code. [Methods/Process] Building on a BERT model (RoBERTa), and with an innovation in the model ensemble method, this work constructs an automatic CLC indexing system for scientific and technological achievements. A decoding strategy that applies generally to tree-structured classification systems recasts the classification problem as a decoding problem. This both improves prediction accuracy and removes the restriction of conventional classification models to a single level, enabling dynamic prediction down to the required level. [Limitations] Because the decoding strategy is specific to tree-structured classification systems, the method cannot yet be adapted to classification systems without a tree structure. [Results/Conclusions] By customizing filtering conditions such as the cumulative probability of the prediction chain and the terminal probability, the method balances reliability against classification granularity and can meet different practical business needs.
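The idea of treating hierarchical classification as decoding with probability-based stopping conditions can be illustrated with a minimal sketch. The abstract does not give the algorithm, so everything below is an assumption made for illustration: the greedy top-down walk, the toy tree and its codes, the fixed probabilities, the function names `decode_chain` and `toy_child_probs`, and the reading of "terminal probability" as the probability of the last node on the chain. In a real system the scorer would be a fine-tuned classifier rather than a lookup table.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tree (parent -> children). The codes only imitate the look of
# Chinese Library Classification codes; they are not taken from the paper.
TOY_TREE: Dict[str, List[str]] = {
    "ROOT": ["T", "O"],
    "T": ["TQ", "TP"],
    "TQ": ["TQ0", "TQ1"],
}

# Hypothetical conditional probabilities P(child | parent, text). A real system
# would obtain these from a model such as a fine-tuned BERT/RoBERTa classifier.
TOY_PROBS: Dict[str, Dict[str, float]] = {
    "ROOT": {"T": 0.9, "O": 0.1},
    "T": {"TQ": 0.8, "TP": 0.2},
    "TQ": {"TQ0": 0.55, "TQ1": 0.45},
}

ChildScorer = Callable[[str, str], Dict[str, float]]


def toy_child_probs(text: str, parent: str) -> Dict[str, float]:
    """Stand-in scorer: ignores the text and returns fixed toy probabilities."""
    return TOY_PROBS[parent]


def decode_chain(
    text: str,
    child_probs: ChildScorer,
    tree: Dict[str, List[str]],
    min_chain_prob: float = 0.5,
    min_terminal_prob: float = 0.5,
    root: str = "ROOT",
) -> Tuple[List[str], float]:
    """Greedily walk down the tree, one level per step.

    The chain is extended only while the candidate node's own probability is
    at least min_terminal_prob and the product of probabilities along the
    chain is at least min_chain_prob (both thresholds are assumptions).
    """
    chain: List[str] = []
    chain_prob = 1.0
    node = root
    while node in tree:  # nodes without children are leaves
        probs = child_probs(text, node)
        best, p = max(probs.items(), key=lambda kv: kv[1])
        if p < min_terminal_prob or chain_prob * p < min_chain_prob:
            break  # going deeper would be too unreliable; stop at this level
        chain.append(best)
        chain_prob *= p
        node = best
    return chain, chain_prob


if __name__ == "__main__":
    # Stricter threshold: shallower but more reliable prediction.
    print(decode_chain("demo text", toy_child_probs, TOY_TREE, min_chain_prob=0.5))
    # Looser threshold: deeper, finer-grained prediction.
    print(decode_chain("demo text", toy_child_probs, TOY_TREE, min_chain_prob=0.3))
```

With these toy numbers, the stricter chain threshold stops at the second level, while the looser one continues down to a leaf. This mirrors the trade-off between reliability and classification granularity described in the abstract. A beam search over several candidate chains would be a natural variation on this greedy walk.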