谢庆恒 (Xie Qingheng). Automatic Classification and Indexing of Dissertations Based on Multi-source Information Fusion [J]. 情报工程, 2023, 9(3): 070-080 |
Automatic Classification and Indexing of Dissertations Based on Multi-source Information Fusion |
|
DOI:10.3772/j.issn.2095-915X.2023.03.006 |
Keywords: dissertation; automatic classification; information fusion; BERT |
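Funding: Youth Project of the Library Society of China, "Research on Automatic Classification and Indexing of Dissertations in Smart Libraries" (2022LSCKYXM-ZZ-QN003). |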
|
Abstract: |
[Objective/Significance] Dissertations are a distinctive part of a library's collection, and automating their classification and indexing contributes to the construction of smart libraries. [Methods/Process] BERT is first used to obtain word vector representations of the title and of the abstract separately. The two representations are then combined by a weighted algebraic sum into a fused vector, which is finally fed into a classic Softmax classifier built on the PyTorch framework to carry out automatic classification and indexing of dissertations. [Limitations] The diversity of the data sources and of the subject coverage needs further strengthening. [Results/Conclusions] The model reaches an F1 score of 79.55%, outperforming classification based on the title alone or on the abstract alone, and meets practical application requirements reasonably well. |
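The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: plain Python lists stand in for the BERT vectors and for the PyTorch classifier, and the names `fuse`, `softmax`, `classify`, the weight `alpha`, and all numeric values are hypothetical. The sketch also assumes the two fusion weights sum to 1, which the abstract does not state.

```python
import math

def fuse(title_vec, abstract_vec, alpha=0.5):
    """Weighted algebraic sum: alpha * title + (1 - alpha) * abstract.

    Assumption: the weights sum to 1; the source does not give them.
    """
    if len(title_vec) != len(abstract_vec):
        raise ValueError("vectors must have the same dimension")
    return [alpha * t + (1.0 - alpha) * a
            for t, a in zip(title_vec, abstract_vec)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(fused, weights, biases):
    """Linear layer followed by softmax.

    Returns (index of the most probable class, class probabilities).
    """
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs

# Toy 3-dimensional stand-ins for the BERT title and abstract vectors.
title_vec = [1.0, 0.0, 2.0]
abstract_vec = [0.0, 2.0, 2.0]
fused = fuse(title_vec, abstract_vec, alpha=0.25)

# Hypothetical classifier parameters for two classes.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]
biases = [0.0, 0.0]
label, probs = classify(fused, weights, biases)
```

In a real pipeline the classifier parameters would be learned from labelled dissertations, and `alpha` would be tuned on held-out data to balance the short title against the longer abstract.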