| 王朝阳, 李超, 郭红梅. RRO-ChatData Chinese Journal Layout Analysis Method [J]. 情报工程 (Technology Intelligence Engineering), 2025, 11(4): 115-127 |
| RRO-ChatData Chinese Journal Layout Analysis Method |
| |
| DOI: |
| Keywords: Large Language Model; Restoration of Reading Order; Academic Papers; Key Information Identification; ChatData |
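| Funding: Youth Project of the Innovation Research Fund of the Institute of Scientific and Technical Information of China, "Research on an AI-Driven Integrated Innovation Model for Professional Library Resource Services" (QN2025-10); National Key Research and Development Program of China project, "Key Technologies and Software for In-Depth Content Mining and Intelligent Analysis of Scientific and Technical Literature" (2022YFF0711900). |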
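| Author | Affiliation |
| 王朝阳 | Institute of Scientific and Technical Information of China, Beijing 100038 |
| 李超 | Institute of Scientific and Technical Information of China, Beijing 100038 |
| 郭红梅 | Institute of Scientific and Technical Information of China, Beijing 100038 |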
| Abstract: |
| [Objective/Significance] The innovative application of large language model (LLM) technology in intelligent library services places higher demands on journal corpora, and the technology itself is also capable of taking part in the digital processing of academic papers. Taking Chinese journals as an example, this article explores methods for optimizing the full-text corpus of academic papers and extracting key bibliographic information. [Methods/Process] The article proposes the RRO-ChatData method. A Restoration of Reading Order (RRO) step restores the PDF files of Chinese journal articles with different layout structures to human reading order, and a ChatData feature recognition method, built on LLM technology and tailored to the characteristics of Chinese journal resources, identifies key information. [Limitations] Because funding information is relatively complex and has no fixed form of expression, the method identifies it with relatively low accuracy. [Results/Conclusions] The RRO-ChatData method can accurately restore reading order and extract content from academic papers, effectively improving the quality of Chinese journal full-text corpora and the efficiency of key information extraction during digital processing. |
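
The abstract does not describe how the reading-order restoration step works internally. As a rough illustration of the general idea only (not the authors' algorithm), the sketch below orders text blocks on a page with at most two columns: blocks that span most of the page width act as separators, and the blocks between them are read left column first, then right column, each top to bottom. The names `Block`, `reading_order`, and `span_ratio` are hypothetical, and coordinates are assumed to have their origin at the top-left of the page.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Bounding box of a text block; y grows downward (assumption).
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def reading_order(blocks, page_width, span_ratio=0.6):
    """Order blocks on a page with at most two columns (illustrative heuristic)."""
    mid = page_width / 2
    ordered, band = [], []

    def flush():
        # Emit pending column blocks: left column, then right column,
        # each sorted from top to bottom.
        left = [b for b in band if (b.x0 + b.x1) / 2 < mid]
        right = [b for b in band if (b.x0 + b.x1) / 2 >= mid]
        ordered.extend(sorted(left, key=lambda b: b.y0))
        ordered.extend(sorted(right, key=lambda b: b.y0))
        band.clear()

    for b in sorted(blocks, key=lambda b: b.y0):
        if (b.x1 - b.x0) >= span_ratio * page_width:
            # A full-width block (title, abstract, footer) closes the current band.
            flush()
            ordered.append(b)
        else:
            band.append(b)
    flush()
    return ordered


# Hypothetical page: a full-width title, two columns, and a full-width footer.
page = [
    Block(310, 160, 550, 400, "R2"),
    Block(50, 20, 550, 50, "Title"),
    Block(50, 210, 290, 400, "L2"),
    Block(310, 60, 550, 150, "R1"),
    Block(50, 410, 550, 430, "Footer"),
    Block(50, 60, 290, 200, "L1"),
]
print([b.text for b in reading_order(page, page_width=600)])
```

A naive top-to-bottom sort would interleave the two columns (L1, R1, R2, L2), which is the kind of corpus defect the abstract says reading-order restoration is meant to avoid. Real journal layouts (three columns, floating figures, footnotes) would need a more robust approach than this heuristic.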