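Leveraging Large Language Models for Academic Innovation Evaluation: An Assessment Framework and Efficacy Validation in Engineering and Technology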

Jiang Jianbin (蒋建斌)

Article Abstract

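Jiang Jianbin. Leveraging Large Language Models for Academic Innovation Evaluation: An Assessment Framework and Efficacy Validation in Engineering and Technology[J]. 情报工程 (Technology Intelligence Engineering), 2026, (1): 040-050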

Leveraging Large Language Models for Academic Innovation Evaluation: An Assessment Framework and Efficacy Validation in Engineering and Technology

DOI:

Keywords: Large Language Models; Academic Innovation; Evaluation Test; Engineering and Technology; Human-AI Collaboration

Funding:

Author	Affiliation
Jiang Jianbin	China Cost Engineering Association, Beijing 100037

Abstract:

[Objective/Significance] To address the path dependence and quantitative bias of traditional academic evaluation systems, this study takes the innovation evaluation of academic papers in engineering and technology as its research object and examines the application value of large language models (LLMs) in assessing academic innovation. [Methods/Processes] An AI evaluation index system covering three dimensions (theoretical breakthrough, methodological innovation, and application translation) was constructed, and a comparative experiment spanning five engineering and technology disciplines was designed and carried out, with testing and analysis conducted through review dialogues with DeepSeek, one of the LLM providers. [Limitations] The experimental sample needs to be extended to more disciplines and a larger number of papers so that further research can draw on a broader range of fields and data. [Results/Conclusions] LLMs show significant advantages in evaluating structured indicators such as linguistic logic and formatting standards, but clear limitations on the core innovation dimensions. The study therefore proposes a human-AI collaborative review mechanism: progressive collaboration between machine pre-screening and expert re-review combines the LLM's processing efficiency with expert reviewers' deeper cognitive judgment, balancing efficiency and quality. The tested model can improve the effectiveness of academic evaluation and reduce the misjudgment rate, and it offers a practical reference for optimizing academic innovation assessment systems.
