Knowledge Graph-Based Text Semantic Filtering Model for Intellectual Property Competitive Intelligence

Sun Lijuan (孙利娟); Han Yishu (韩宜书); Wu Xu (吴旭)

Article Abstract

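Sun Lijuan, Han Yishu, Wu Xu. Knowledge Graph-Based Text Semantic Filtering Model for Intellectual Property Competitive Intelligence [J]. 情报工程, 2026, (2): 003-014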

Knowledge Graph-Based Text Semantic Filtering Model for Intellectual Property Competitive Intelligence

DOI：

Keywords: Intellectual Property Competitive Intelligence; Knowledge Graph; Semantic Filtering; Entity Linking; BERT

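Funding: Standards Research Project of the Ministry of Information Industry, "Test Methods for Mobile Application Interfaces of the Network Electronic Identity (eID)" (2014-1020T-YD).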

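Author	Affiliation
Sun Lijuan	Information Technology Department, National Library of China, Beijing 100081
Han Yishu	Key Laboratory of Trustworthy Distributed Computing and Service (Ministry of Education), Beijing University of Posts and Telecommunications, Beijing 100876
Wu Xu	School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876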

Abstract:

[Objective/Significance] Practical intellectual property competitive intelligence work requires filtering all texts that are semantically related to a given topic. This paper addresses that need and the problem of imprecise semantic filtering between long and short texts. [Methods/Processes] This paper constructs a knowledge graph-based semantic filtering model for intellectual property competitive intelligence texts. The model performs entity linking on the query topic and on each candidate document against a domain knowledge graph for intellectual property competitive intelligence, and uses a knowledge graph embedding method to represent each of them as a set of entity vectors. A late-interaction neural network then computes the similarity between the two sets, and candidate texts are ranked by similarity score to complete the filtering task. [Results/Conclusions] Systematic experiments were conducted on a self-constructed intellectual property competitive intelligence knowledge graph and a semantic filtering task dataset. The results show that the model's Mean Reciprocal Rank (MRR) reaches 9.41 and that the average recall over the top ten ranked documents reaches 84.25%; both metrics exceed those of the baseline algorithms. The model overcomes the length difference between long and short texts encountered in practical tasks and makes full use of the semantic information in the knowledge graph. It is feasible and accurate for semantic filtering of intellectual property competitive intelligence texts and can meet the needs of real-world work.
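
The scoring and ranking step described in the abstract can be illustrated with a minimal sketch. It assumes entity linking and knowledge graph embedding have already produced one set of entity vectors per text, and it replaces the paper's learned late-interaction network with a fixed MaxSim-style rule (each query entity is matched to its most similar document entity, and the matches are averaged). The function names, the cosine measure, and the toy 2-dimensional vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def late_interaction_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Average, over query entities, of the best cosine match among document entities.

    This fixed rule is an assumed stand-in for the learned late-interaction
    network; the two entity sets may have different sizes.
    """
    if not query_vecs or not doc_vecs:
        return 0.0
    best_matches = [max(cosine(q, d) for d in doc_vecs) for q in query_vecs]
    return sum(best_matches) / len(query_vecs)


def rank_documents(
    query_vecs: List[Vector], docs: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Score every candidate document and sort by descending similarity."""
    scored = [
        (doc_id, late_interaction_score(query_vecs, vecs))
        for doc_id, vecs in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-made entity embeddings (hypothetical values, not from the paper).
query_entities = [[1.0, 0.0], [0.0, 1.0]]
candidate_docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # covers both query entities
    "doc_b": [[1.0, 0.0]],                          # covers only one query entity
    "doc_c": [[-1.0, 0.0]],                         # points away from the query
}
ranking = rank_documents(query_entities, candidate_docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Because each query entity only looks for its single best match, a long document with many extra entities is not penalized relative to a short query, which is one plausible way to read the abstract's point about long and short texts.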
