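Article Abstract
Ding Lianhong, Sun Bin, Zhang Hongwei. Short Text Classification Based on Knowledge Graph Extension [J]. 情报工程 (Technology Intelligence Engineering), 2018, 4(5): 038-046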
Short Text Classification Based on Knowledge Graph Extension
  
DOI:10.3772/j.issn.2095-915X.2018.05.004
Keywords: short text classification; semantic extension; knowledge graph; knowledge inference
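Funding: Beijing Social Science Fund Youth Project "Research on the Evolution Mechanism and Guidance Measures of Consumer Behavior in Social E-commerce" (17GLC066); Beijing Wuzi University High-Level Cultivation Project (GJB20162002).
Author affiliations
Ding Lianhong, School of Information, Beijing Wuzi University
Sun Bin, School of Information, Beijing Wuzi University
Zhang Hongwei, School of Information, Beijing Wuzi University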
Abstract:
      The Concept Graph is a large-scale knowledge graph that Microsoft built from a statistical analysis of user search logs. To address data sparsity, sensitivity to noise, and unclear topics in short text classification, this paper proposes a semantic extension representation for short texts based on the Concept Graph. First, the relevance between each feature word of a text and the concepts in the Concept Graph is computed, and the top k concepts with the highest relevance are selected as the concept dictionary of the current text. Then, the concept dictionary is added to the set of feature words to obtain the semantically extended representation of the short text. Classification experiments were run on short texts from Twitter before and after extension, covering 5 classification algorithms and 6 relevance calculation methods. The results show that the conceptualized semantic extension improves short text classification, and that the more extensible feature words a text contains, the larger the improvement.
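The two steps described in the abstract (selecting the top k concepts, then appending them to the feature words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `TOY_RELEVANCE` table, its scores, and the function names are hypothetical, and summing per-word scores is only one possible way to aggregate relevance. The abstract does not say which six relevance measures the paper uses.

```python
from collections import defaultdict

# Hypothetical word -> {concept: relevance} table standing in for scores
# looked up in the Concept Graph. All values are made up for illustration.
TOY_RELEVANCE = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "iphone": {"smartphone": 0.7, "company": 0.5},
    "banana": {"fruit": 0.9},
}


def build_concept_dictionary(features, relevance, k=2):
    """Return the k concepts with the highest summed relevance to the features.

    Summation is an assumed aggregation rule, not taken from the paper.
    """
    totals = defaultdict(float)
    for word in features:
        for concept, score in relevance.get(word, {}).items():
            totals[concept] += score
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:k]]


def extend_features(features, relevance, k=2):
    """Append the concept dictionary to the original feature words."""
    concepts = build_concept_dictionary(features, relevance, k)
    return list(features) + [c for c in concepts if c not in features]


tweet = ["apple", "iphone", "launch"]
extended = extend_features(tweet, TOY_RELEVANCE, k=2)
print(extended)
```

In this toy run, "launch" has no entry in the table and so contributes no concepts, which mirrors the abstract's observation that texts with more extensible feature words benefit more from the extension.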