Article Abstract
XIE Peng. Recognition of Scholar Interest Tag for Academic Literatures [J]. Technology Intelligence Engineering (情报工程), 2019, 5(3): 065-073.
Recognition of Scholar Interest Tag for Academic Literatures
  
DOI:10.3772/j.issn.2095-915X.2019.03.006
Keywords: user profile; interest label recognition; LSI
Author: XIE Peng (谢鹏)
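Abstract (translated from Chinese):
Academic literature is the carrier of scientific progress and development. Its metadata, including authors, papers, and journals as well as the relationships among these entities, is of great value, and building accurate profiles of scholars is a challenging problem. Early user profiles were relatively simple, with low discriminability and limited usability. Using the real data of Task 3 of the "2017 Open Academic Data Challenge" (2017 开放学术精准画像大赛), this paper extracts the relationships between scholars and journals and between scholars and papers, designs a relationship model, applies LSI dimensionality reduction and text similarity computation to recognize and evaluate scholars' interest tags, and analyzes the results through data visualization. Experimental results show that the proposed method recognizes scholars' interest tags accurately and effectively, with P@1 = 92%, P@2 = 94%, and P@3 = 98%.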
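The abstract names LSI dimensionality reduction and text similarity as the core of tag recognition but gives no further detail. The Python sketch below shows one common way to combine the two, under assumptions that are ours rather than the paper's: each scholar is a bag-of-words "profile" text (for instance, words from paper titles and journal names), candidate tags are short phrases, the latent space comes from a rank-k truncated SVD of the raw term-document count matrix, and tags are ranked by cosine similarity to the scholar in that space. All function names and the toy data are hypothetical, and the SVD is computed with simple power iteration so that the block needs only the standard library.

```python
import math
import random
from collections import Counter


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_term_doc(docs):
    """Return (vocab, A) with A[t][j] = count of term t in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w, c in Counter(d.split()).items():
            A[index[w]][j] = float(c)
    return vocab, A


def top_k_eigvecs(M, k, iters=500, seed=0):
    """Top-k eigenpairs of a small symmetric PSD matrix via power iteration with deflation."""
    n = len(M)
    rng = random.Random(seed)
    vals, vecs = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [dot(row, v) for row in M]
            for u in vecs:  # deflation: strip directions already found
                d = dot(w, u)
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no spectrum left (matrix rank < k)
                lam = 0.0
                break
            v = [a / norm for a in w]
            lam = norm  # ||M v|| converges to the eigenvalue
        if lam == 0.0:
            break
        vals.append(lam)
        vecs.append(v)
    return vals, vecs


def lsi_basis(A, k):
    """Left singular vectors U_k of A, derived from eigenvectors of the Gram matrix A^T A."""
    m, n = len(A), len(A[0])
    cols = [[A[t][j] for t in range(m)] for j in range(n)]
    G = [[dot(cols[i], cols[j]) for j in range(n)] for i in range(n)]
    vals, V = top_k_eigvecs(G, k)
    U = []
    for lam, v in zip(vals, V):
        sigma = math.sqrt(lam)
        # u = A v / sigma is the unit left singular vector paired with v
        U.append([dot(A[t], v) / sigma for t in range(m)])
    return U


def project(text, vocab, U):
    """Fold a bag-of-words text into the latent space, z = U_k^T x (unknown words ignored)."""
    index = {w: i for i, w in enumerate(vocab)}
    x = [0.0] * len(vocab)
    for w in text.split():
        if w in index:
            x[index[w]] += 1.0
    return [dot(u, x) for u in U]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0 or nb == 0 else dot(a, b) / (na * nb)


def rank_tags(profile, tags, vocab, U):
    """Rank candidate tags by latent-space cosine similarity to a scholar profile."""
    z = project(profile, vocab, U)
    scored = [(cosine(z, project(t, vocab, U)), t) for t in tags]
    return sorted(scored, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    # Toy, hypothetical profiles: two machine-learning scholars, two biology scholars.
    profiles = [
        "neural network deep learning image",
        "deep learning neural network speech",
        "protein gene sequence biology",
        "gene protein expression biology",
    ]
    vocab, A = build_term_doc(profiles)
    U = lsi_basis(A, k=2)
    for score, tag in rank_tags("image", ["gene biology", "deep learning"], vocab, U):
        print(f"{tag}: {score:.3f}")
```

In this toy corpus, "image" and "deep learning" share no words, so their raw term-vector cosine is 0. Both, however, load on the same latent dimension, so LSI ranks "deep learning" above "gene biology". A real system would more likely weight terms (for example with TF-IDF) and call a library routine such as `numpy.linalg.svd` instead of power iteration.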
Abstract (English):
Academic literature is recognized as the carrier of scientific progress and development. Its metadata, including authors, papers, and journals, as well as the relationships among these entities, is of great value. Constructing accurate profiles of academic users is a challenging issue. Early user profiles were relatively simple, with little discriminability and usability. Based on the real dataset of Task 3 in the "2017 Open Academic Data Challenge", we extract the relationships between scholars and journals and between scholars and papers to design a relationship model. We then use LSI dimensionality reduction and text similarity computation to recognize scholars' interests, evaluate the recognized interests, and analyze them through data visualization. Experimental results show that the proposed method, based on journal and paper information, recognizes scholars' interest labels effectively and accurately, with P@1 = 92%, P@2 = 94%, and P@3 = 98%.
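The abstract reports P@1, P@2, and P@3 without defining them. Two definitions are common for ranked tag lists: precision@k, the share of the top k tags that are correct, and a hit rate, which scores 1 when any correct tag appears in the top k. A hit rate can never decrease as k grows, which is consistent with the reported sequence 92%, 94%, 98%, whereas mean precision@k carries no such guarantee. Which definition the paper actually uses is an assumption here, so the Python sketch below computes both; the function names and toy data are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k predicted tags that appear in the ground-truth set."""
    return sum(1 for tag in ranked[:k] if tag in relevant) / k


def hit_at_k(ranked, relevant, k):
    """1.0 if at least one ground-truth tag appears among the top-k predictions, else 0.0."""
    return 1.0 if any(tag in relevant for tag in ranked[:k]) else 0.0


def mean_at_k(metric, predictions, truth, k):
    """Average a per-scholar metric over scholars that have both predictions and ground truth."""
    scholars = [s for s in predictions if s in truth]
    if not scholars:
        return 0.0
    return sum(metric(predictions[s], truth[s], k) for s in scholars) / len(scholars)


if __name__ == "__main__":
    # Hypothetical ranked predictions and ground-truth tags for two scholars.
    predictions = {
        "s1": ["deep learning", "image", "biology"],
        "s2": ["gene", "protein", "speech"],
    }
    truth = {"s1": {"image"}, "s2": {"gene", "protein"}}
    for k in (1, 2, 3):
        print(k, mean_at_k(precision_at_k, predictions, truth, k),
              mean_at_k(hit_at_k, predictions, truth, k))
```

On this toy data the hit rate rises from 0.5 at k = 1 to 1.0 at k = 2 and k = 3, while mean precision@k goes 0.5, 0.75, 0.5, which illustrates why the choice of definition matters when reading the reported figures.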