Article Abstract
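Zhai Wenjie, Yan Yan, Zhang Bowen, Yin Xucheng. A Model for Text Representation and Classification Based on Hybrid Deep Belief Networks [J]. 情报工程 (Technology Intelligence Engineering), 2016, 2(5): 030-040.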
A Model for Text Representation and Classification Based on Hybrid Deep Belief Networks
  
Keywords: Text classification, text representation, deep learning, deep belief networks
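Funding: This work was supported by the National Natural Science Foundation of China project "Natural Scene Text Recognition Combining Feedforward and Feedback Mechanisms" (61473036).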
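Author affiliations
Zhai Wenjie: Department of Computer Science and Technology, University of Science and Technology Beijing
Yan Yan: Department of Computer Science and Technology, China University of Mining and Technology
Zhang Bowen: Department of Computer Science and Technology, University of Science and Technology Beijing
Yin Xucheng: Department of Computer Science and Technology, University of Science and Technology Beijing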
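Abstract (translated from the Chinese):
      This paper studies a multi-class text representation and classification method based on hybrid deep belief networks. It addresses two problems of the traditional Bag-of-Words (BOW) representation: it ignores the semantic information in text, and the features it extracts are high-dimensional and highly sparse. Working from text keywords and targeting multi-class classification tasks (such as news texts and biomedical texts), the method uses the word-vector representations of the keywords as the text input and combines a Deep Belief Network (DBN) with a Deep Boltzmann Machine (DBM) to design a Hybrid Deep Belief Network (HDBN) model. Text classification and text retrieval experiments show that deep learning models based on word-vector embeddings outperform traditional methods. In addition, two-dimensional visualization experiments show that the high-level text representations extracted by the HDBN model exhibit high cohesion and low coupling.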
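The abstract's motivation, that BOW vectors are high-dimensional and sparse while keyword word vectors are compact and dense, can be illustrated with a minimal Python sketch. The vocabulary, the toy 3-dimensional embeddings and the choice to concatenate the keyword vectors are all hypothetical; the abstract does not say how the keyword vectors are composed into the network input.

```python
from collections import Counter

# Hypothetical 10-word vocabulary; real vocabularies are far larger,
# so real BOW vectors are far longer and sparser than this one.
VOCAB = ["bank", "cell", "court", "drug", "gene",
         "goal", "market", "match", "protein", "vote"]

# Hypothetical 3-dimensional word vectors for two keywords
# (real word vectors typically have tens to hundreds of dimensions).
EMBEDDINGS = {
    "gene": [0.8, -0.1, 0.3],
    "protein": [0.7, 0.1, 0.4],
}


def bow_vector(tokens, vocab):
    """Bag-of-Words: one count per vocabulary entry, so length == len(vocab)."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def keyword_input(keywords, embeddings):
    """Concatenate the word vectors of the selected keywords.
    Assumption: concatenation is only one possible way to compose them."""
    vec = []
    for kw in keywords:
        vec.extend(embeddings[kw])
    return vec


def sparsity(vec):
    """Fraction of entries that are exactly zero."""
    return sum(1 for x in vec if x == 0) / len(vec)


doc = ["gene", "protein", "gene", "cell"]
bow = bow_vector(doc, VOCAB)
dense = keyword_input(["gene", "protein"], EMBEDDINGS)
print(len(bow), sparsity(bow))      # 10 0.7
print(len(dense), sparsity(dense))  # 6 0.0
```

The BOW length grows with the vocabulary and most entries are zero, whereas the keyword input's length depends only on the number of keywords and the embedding size, and it carries the semantic similarity encoded in the word vectors.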
English abstract:
      This paper develops a model for text representation and classification based on hybrid deep belief networks, in order to address the weaknesses of the traditional Bag-of-Words text representation method, which ignores semantic relations and whose extracted features are high-dimensional and highly sparse. Based on text keywords, we use the word vectors of the keywords as the input for multi-class classification tasks, such as news and biomedical texts, and propose a new model, HDBN (Hybrid Deep Belief Network), which integrates a DBN (Deep Belief Network) and a DBM (Deep Boltzmann Machine). Text categorization and text retrieval experiments show that the HDBN model performs better than traditional methods. Moreover, two-dimensional visualization results indicate that the high-level text representations learned by the HDBN model exhibit high cohesion and low coupling.
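Both DBNs and DBMs are built from restricted Boltzmann machines (RBMs). The plain-Python sketch below shows a Bernoulli RBM trained with one-step contrastive divergence (CD-1) and the greedy layer-wise stacking used to pretrain a DBN. It illustrates only this standard building block: it does not implement DBM joint training or the way HDBN combines the two networks, which the abstract does not describe, and the layer sizes, learning rate, epoch count and toy data are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights W[i][j] between visible unit i and hidden unit j.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)                # positive phase
        v1 = self.visible_probs(self.sample(h0))  # reconstruction probabilities
        h1 = self.hidden_probs(v1)                # negative phase
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])
        # Squared reconstruction error, useful for monitoring training.
        return sum((x - y) ** 2 for x, y in zip(v0, v1))


def pretrain_stack(data, layer_sizes, epochs=50, lr=0.1):
    """Greedy layer-wise (DBN-style) pretraining: each RBM is trained on the
    hidden probabilities produced by the RBM below it."""
    layers, inputs, n_in = [], data, len(data[0])
    for k, n_hidden in enumerate(layer_sizes):
        rbm = RBM(n_in, n_hidden, seed=k)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        layers.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
        n_in = n_hidden
    return layers, inputs


# Hypothetical toy "documents" as binary 6-dimensional input vectors.
docs = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
layers, codes = pretrain_stack(docs, [4, 2])
print(codes)  # one 2-dimensional code per document
```

Here each RBM sees only the previous layer's hidden probabilities, which is the DBN pretraining scheme; a DBM instead treats all layers as one undirected model, so each hidden layer receives both bottom-up and top-down input during training.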