Article Abstract
汪凯, 梁宇腾, 张玉洁, 徐金安, 陈钰枫. A Joint Model for Graph-based Chinese Character-Level Dependency Parsing [J]. 情报工程, 2022, 8(3): 068-080
基于图的汉语字级别依存分析联合模型
A Joint Model for Graph-based Chinese Character-Level Dependency Parsing
  
DOI:10.3772/j.issn.2095-915X.2022.03.005
Keywords: dependency parsing; joint model; POS tagging; Chinese word segmentation
Funding: National Natural Science Foundation of China (61876198, 61976016).
Author Affiliations
汪凯  School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
梁宇腾  School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
张玉洁  School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
徐金安  School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
陈钰枫  School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044
Chinese Abstract:
      [Objective/Significance] Chinese word segmentation, POS tagging, and dependency parsing, as three fundamental tasks of Chinese natural language processing, play a crucial role. A transition-based joint model of the three tasks once achieved the best accuracy, but with the development of neural networks and computing power, graph-based models, which can model global information, have surpassed transition-based models on single tasks and on two-task combinations. How to join the three tasks within a graph-based framework and further improve accuracy has become a new challenge. [Methods/Process] This paper proposes a graph-based joint model for Chinese word segmentation, POS tagging, and dependency parsing. The three tasks are joined by designing unified character-level labels, and the model adopts character representations that fuse contextual information from a pre-trained language model together with a scoring function based on the biaffine attention mechanism. A decoding algorithm for the joint model is also designed to decode the three tasks. [Results/Conclusions] Experimental results show that the way the POS tagging task is introduced can model the relations between POS and word segmentation and between POS and dependency parsing, thereby improving the accuracy on the other two tasks. Compared with the previously best-performing work of Yan et al. [1], the proposed model achieves the best accuracy on all three tasks.
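The abstract does not spell out how the unified character-level labels encode all three tasks. The sketch below shows one minimal, hypothetical encoding for illustration only: characters inside a word point to the next character with a special intra-word relation (here called "app"), the word-final character carries the inter-word syntactic relation, and the word's POS tag is appended to every character's label after "#". The names CharArc and recover_words and the label format are assumptions, not details from the paper.

```python
# Hypothetical unified character-level labels (NOT the paper's actual scheme):
#   label = "<relation>#<POS>", intra-word relation = "app",
#   intra-word characters point to the next character,
#   the word-final character points to the head word's final character.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class CharArc:
    char: str   # the character itself
    head: int   # head character index (1-based; 0 = ROOT)
    label: str  # "<relation>#<POS>", e.g. "app#VV" or "nsubj#PN"

def recover_words(arcs: List[CharArc]) -> List[Tuple[str, str]]:
    """Recover (word, POS) pairs: consecutive characters chained by the
    intra-word relation "app" belong to the same word."""
    words, buf = [], ""
    for i, a in enumerate(arcs, start=1):
        rel, pos = a.label.split("#")
        buf += a.char
        # the word ends when this character is not "app"-linked to the next one
        if not (rel == "app" and a.head == i + 1):
            words.append((buf, pos))
            buf = ""
    return words

# Toy sentence "他喜欢跑步" -> 他/PN  喜欢/VV  跑步/NN (1-based character indices)
sentence = [
    CharArc("他", 3, "nsubj#PN"),  # single-character word, subject of 喜欢
    CharArc("喜", 3, "app#VV"),    # intra-word arc inside 喜欢
    CharArc("欢", 0, "root#VV"),   # word-final character of the root word
    CharArc("跑", 5, "app#NN"),    # intra-word arc inside 跑步
    CharArc("步", 3, "dobj#NN"),   # word-final character, object of 喜欢
]
print(recover_words(sentence))
# [('他', 'PN'), ('喜欢', 'VV'), ('跑步', 'NN')]
```

Under an encoding of this kind, the word segmentation, the POS tags, and the word-level dependency tree can all be read off a single character-level analysis, which is what allows one graph-based parser to handle the three tasks jointly.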
English Abstract:
      [Objective/Significance] Chinese word segmentation, POS tagging, and dependency parsing play a vital role in Chinese natural language processing. The transition-based joint model of the three tasks once achieved the best accuracy, but with the development of neural networks and computing capability, graph-based models with global information modeling capability have surpassed transition-based models on single tasks and on two-task combinations. How to combine the three tasks in a graph-based framework to further improve accuracy has become a new challenge. [Methods/Process] This paper proposes a graph-based joint model of the three tasks. The combination is realized by designing unified character-level tags, together with a character context representation based on a pre-trained language model (e.g., BERT) and a scoring function implemented with the biaffine attention mechanism. This paper also designs a decoding algorithm of the joint model for the three tasks. [Results/Conclusions] The experimental results show that the way POS tagging is introduced can better model the relationship between part-of-speech and word segmentation, as well as between part-of-speech and dependency parsing, so as to improve the accuracy of the other two tasks. Compared with the work of Yan et al. [1], the best performance is achieved on all three tasks.
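As a concrete illustration of the scoring function mentioned above, below is a minimal PyTorch sketch of biaffine arc scoring over character encodings from a pre-trained language model, in the spirit of Dozat and Manning's biaffine parser. The class name, the dimensions (768 encoder units, 500 arc units), and the head-side bias term are assumptions for the sketch, not details taken from the paper.

```python
# Minimal sketch of biaffine arc scoring over character representations.
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    def __init__(self, enc_dim: int = 768, arc_dim: int = 500):
        super().__init__()
        # separate MLPs project each character into "head" and "dependent" spaces
        self.mlp_head = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.mlp_dep = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        # biaffine weight W and a bias vector for the head side
        self.W = nn.Parameter(torch.zeros(arc_dim, arc_dim))
        self.b = nn.Parameter(torch.zeros(arc_dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        """enc: [batch, seq_len, enc_dim] character encodings from a pre-trained
        LM (e.g., BERT). Returns arc scores s[b, i, j] = score of character j
        being the head of character i."""
        dep = self.mlp_dep(enc)    # [B, L, arc_dim]
        head = self.mlp_head(enc)  # [B, L, arc_dim]
        # bilinear term dep_i^T W head_j ...
        scores = torch.einsum("bia,ac,bjc->bij", dep, self.W, head)
        # ... plus a linear term on the head representation
        scores = scores + torch.einsum("bjc,c->bj", head, self.b).unsqueeze(1)
        return scores

# usage: arc scores for a batch of 2 sentences with 5 characters each
scorer = BiaffineArcScorer()
enc = torch.randn(2, 5, 768)
print(scorer(enc).shape)  # torch.Size([2, 5, 5])
```

The resulting score matrix over characters is what the joint decoding algorithm would operate on to produce a single tree covering segmentation, POS, and dependency decisions.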