Approach of Constructing Parallel Corpus Based on Pivot Language

单华; 张玉洁; 周雯; 徐金安; 陈钰枫

Article abstract

单华, 张玉洁, 周雯, 徐金安, 陈钰枫. Approach of Constructing Parallel Corpus Based on Pivot Language [J]. 情报工程, 2017, 3(3): 029-039

Approach of Constructing Parallel Corpus Based on Pivot Language

DOI: 10.3772/j.issn.2095-915X.2017.03.005

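Chinese keywords: 枢轴语言 (pivot language), 机器翻译 (machine translation), 平行语料 (parallel corpus), 主动学习 (active learning)

English keywords: pivot language, machine translation, parallel corpus, active learning

Funding: This work was supported by the National Natural Science Foundation of China (61370130, 61473294).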

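Author	Affiliation
单华	School of Computer and Information Technology, Beijing Jiaotong University
张玉洁	School of Computer and Information Technology, Beijing Jiaotong University
周雯	School of Computer and Information Technology, Beijing Jiaotong University
徐金安	School of Computer and Information Technology, Beijing Jiaotong University
陈钰枫	School of Computer and Information Technology, Beijing Jiaotong University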

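Abstract (Chinese version):

The size of a parallel corpus plays an important role in improving the performance of statistical machine translation, but building parallel corpora manually is very costly. To address this problem, this paper proposes a low-cost, high-efficiency method for constructing parallel corpora: using a pivot language as a bridge, it draws on existing machine translation technology combined with active learning to build a large-scale, high-quality parallel corpus for the target language pair. Through a case study that builds a Japanese-Chinese parallel corpus with English as the pivot language, and using mature phrase-based statistical machine translation, the paper describes a method for selecting good translations based on automatic translation evaluation, a corpus selection method based on active learning, and the iterative updating of the translation system together with evaluation experiments. The experimental results show that the proposed method can build a Japanese-Chinese parallel corpus quickly and effectively improves the performance of the Japanese-Chinese translation system.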

English abstract:

A large-scale parallel corpus plays an important role in improving the performance of machine translation, but constructing one manually is costly. This paper proposes a pivot-based approach for constructing a high-quality parallel corpus at low cost, combining existing machine translation technology with active learning. It describes a domain adaptation method based on active learning, a method for selecting good translations based on automatic translation evaluation, and the iterative retraining of the translation system. We applied the approach to the construction of a Japanese-Chinese parallel corpus, taking English as the pivot, and conducted evaluation experiments. The experimental results show that the proposed approach effectively obtains a high-quality Japanese-Chinese parallel corpus, and that the constructed corpus improves the performance of a Japanese-Chinese machine translation system.
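The abstract does not specify how good translations are selected or how sentences are chosen for active learning, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each Japanese sentence already has several Chinese candidate translations obtained through the English pivot, scores each candidate by its character n-gram overlap with the other candidates (a simplified stand-in for an automatic metric such as BLEU), keeps the best candidate when its score reaches a threshold, and queues the remaining sentences, lowest score first, as candidates for manual translation. All names (`overlap_score`, `select_translation`, `build_corpus`) and the threshold value are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count the character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def overlap_score(candidate, reference, max_n=2):
    """Average clipped n-gram precision of candidate against reference.

    A BLEU-like score without brevity penalty (an assumption made for
    illustration; the paper's actual evaluation metric is not given here).
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = char_ngrams(candidate, n)
        ref = char_ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            precisions.append(0.0)
            continue
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(matched / total)
    return sum(precisions) / len(precisions)


def select_translation(candidates):
    """Return (best_candidate, score), scoring each candidate by its mean
    overlap with all the other candidates."""
    if not candidates:
        return None, 0.0
    scored = []
    for i, cand in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        if others:
            score = sum(overlap_score(cand, o) for o in others) / len(others)
        else:
            score = 0.0
        scored.append((score, cand))
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def build_corpus(pivot_pairs, threshold=0.5):
    """Split sentences into accepted pairs and a manual-review queue.

    pivot_pairs: list of (japanese_sentence, [chinese_candidates]).
    Sentences below the threshold are returned lowest score first, as the
    ones a human translator would be asked about (uncertainty sampling).
    """
    accepted, for_review = [], []
    for ja, candidates in pivot_pairs:
        zh, score = select_translation(candidates)
        if zh is not None and score >= threshold:
            accepted.append((ja, zh))
        else:
            for_review.append((ja, score))
    for_review.sort(key=lambda item: item[1])
    return accepted, for_review
```

In an iterative setting, the accepted pairs and the manually translated review sentences would be added to the training data before the translation system is retrained, and the loop would then repeat on the remaining pivot data.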