Neural Network Based Bilingual Language Model Growing for Statistical Machine Translation
نویسندگان
چکیده
Since larger n-gram Language Model (LM) usually performs better in Statistical Machine Translation (SMT), how to construct efficient large LM is an important topic in SMT. However, most of the existing LM growing methods need an extra monolingual corpus, where additional LM adaption technology is necessary. In this paper, we propose a novel neural network based bilingual LM growing method, only using the bilingual parallel corpus in SMT. The results show that our method can improve both the perplexity score for LM evaluation and BLEU score for SMT, and significantly outperforms the existing LM growing methods without extra corpus.
منابع مشابه
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملThe RWTH Aachen University English-Romanian Machine Translation System for WMT 2016
This paper describes the statistical machine translation system developed at RWTH Aachen University for the English→Romanian translation task of the ACL 2016 First Conference on Machine Translation (WMT 2016). We combined three different state-ofthe-art systems in a system combination: A phrase-based system, a hierarchical phrase-based system and an attentionbased neural machine translation sys...
متن کاملSmooth Bilingual N-Gram Translation
We address the problem of smoothing translation probabilities in a bilingual N-grambased statistical machine translation system. It is proposed to project the bilingual tuples onto a continuous space and to estimate the translation probabilities in this representation. A neural network is used to perform the projection and the probability estimation. Smoothing probabilities is most important fo...
متن کاملLarge-scale Dictionary Construction via Pivot-based Statistical Machine Translation with Significance Pruning and Neural Network Features
We present our ongoing work on large-scale Japanese-Chinese bilingual dictionary construction via pivot-based statistical machine translation. We utilize statistical significance pruning to control noisy translation pairs that are induced by pivoting. We construct a large dictionary which we manually verify to be of a high quality. We then use this dictionary and a parallel corpus to learn bili...
متن کاملThe Edinburgh/JHU Phrase-based Machine Translation Systems for WMT 2015
This paper describes the submission of the University of Edinburgh and the Johns Hopkins University for the shared translation task of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation (WMT 2015). We set up phrase-based statistical machine translation systems for all ten language pairs of this year’s evaluation campaign, which are English paired with Czech, Finnish, French, Germa...
متن کامل