Tuning Statistical Machine Translation Parameters
نویسندگان
چکیده
Word alignment is the basis of statistical machine translation. GIZA++ is a popular tool for producing word alignments and translation models. It uses a set of parameters that affect the quality of word alignments and translation models. These parameters exist to overcome some problems such as overfitting. This paper addresses the problem of tuning GIZA++ parameter for better translation quality. The results show that our systematic procedure for parameter tuning can improve the translation quality. Keywords— Parameter tuning, smoothing factors, training scheme, overfitting.
منابع مشابه
TÜBİTAK-BİLGEM German-English Machine Translation Systems for W13
This paper describes TÜBİTAK-BİLGEM statistical machine translation (SMT) systems submitted to the Eighth Workshop on Statistical Machine Translation (WMT) shared translation task for German-English language pair in both directions. We implement phrase-based SMT systems with standard parameters. We present the results of using a big tuning data and the effect of averaging tuning weights of diff...
متن کاملTÜBİTAK - BİLGEM German - English Machine Translation Systems for WMT ’ 13
This paper describes TÜBİTAK-BİLGEM statistical machine translation (SMT) systems submitted to the Eighth Workshop on Statistical Machine Translation (WMT) shared translation task for German-English language pair in both directions. We implement phrase-based SMT systems with standard parameters. We present the results of using a big tuning data and the effect of averaging tuning weights of diff...
متن کاملTuning machine translation parameters with SPSA
Most of statistical machine translation systems are combinations of various models, and tuning of the scaling factors is an important step. However, this optimisation problem is hard because the objective function has many local minima and the available algorithms cannot achieve a global optimum. Consequently, optimisations starting from different initial settings can converge to fairly differe...
متن کاملEdinburgh's Syntax-Based Machine Translation Systems
We present the syntax-based string-totree statistical machine translation systems built for the WMT 2013 shared translation task. Systems were developed for four language pairs. We report on adapting parameters, targeted reduction of the tuning set, and post-evaluation experiments on rule binarization and preventing dropping of verbs.
متن کاملExperiences with English-Hindi, English-Tamil and English-Kannada Transliteration Tasks at NEWS 2009
We use a Phrase-Based Statistical Machine Translation approach to Transliteration where the words are replaced by characters and sentences by words. We employ the standard SMT tools like GIZA++ for learning alignments and Moses for learning the phrase tables and decoding. Besides tuning the standard SMT parameters, we focus on tuning the Character Sequence Model (CSM) related parameters like or...
متن کامل