Tuning Statistical Machine Translation Parameters

Authors

  • Ahmed Ragab Nabhan
  • Ahmed A. Rafea
Abstract

Word alignment is the basis of statistical machine translation. GIZA++ is a popular tool for producing word alignments and translation models. It exposes a set of parameters that affect the quality of those alignments and models; the parameters exist to counter problems such as overfitting. This paper addresses the problem of tuning GIZA++ parameters for better translation quality. The results show that our systematic procedure for parameter tuning can improve translation quality.

Keywords: parameter tuning, smoothing factors, training scheme, overfitting.

Similar resources

TÜBİTAK-BİLGEM German-English Machine Translation Systems for W13

This paper describes TÜBİTAK-BİLGEM statistical machine translation (SMT) systems submitted to the Eighth Workshop on Statistical Machine Translation (WMT) shared translation task for German-English language pair in both directions. We implement phrase-based SMT systems with standard parameters. We present the results of using a big tuning data and the effect of averaging tuning weights of diff...

Tuning machine translation parameters with SPSA

Most of statistical machine translation systems are combinations of various models, and tuning of the scaling factors is an important step. However, this optimisation problem is hard because the objective function has many local minima and the available algorithms cannot achieve a global optimum. Consequently, optimisations starting from different initial settings can converge to fairly differe...

Edinburgh's Syntax-Based Machine Translation Systems

We present the syntax-based string-to-tree statistical machine translation systems built for the WMT 2013 shared translation task. Systems were developed for four language pairs. We report on adapting parameters, targeted reduction of the tuning set, and post-evaluation experiments on rule binarization and preventing dropping of verbs.

Experiences with English-Hindi, English-Tamil and English-Kannada Transliteration Tasks at NEWS 2009

We use a Phrase-Based Statistical Machine Translation approach to Transliteration where the words are replaced by characters and sentences by words. We employ the standard SMT tools like GIZA++ for learning alignments and Moses for learning the phrase tables and decoding. Besides tuning the standard SMT parameters, we focus on tuning the Character Sequence Model (CSM) related parameters like or...

Publication date: 2004