Parameter Optimization for Statistical Machine Translation: It Pays to Learn from Hard Examples

Authors

  • Preslav Nakov
  • Fahad Al-Obaidli
  • Francisco Guzmán
  • Stephan Vogel
Abstract

Research on statistical machine translation has focused on particular translation directions, typically with English as the target language, e.g., from Arabic to English. When we reverse the translation direction, the multiple reference translations turn into multiple possible inputs, which offers both challenges and opportunities. We propose and evaluate several strategies for making use of these multiple inputs: (a) select one of the datasets, (b) select the best input for each sentence, and (c) synthesize an input for each sentence by fusing the available inputs. Surprisingly, we find that it is best to tune on the hardest available input, not on the one that yields the highest BLEU score. This finding has implications for how to pick good translators and how to select useful data for parameter optimization in SMT.
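
Strategies (a) and (b) from the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dataset labels (`ref0`, `ref1`, `ref2`), the score values, and the toy word-count scorer are all hypothetical, and "hardest" is approximated here as "lowest score", which is an assumption of this sketch and not a definition taken from the paper.

```python
from typing import Callable, Dict, List


def select_dataset(scores: Dict[str, float], hardest: bool = True) -> str:
    """Strategy (a): choose one whole input dataset by a corpus-level score.

    Assumption: a lower score means a harder input.
    """
    pick = min if hardest else max
    return pick(scores, key=scores.get)


def select_per_sentence(
    inputs: Dict[str, List[str]],
    sentence_score: Callable[[str], float],
    hardest: bool = True,
) -> List[str]:
    """Strategy (b): for each sentence position, choose one parallel input."""
    pick = min if hardest else max
    # zip(*...) groups the alternative inputs for the same sentence together.
    return [pick(variants, key=sentence_score) for variants in zip(*inputs.values())]


# Hypothetical corpus-level scores for three alternative input datasets.
dataset_scores = {"ref0": 24.1, "ref1": 21.7, "ref2": 26.3}
hardest_dataset = select_dataset(dataset_scores)                 # "ref1"
easiest_dataset = select_dataset(dataset_scores, hardest=False)  # "ref2"

# Hypothetical parallel inputs; word count is a stand-in for a real sentence-level score.
parallel_inputs = {"ref0": ["a b", "c d e"], "ref1": ["a b c", "c"]}
mixed_input = select_per_sentence(parallel_inputs, lambda s: len(s.split()))
```

In a real setup the scores would come from translating each candidate input with the SMT system and evaluating the output; the sketch only shows the selection step, with a flag to switch between tuning on the hardest and on the highest-scoring input.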

Similar resources

Extending Anchored Learning to Machine Translation Evaluation

We compare two methods, Anchored Learning, and a simple new method (Hyperplane Distance), for finding Hard To Learn examples in Machine Learning tasks that use SVMs. These Hard To Learn examples can be corpus errors, or examples which are difficult to predict with the given set of features. The Anchored Learning method is extended to deal with various kernels and SV regression. A number of expe...

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in the English language are hyphenated, with the hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use the space character instead of the half-space character. This common incorrect use of space leads to some s...

Towards a Systematic and Human-Informed Paradigm for High-Quality Machine Translation

Since the advent of modern statistical machine translation (SMT), much progress in system performance has been achieved that went hand-in-hand with ever more sophisticated mathematical models and methods. Numerous small improvements have been reported whose lasting effects are hard to judge, especially when they are combined with other newly proposed modifications of the basic models. Often the...

SIZE AND GEOMETRY OPTIMIZATION OF TRUSSES USING TEACHING-LEARNING-BASED OPTIMIZATION

A novel optimization algorithm named teaching-learning-based optimization (TLBO) algorithm and its implementation procedure were presented in this paper. TLBO is a meta-heuristic method, which simulates the phenomenon in classes. TLBO has two phases: teacher phase and learner phase. Students learn from teachers in teacher phases and obtain knowledge by mutual learning in learner phase. The suit...

Speed-Constrained Tuning for Statistical Machine Translation Using Bayesian Optimization

We address the problem of automatically finding the parameters of a statistical machine translation system that maximize BLEU scores while ensuring that decoding speed exceeds a minimum value. We propose the use of Bayesian Optimization to efficiently tune the speed-related decoding parameters by easily incorporating speed as a noisy constraint function. The obtained parameter values are guaran...

Publication date: 2013