The Cunei Machine Translation Platform for WMT '10

نویسنده

  • Aaron Phillips
چکیده

This paper describes the Cunei Machine Translation Platform and how it was used in the WMT ’10 German to English and Czech to English translation tasks. 1 The Cunei Machine Translation Platform The Cunei Machine Translation Platform (Phillips and Brown, 2009) is open-source software and freely available at http://www.cunei. org/. Like Moses (Koehn et al., 2007) and Joshua (Li et al., 2009), Cunei provides a statistical decoder that combines partial translations (either phase pairs or grammar rules) in order to compose a coherent sentence in the target language. What makes Cunei unique is that it models the translation task with a non-parametric model that assesses the relevance of each translation instance. The process begins by encoding in a lattice all possible contiguous phrases from the input.1 For each source phrase in the lattice, Cunei locates instances of it in the corpus and then identifies the aligned target phrase(s). This much is standard to most data-driven MT systems. The typical step at this stage is to model a phrase pair by computing relative frequencies over the collection of translation instances. This model for the phrase pair will never change and knowledge of the translation instances can subsequently be discarded. In contrast to using a phrase pair as the basic unit of modeling, Cunei models each translation instance. A distance function, represented by a log-linear model, scores the relevance of each translation instance. Our model then sums the scores of translation instances that predict the same target hypothesis. The advantage of this approach is that it provides a flexible framework for novel sources of Cunei offers limited support for non-contiguous phrases, similar in concept to grammar rules, but this setting was disabled in our experiments. information. The non-parametric model still uses information gleaned over all translation instances, but it permits us to define a distance function that operates over one translation instance at a time. This enables us to score a wide-variety of information represented by the translation instance with respect to the input and the target hypothesis under consideration. For example, we could compute how similar one translation instance’s parse tree or morpho-syntactic information is to the input. Furthermore, this information will vary throughout the corpus with some translation instances exhibiting higher similarity to the input. Our approach captures that these instances are more relevant and they will have a larger effect on the model. For the WMT ’10 task, we exploited instance-specific context and alignment features which will be discussed in more detail below.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cunei Machine Translation Platform: System Description

In this work we present Cunei, a hybrid, open-source platform for machine translation that models each example of a phrase-pair at run-time and combines them in dynamic collections. This results in a flexible framework that provides consistent modeling and the use of non-local features.

متن کامل

The Universitat d'Alacant hybrid machine translation system for WMT 2011

This paper describes the machine translation (MT) system developed by the Transducens Research Group, from Universitat d’Alacant, Spain, for the WMT 2011 shared translation task. We submitted a hybrid system for the Spanish–English language pair consisting of a phrase-based statistical MT system whose phrase table was enriched with bilingual phrase pairs matching transfer rules and dictionary e...

متن کامل

Training Machine Translation with a Second-Order Taylor Approximation of Weighted Translation Instances

The Cunei Machine Translation Platform is an open-source MT system designed to model instances of translation. One of the challenges to this approach is effective training. We describe two techniques that improve the training procedure and allow us to leverage the strengths of instance-based modeling. First, during training we approximate our model with a second-order Taylor series. Second, we ...

متن کامل

Modeling Relevance in Statistical Machine Translation: Scoring Alignment, Context, and Annotations of Translation Instances

Machine translation has advanced considerably in recent years, primarily due to the availability of larger datasets. However, one cannot rely on the availability of copious, high-quality bilingual training data. In this work, we improve upon the state-of-the-art in machine translation with an instance-based model that scores each instance of translation in the corpus. A translation instance ref...

متن کامل

NYU-MILA Neural Machine Translation Systems for WMT'16

We describe the neural machine translation system of New York University (NYU) and University of Montreal (MILA) for the translation tasks of WMT’16. The main goal of NYU-MILA submission to WMT’16 is to evaluate a new character-level decoding approach in neural machine translation on various language pairs. The proposed neural machine translation system is an attention-based encoder–decoder wit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010