An MT Error-Driven Discriminative Word Lexicon using Sentence Structure Features

Authors

Jan Niehues

Alexander H. Waibel

Abstract

The Discriminative Word Lexicon (DWL) is a maximum-entropy model that predicts the probability of a target word given the words of the source sentence. We present two ways to extend a DWL to improve its ability to model the word translation probability in a phrase-based machine translation (PBMT) system. While DWLs are able to model global source information, they ignore the structure of the source and target sentences. We propose to include this structure by modeling the source sentence as a bag of n-grams and by adding features that depend on the surrounding target words. Furthermore, since the standard DWL receives no feedback from the MT system, we change the DWL training process to focus explicitly on addressing MT errors. Using these methods, we improve translation performance by up to 0.8 BLEU points compared to a system that uses a standard DWL.
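
The following is a minimal Python sketch of a DWL with bag-of-n-grams source features. It assumes one binary maximum-entropy (logistic regression) classifier per target word, trained with plain stochastic gradient descent, where a sentence pair is a positive example for a word if that word occurs in the target sentence. The function names (`bag_of_ngrams`, `train_dwl`, `dwl_score`), the toy corpus, and the hyperparameters are hypothetical illustrations, not the authors' implementation; the sketch also omits the target-context features and the MT error-driven training described above.

```python
import math
from collections import defaultdict


def bag_of_ngrams(tokens, max_n=2):
    """Collect every n-gram (1 <= n <= max_n) of the source sentence as an
    unordered set; max_n=1 gives the plain bag-of-words input."""
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class WordClassifier:
    """Binary maximum-entropy (logistic regression) model for one target word."""

    def __init__(self):
        self.weights = defaultdict(float)  # one weight per source feature
        self.bias = 0.0

    def prob(self, feats):
        # p(target word occurs in the translation | source features)
        z = self.bias + sum(self.weights.get(f, 0.0) for f in feats)
        return sigmoid(z)

    def update(self, feats, label, lr):
        # One SGD step on the log-likelihood; label is 1.0 or 0.0.
        grad = label - self.prob(feats)
        self.bias += lr * grad
        for f in feats:
            self.weights[f] += lr * grad


def train_dwl(corpus, max_n=2, epochs=20, lr=0.5):
    """corpus: list of (source_tokens, target_tokens) pairs.
    Trains one classifier per target word; a pair is a positive example
    for a word if that word occurs in the target sentence."""
    vocab = sorted({w for _, tgt in corpus for w in tgt})
    models = {w: WordClassifier() for w in vocab}
    data = [(bag_of_ngrams(src, max_n), set(tgt)) for src, tgt in corpus]
    for _ in range(epochs):
        for feats, tgt_words in data:
            for w, model in models.items():
                model.update(feats, 1.0 if w in tgt_words else 0.0, lr)
    return models


def dwl_score(models, src, hyp, max_n=2):
    """Sum of log p(e | source) over the distinct hypothesis words that have
    a classifier; words without a model are skipped."""
    feats = bag_of_ngrams(src, max_n)
    return sum(math.log(models[w].prob(feats)) for w in set(hyp) if w in models)


if __name__ == "__main__":
    # Hypothetical toy German-English corpus.
    corpus = [
        (["das", "haus"], ["the", "house"]),
        (["das", "auto"], ["the", "car"]),
        (["ein", "haus"], ["a", "house"]),
        (["ein", "auto"], ["a", "car"]),
    ]
    models = train_dwl(corpus)
    src = ["das", "haus"]
    print(dwl_score(models, src, ["the", "house"]),
          dwl_score(models, src, ["the", "car"]))
```

Because the n-grams are collected into a set, word order is used only locally (inside each n-gram), which is what distinguishes the bag-of-n-grams input from the plain bag-of-words input (`max_n=1`). This sketch scores only the words that occur in the hypothesis; whether to also add terms for vocabulary words absent from the hypothesis is a design choice left out here.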

Similar resources

Maximum mutual information multi-phone units in direct modeling

This paper introduces a class of discriminative features for use in maximum entropy speech recognition models. The features we propose are acoustic detectors for discriminatively determined multi-phone units. The multi-phone units are found by computing the mutual information between the phonetic subsequences that occur in the training lexicon, and the word labels. This quantity is a function o...

Discriminant Models for Word Alignment

Word alignment aims to link each word of a translated sentence to its related words in the source sentence. Nowadays, Giza++ is the most widely used word alignment system. This toolkit implements the generative IBM models. Despite its popularity, several limitations remain. We thus propose to address this task using discriminative models (Maximum Entropy and Conditional Random Fields) which can easily...

Feature extraction in opinion mining through Persian reviews

Opinion mining deals with the analysis of user reviews to extract their opinions, sentiments, and demands in a specific area, which can play an important role in making major decisions in that area. In general, opinion mining extracts user reviews at three levels: document, sentence, and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...

Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets

An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence fragments that potentially compete to be the translation of the same source-language fragment. Us...

Lexicon-Driven Machine Translation

Machine Translation (MT) systems have historically relied upon explicit grammars in order to analyze the source text and reproduce it in the target language. In this paper, we argue for a style of MT in which the focus of processing is at the level of the lexicon, rather than the grammar. This approach to translation allows an analyzer to map source sentences into an interlingual form, which th...

Publication date: 2013