An MT Error-Driven Discriminative Word Lexicon using Sentence Structure Features
Abstract
The Discriminative Word Lexicon (DWL) is a maximum-entropy model that predicts the probability of a target word given the words of the source sentence. We present two ways to extend a DWL to improve its ability to model the word translation probability in a phrase-based machine translation (PBMT) system. While DWLs can model global source information, they ignore the structure of the source and target sentences. We propose to include this structure by modeling the source sentence as a bag of n-grams and by adding features that depend on the surrounding target words. Furthermore, because the standard DWL receives no feedback from the MT system, we change the DWL training process to focus explicitly on addressing MT errors. With these methods, we improve translation performance by up to 0.8 BLEU points over a system that uses a standard DWL.
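To make the bag-of-n-grams idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common DWL formulation of one binary maximum-entropy (logistic) classifier per target word. The function names, the example sentence, and the weights are all hypothetical, chosen only for illustration.

```python
import math


def bag_of_ngrams(tokens, max_n=2):
    """Collect every n-gram (n = 1..max_n) of the source sentence as a set.

    With max_n=1 this reduces to a plain bag of words; larger n keeps
    some local word-order information from the source sentence.
    """
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def target_word_probability(features, weights, bias=0.0):
    """Logistic (binary maximum-entropy) probability that one target word
    appears in the translation, given the active source features."""
    z = bias + sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for a single target word; a real model would
# learn one weight vector per target word from parallel data.
weights_bank = {"Bank": 1.5, "die Bank": 0.5, "Fluss": -2.0}

source = "er geht in die Bank".split()
feats = bag_of_ngrams(source, max_n=2)
p = target_word_probability(feats, weights_bank)
print(f"{len(feats)} features, p = {p:.4f}")
```

In this sketch the bigram feature `die Bank` contributes in addition to the unigram `Bank`, which is the kind of source-side structure a plain bag-of-words feature set cannot express. The target-side context features and the error-driven training described in the abstract are not covered here.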
Similar resources
Maximum mutual information multi-phone units in direct modeling
This paper introduces a class of discriminative features for use in maximum entropy speech recognition models. The features we propose are acoustic detectors for discriminatively determined multi-phone units. The multi-phone units are found by computing the mutual information between the phonetic subsequences that occur in the training lexicon, and the word labels. This quantity is a function o...
Discriminant Models for Word Alignment
Word alignment aims to link each word of a translated sentence to its related words in the source sentence. Nowadays, Giza++ is the most used word alignment system. This toolkit implements the generative IBM models. Despite its popularity, several limitations remain. We thus propose to address this task using discriminative models (Maximum Entropy and Conditional Random Fields) which can easily...
Feature extraction in opinion mining through Persian reviews
Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...
Unsupervised Discriminative Language Model Training for Machine Translation using Simulated Confusion Sets
An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free grammar is derived from a baseline MT system to capture translation alternatives: pairs of words, phrases or other sentence fragments that potentially compete to be the translation of the same source-language fragment. Us...
Lexicon-Driven Machine Translation
Machine Translation (MT) systems have historically relied upon explicit grammars in order to analyze the source text and reproduce it in the target language. In this paper, we argue for a style of MT in which the focus of processing is at the level of the lexicon, rather than the grammar. This approach to translation allows an analyzer to map source sentences into an interlingual form, which th...