Document-level Consistency Verification in Machine Translation
Abstract
Translation consistency is an important issue in document-level translation. However, most Machine Translation (MT) systems overlook the consistency of their output because they do not use document context. To address this issue, we present a simple and effective approach that incorporates document context into an existing Statistical Machine Translation (SMT) system for document-level translation. Experimental results show that our approach effectively reduces the errors caused by inconsistent translations (a 25% error reduction). As a bonus, our approach also improves the BLEU score of the SMT system.
Similar resources
Using Word Embeddings to Enforce Document-Level Lexical Consistency in Machine Translation
We integrate new mechanisms in a document-level machine translation decoder to improve the lexical consistency of document translations. First, we develop a document-level feature designed to score the lexical consistency of a translation. This feature, which applies to words that have been translated into different forms within the document, uses word embeddings to measure the adequacy of each w...
Document-Level Machine Translation Evaluation with Gist Consistency and Text Cohesion
Current Statistical Machine Translation (SMT) is significantly affected by Machine Translation (MT) evaluation metrics. The emergence of document-level MT research increases the demand for corresponding evaluation metrics. This paper proposes two superior yet low-cost quantitative objective methods to enhance traditional MT metrics by modeling document-level phenomena from the perspective...
Novel Document Level Features for Statistical Machine Translation
In this paper, we introduce document-level features that capture the information needed to help an MT system perform better word sense disambiguation during translation. We describe enhancements to a Maximum Entropy based translation model, utilizing long-distance contextual features identified from the span of the entire document and from both source and target sides, to improve the likelihood...
Document-level Re-ranking with Soft Lexical and Semantic Features for Statistical Machine Translation
We introduce two document-level features to polish baseline sentence-level translations generated by a state-of-the-art statistical machine translation (SMT) system. One feature uses the word-embedding technique to model the relation between a sentence and its context on the target side; the other feature is a crisp document-level token-type ratio of target-side translations for source-side wor...
Modeling Term Translation for Document-informed Machine Translation
Term translation is of great importance for statistical machine translation (SMT), especially document-informed SMT. In this paper, we investigate three issues of term translation in the context of document-informed SMT and propose three corresponding models: (a) a term translation disambiguation model which selects desirable translations for terms in the source language with domain information,...