Using Syntax to Improve Word Alignment Precision for Syntax-Based Machine Translation
نویسندگان
چکیده
Word alignments that violate syntactic correspondences interfere with the extraction of string-to-tree transducer rules for syntaxbased machine translation. We present an algorithm for identifying and deleting incorrect word alignment links, using features of the extracted rules. We obtain gains in both alignment quality and translation quality in Chinese-English and Arabic-English translation experiments relative to a GIZA++ union baseline.
منابع مشابه
Improving Word Alignment Using Syntactic Dependencies
We introduce a word alignment framework that facilitates the incorporation of syntax encoded in bilingual dependency tree pairs. Our model consists of two sub-models: an anchor word alignment model which aims to find a set of high-precision anchor links and a syntaxenhanced word alignment model which focuses on aligning the remaining words relying on dependency information invoked by the acquir...
متن کاملRe-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-theart syntax MT system: re-structuring changes the syntactic structure of training parse trees to enable reuse of...
متن کاملمدل ترجمه عبارت-مرزی با استفاده از برچسبهای کمعمق نحوی
Phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target side phrases of training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
متن کاملA Bayesian Model of Syntax-Directed Tree to String Grammar Induction
Tree based translation models are a compelling means of integrating linguistic information into machine translation. Syntax can inform lexical selection and reordering choices and thereby improve translation quality. Research to date has focussed primarily on decoding with such models, but less on the difficult problem of inducing the bilingual grammar from data. We propose a generative Bayesia...
متن کاملA New Subtree-Transfer Approach to Syntax-Based Reordering for Statistical Machine Translation
In this paper we address the problem of translating between languages with word order disparity. The idea of augmenting statistical machine translation (SMT) by using a syntax-based reordering step prior to translation, proposed in recent years, has been quite successful in improving translation quality. We present a new technique for extracting syntax-based reordering rules, which are derived ...
متن کامل