Tailoring Word Alignments to Syntactic Machine Translation
نویسندگان
چکیده
Extracting tree transducer rules for syntactic MT systems can be hindered by word alignment errors that violate syntactic correspondences. We propose a novel model for unsupervised word alignment which explicitly takes into account target language constituent structure, while retaining the robustness and efficiency of the HMM alignment model. Our model’s predictions improve the yield of a tree transducer extraction system, without sacrificing alignment quality. We also discuss the impact of various posteriorbased methods of reconciling bidirectional alignments.
منابع مشابه
Improving Function Word Alignment with Frequency and Syntactic Information
In statistical word alignment for machine translation, function words usually cause poor aligning performance because they do not have clear correspondence between different languages. This paper proposes a novel approach to improve word alignment by pruning alignments of function words from an existing alignment model with high precision and recall. Based on monolingual and bilingual frequency...
متن کاملDiversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuri...
متن کاملUnsupervised Syntactic Alignment with Inversion Transduction Grammars
Syntactic machine translation systems currently use word alignments to infer syntactic correspondences between the source and target languages. Instead, we propose an unsupervised ITG alignment model that directly aligns syntactic structures. Our model aligns spans in a source sentence to nodes in a target parse tree. We show that our model produces syntactically consistent analyses where possi...
متن کاملUsing Syntax to Improve Word Alignment Precision for Syntax-Based Machine Translation
Word alignments that violate syntactic correspondences interfere with the extraction of string-to-tree transducer rules for syntaxbased machine translation. We present an algorithm for identifying and deleting incorrect word alignment links, using features of the extracted rules. We obtain gains in both alignment quality and translation quality in Chinese-English and Arabic-English translation ...
متن کاملSyntactic Constraints on Phrase Extraction for Phrase-Based Machine Translation
A typical phrase-based machine translation (PBMT) system uses phrase pairs extracted from word-aligned parallel corpora. All phrase pairs that are consistent with word alignments are collected. The resulting phrase table is very large and includes many non-syntactic phrases which may not be necessary. We propose to filter the phrase table based on source language syntactic constraints. Rather t...
متن کامل