A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
نویسندگان
چکیده
In this work we propose methods to label probabilistic synchronous context-free grammar (PSCFG) rules using only word tags, generated by either part-of-speech analysis or unsupervised word class induction. The proposals range from simple tag-combination schemes to a phrase clustering model that can incorporate an arbitrary number of features. Our models improve translation quality over the single generic label approach of Chiang (2005) and perform on par with the syntactically motivated approach from Zollmann and Venugopal (2006) on the NIST large Chineseto-English translation task. These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available.
منابع مشابه
A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
In this work we propose an approach to label probabilistic synchronous context-free grammar (PSCFG) rules using only word tags, generated by either part-of-speech analysis or unsupervised word class induction. Our approach improves translation quality over the single generic label approach of Chiang (2005) on the NIST large resource Chinese-toEnglish translation task. These improvements persist...
متن کاملA Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
متن کاملA Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT
Probabilistic synchronous context-free grammar (PSCFG) translation models define weighted transduction rules that represent translation and reordering operations via nonterminal symbols. In this work, we investigate the source of the improvements in translation quality reported when using two PSCFG translation models (hierarchical and syntax-augmented), when extending a state-of-the-art phraseb...
متن کاملمدل ترجمه عبارت-مرزی با استفاده از برچسبهای کمعمق نحوی
Phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target side phrases of training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
متن کاملNew Parameterizations and Features for PSCFG-Based Machine Translation
We propose several improvements to the hierarchical phrase-based MT model of Chiang (2005) and its syntax-based extension by Zollmann and Venugopal (2006). We add a source-span variance model that, for each rule utilized in a probabilistic synchronous context-free grammar (PSCFG) derivation, gives a confidence estimate in the rule based on the number of source words spanned by the rule and its ...
متن کامل