Feature Selection for Factored Phrase-Based Machine Translation
Abstract
In this work, we investigate factored models for machine translation. We provide a thorough theoretical description of this machine translation paradigm. We describe a method for evaluating the complexity of factored models and verify its usefulness in practice. We present a software tool for automatically creating machine translation experiments and searching the space of possible configurations. In the experimental part of the work, we verify our analyses and give some insight into the potential of factored systems. We indicate some possible directions that lead to improvements in translation quality; however, we conclude that these options cannot be explored fully automatically.
Similar resources
Exploiting Linguistically-Enriched Models for Phrase-Based Statistical Machine Translation
This thesis presents the design and implementation of linguistically-informed models for statistical phrase-based machine translation. Using Koehn's Pharaoh (2004), a state-of-the-art SMT system, and Moses (Hoang, 2006), a variant of the former which supports factored translation models, we have investigated two approaches: Combined Feature Models and Factored Models. While Combined Feature Mod...
System Description of BJTU-NLP SMT for NTCIR-9 PatentMT
This paper presents the overview of statistical machine translation systems that BJTU-NLP developed for the NTCIR-9 Patent Machine Translation Task (NTCIR-9 PatentMT). We compared the performance between phrase-based translation model and factored translation model in our Patent SMT of Chinese to English and English to Japanese. Factored translation model was proposed as an extended phrase-base...
Statistical Translation Models: A Literature Survey
In this survey, we briefly study Phrase-based, Factored and Hierarchical translation models. First we learn basics of Phrase-based model. Then we get introduced to an interesting SMT approach called Factored translation models. We also study mathematical modeling of the Factored models. Finally, we compare Factored models with Phrase-based models and know their disadvantages which are pulling t...
English-Latvian SMT: knowledge or data?
In cases when phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, translated texts often are not fluent due to misused inflectional forms and wrong word order between phrases or even inside the phrase. One of possible solutions how to improve translation quality is to apply factored models. The paper presents work on Englis...
CCG Supertags in Factored Statistical Machine Translation
Combinatorial Categorial Grammar (CCG) supertags present phrase-based machine translation with an opportunity to access rich syntactic information at a word level. The challenge is incorporating this information into the translation process. Factored translation models allow the inclusion of supertags as a factor in the source or target language. We show that this results in an improvement in t...