Improving Statistical MT through Morphological Analysis
Authors
Abstract
In statistical machine translation, estimating word-to-word alignment probabilities for the translation model can be difficult because of sparse data: most words in a given corpus occur at most a handful of times. With a highly inflected language such as Czech, this problem can be particularly severe. In addition, much of the morphological variation seen in Czech words is not reflected in either the morphology or the syntax of a language like English. In this work, we show that using morphological analysis to modify the Czech input can improve a Czech-English machine translation system. We investigate several different methods of incorporating morphological information, and show that a system that combines these methods yields the best results. Our final system achieves a BLEU score of 0.333, compared to 0.270 for the baseline word-to-word system.
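The sparse-data argument above can be illustrated with a small sketch. This is not the authors' system: the lemma table below is a hypothetical stand-in for a real Czech morphological analyzer, the ten-token corpus is invented for illustration, and replacing each word with its lemma is only one simple way of using morphological information (it discards the inflection, which, as the abstract notes, often has no counterpart in English). The sketch counts word types and singleton types (types seen exactly once) before and after lemmatization.

```python
from collections import Counter

# Hypothetical toy lemma table standing in for a real Czech morphological
# analyzer: it maps inflected surface forms to a dictionary lemma.
LEMMAS = {
    "žena": "žena", "ženy": "žena", "ženě": "žena", "ženu": "žena",  # "woman"
    "vidí": "vidět", "vidím": "vidět", "viděl": "vidět",             # "to see"
    "hrad": "hrad", "hradu": "hrad", "hrady": "hrad",                # "castle"
}


def sparsity_stats(tokens):
    """Return (number of word types, number of singleton types) for a token list."""
    counts = Counter(tokens)
    singletons = sum(1 for c in counts.values() if c == 1)
    return len(counts), singletons


def lemmatize(tokens):
    """Replace each token with its lemma; forms missing from the table are kept as-is."""
    return [LEMMAS.get(t, t) for t in tokens]


# Invented toy corpus: every surface form occurs exactly once.
corpus = ["žena", "vidí", "hrad", "ženy", "vidím",
          "hradu", "ženě", "viděl", "hrady", "ženu"]

surface = sparsity_stats(corpus)
lemma = sparsity_stats(lemmatize(corpus))
print(f"surface forms: {surface[0]} types, {surface[1]} singletons")  # 10 types, 10 singletons
print(f"lemmas:        {lemma[0]} types, {lemma[1]} singletons")      # 3 types, 0 singletons
```

On this toy data every surface form is a singleton, so an aligner would see each one only once; after lemmatization the same ten tokens collapse into three types seen three or four times each, which is the kind of pooling of counts that morphological preprocessing aims for.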
Similar resources
Efficacy of Massage Therapy on Pain and Dysfunction in Patients with Neck Pain: A Systematic Review and Meta-Analysis
Objective. To systematically evaluate the evidence of whether massage therapy (MT) is effective for neck pain. Methods. Randomized controlled trials (RCTs) were identified through searches of 5 English and Chinese databases (to December 2012). The search terms included neck pain, neck disorders, cervical vertebrae, massage, manual therapy, Tuina, and random. In addition, we performed hand searc...
Statistical MT Systems Revisited: How much Hybridity do they have?
The statistical approach to MT started about twenty-five years ago and has now been widely accepted as an alternative to the classical approach with manually designed rules. Among the attractive properties of the statistical approach is its capability to learn the translation models automatically from a (sufficiently) large amount of source-target sentence pairs. Thus the need for the manual des...
MTriage: Web-enabled Software for the Creation, Machine Translation, and Annotation of Smart Documents
Progress in the Machine Translation (MT) research community, particularly for statistical approaches, is intensely data-driven. Acquiring source language documents for testing, creating training datasets for customized MT lexicons, and building parallel corpora for MT evaluation require translators and non-native speaking analysts to handle large document collections. These collections are furt...
Unsupervised Morphology Rivals Supervised Morphology for Arabic MT
If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to-English MT system. We apply maximum marginal de...
Improving MT Quality: Towards a Hybrid MT Architecture in the linguatec 'Personal Translator'
This paper reports on measures to improve the quality of MT systems by using a hybrid system architecture that adds corpus-based and statistical components to an existing rule-based system backbone. The focus is on improving the accuracy of the dictionary resources.