Measuring Machine Translation Errors in New Domains
Abstract
We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. The first is a macro-level analysis that measures how domain shift affects corpus-level evaluation; the second is a micro-level analysis of word-level errors. We apply these methods to understand what happens when a Parliament-trained phrase-based machine translation system is applied in four very different domains: news, medical texts, scientific articles, and movie subtitles. We present quantitative and qualitative experiments that highlight opportunities for future research in domain adaptation for machine translation.
Related articles
A Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
Measuring the Impact of Spelling Errors on the Quality of Machine Translation
In this paper we show how different types of spelling errors influence the quality of machine translation. We also propose a method to evaluate the impact of spelling errors correction on translation quality without expensive manual work of providing reference translations.
The UU Submission to the Machine Translation Quality Estimation Task
This paper outlines the UU-SVM system for Task 1 of the WMT16 Shared Task in Quality Estimation. Our system uses Support Vector Machine Regression to investigate the impact of a series of features aiming to convey translation quality. We propose novel features measuring reordering and noun translation errors. We show that we can outperform the baseline when we combine it with a subset of our ne...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
The Impact of Morphological Errors in Phrase-based Statistical Machine Translation from English and German into Swedish
We have investigated the potential for improvement in target language morphology when translating into Swedish from English and German, by measuring the errors made by a state of the art phrase-based statistical machine translation system. Our results show that there is indeed a performance gap to be filled by better modelling of inflectional morphology and compounding; and that the gap is not ...
Journal: TACL
Volume: 1
Publication date: 2013