Document-level translation quality estimation: exploring discourse and pseudo-references
Authors
Abstract
Predicting the quality of machine translations is a challenging topic. Quality estimation (QE) of translations is based on features of the source and target texts (without the need for human references), and on supervised machine learning methods to build prediction models. Engineering well-performing features is therefore crucial in QE modelling. Several features have been used so far, but they tend to explore very short contexts within sentence boundaries. In addition, most work has targeted sentence-level quality prediction. In this paper, we focus on document-level QE using novel discursive features, as well as exploiting pseudo-reference translations. Experiments with features extracted from pseudo-references led to the best results, but the discursive features also proved promising.
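To make the idea of a pseudo-reference feature concrete, here is a minimal sketch. It assumes that a pseudo-reference is the output of a second MT system for the same source document, and it scores the document under evaluation by its clipped n-gram precision against that output. The function names and the choice of overlap measure are illustrative assumptions, not the feature set used in the paper; in a QE pipeline a score like this would be one input among many to a supervised regressor.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pseudo_reference_overlap(mt_document, pseudo_reference, n=1):
    """Clipped n-gram precision of an MT document against a pseudo-reference.

    Hypothetical helper: both arguments are plain strings, tokenised by
    lowercasing and splitting on whitespace (an assumption for brevity).
    """
    mt = ngram_counts(mt_document.lower().split(), n)
    ref = ngram_counts(pseudo_reference.lower().split(), n)
    total = sum(mt.values())
    if total == 0:
        # No n-grams of this order in the MT output.
        return 0.0
    # Each MT n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in mt.items())
    return matched / total


if __name__ == "__main__":
    mt_output = "the cat sat"
    other_system_output = "the cat slept"
    print(pseudo_reference_overlap(mt_output, other_system_output, n=1))
    print(pseudo_reference_overlap(mt_output, other_system_output, n=2))
```

A high overlap only indicates agreement between the two systems; whether agreement tracks quality depends on how good the pseudo-reference system is.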
Similar resources
Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation
In this paper we analyse the use of popular automatic machine translation evaluation metrics to provide labels for quality estimation at document and paragraph levels. We highlight crucial limitations of such metrics for this task, mainly the fact that they disregard the discourse structure of the texts. To better understand these limitations, we designed experiments with human annotators and p...
Regression for Sentence-Level MT Evaluation with Pseudo References
Many automatic evaluation metrics for machine translation (MT) rely on making comparisons to human translations, a resource that may not always be available. We present a method for developing sentence-level MT evaluation metrics that do not directly rely on human reference translations. Our metrics are developed using regression learning and are based on a set of weaker indicators of fluency a...
Combining Quality Prediction and System Selection for Improved Automatic Translation Output
This paper presents techniques for reference-free, automatic prediction of Machine Translation output quality at both sentence- and document-level. In addition to helping with document-level quality estimation, sentence-level predictions are used for system selection, improving the quality of the output translations. We present three system selection techniques and perform evaluations that quantify...
Confidence Estimation of Machine Translation Output Using New Structural and Content Features
Despite the wide success of machine translation (MT) in recent years, this technology is still unable to translate text accurately, so that, except for some language pairs in certain domains, post-editing its output may take longer than human translation. Nevertheless, with an estimate of the output quality, users can manage the imperfection of this technology. It means we need to estimate the c...
Automatic Detection of Machine Translated Text and Translation Quality Estimation
We show that it is possible to automatically detect machine translated text at sentence level from monolingual corpora, using text classification methods. We show further that the accuracy with which a learned classifier can detect text as machine translated is strongly correlated with the translation quality of the machine translation system that generated it. Finally, we offer a generic machi...