Automatic Evaluation Metric for Machine Translation that is Independent of Sentence Length
Abstract
We propose a new automatic evaluation metric for machine translation. Unlike most similar metrics, our proposed metric does not depend heavily on sentence length. In most metrics based on F-measure comparisons of reference and candidate translations, the relative weight of each mismatched word is larger in short sentences than in long ones. Consequently, the evaluation score becomes disproportionately low for short sentences even when only one non-matching word exists. In our metric, the weight of each mismatched word is kept small even in short sentences. We designate our metric as the Automatic Evaluation Metric that is Independent of Sentence Length (AILE). Experimental results indicate that AILE has the highest correlation with human judgments among several leading metrics.
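To make the length sensitivity concrete, here is a minimal sketch (not the authors' implementation; the function name unigram_f1 and the toy sentences are ours) showing how a single mismatched word costs a bag-of-words F-measure far more in a three-word sentence than in a twelve-word one:

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """F-measure over bag-of-words overlap between candidate and reference."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped word matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# One wrong word in a 3-word sentence vs. the same wrong word in a
# 12-word sentence: each word carries weight 1/len(sentence).
print(unigram_f1("the cat slept", "the dog slept"))
# -> 0.667: the single mismatch costs a third of the score
long_ref  = "the dog slept on the old rug near the door all day"
long_cand = "the cat slept on the old rug near the door all day"
print(unigram_f1(long_cand, long_ref))
# -> 0.917: the same single mismatch costs less than a tenth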
Similar Papers
Application of Prize based on Sentence Length in Chunk-based Automatic Evaluation of Machine Translation
As described in this paper, we propose a new automatic evaluation metric for machine translation. Our metric is based on chunking between the reference and candidate translations. Moreover, we apply a prize based on sentence length to the metric, unlike the penalties in BLEU or NIST. We designate this metric as Automatic Evaluation of Machine Translation in which the Prize is Applied to a C...
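The abstract does not give the prize formula, so the following is only a hypothetical illustration of the contrast it draws: BLEU's brevity penalty is a multiplier that can only shrink a score, whereas a prize adds a bonus. The length_prize function and its alpha parameter are invented for illustration and are not the paper's method; only bleu_brevity_penalty reflects a real, standard formula.

```python
import math

def bleu_brevity_penalty(cand_len: int, ref_len: int) -> float:
    """BLEU's standard brevity penalty: a multiplier <= 1 punishing short output."""
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)

def length_prize(cand_len: int, ref_len: int, alpha: float = 0.5) -> float:
    """Hypothetical prize (NOT the paper's formula): a multiplier >= 1
    that rewards close length agreement instead of punishing deviation."""
    ratio = min(cand_len, ref_len) / max(cand_len, ref_len)
    return 1.0 + alpha * ratio

# A 4-word candidate against a 6-word reference:
print(bleu_brevity_penalty(4, 6))  # ~0.61: the score can only shrink
print(length_prize(4, 6))          # ~1.33: the score is boosted instead
```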
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines, as engines are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
MAXSIM: An Automatic Metric for Machine Translation Evaluation Based on Maximum Similarity
This paper describes our participation in the NIST 2008 MetricsMATR Challenge, using our recently proposed automatic machine translation evaluation metric MAXSIM. The metric calculates a similarity score between a pair of English system-reference sentences by comparing information items such as n-grams across the sentence pair. Unlike most metrics, MAXSIM computes a similarity score between item...
MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation
We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences. Unlike most metrics, we compute a similarity score between items across the two sentences. We then find a maximum weight matching between the items such that each item in one sentence is mapped to at most one item in the other sentence. Th...
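A minimal sketch of this matching idea, assuming words as items and a toy similarity (exact match scores 1.0, a shared three-letter prefix scores 0.5). MAXSIM itself compares richer items such as n-grams, so the similarity function here is our stand-in; the maximum weight matching is computed with SciPy's linear_sum_assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def maxsim_like_score(candidate: str, reference: str) -> float:
    """F-measure over a maximum weight matching of word-level similarities."""
    cand, ref = candidate.split(), reference.split()
    # Toy similarity: 1.0 for an exact match, 0.5 for a shared 3-letter
    # prefix (a crude stand-in for MAXSIM's richer item similarities).
    sim = np.zeros((len(cand), len(ref)))
    for i, c in enumerate(cand):
        for j, r in enumerate(ref):
            if c == r:
                sim[i, j] = 1.0
            elif len(c) >= 3 and len(r) >= 3 and c[:3] == r[:3]:
                sim[i, j] = 0.5
    # Maximum weight matching: each item pairs with at most one item on
    # the other side (maximize=True requires SciPy >= 1.4).
    rows, cols = linear_sum_assignment(sim, maximize=True)
    total = sim[rows, cols].sum()
    if total == 0:
        return 0.0
    precision, recall = total / len(cand), total / len(ref)
    return 2 * precision * recall / (precision + recall)

print(maxsim_like_score("the cats sleeping", "the cat slept"))
# -> 0.667: "the" matches exactly; "cats"/"cat" and "sleeping"/"slept"
#    each earn partial credit through the shared prefix.
```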
Using Contextual Information for Machine Translation Evaluation
Automatic evaluation of Machine Translation (MT) is typically approached by measuring similarity between the candidate MT and a human reference translation. An important limitation of existing evaluation systems is that they are unable to distinguish candidate-reference differences that arise due to acceptable linguistic variation from the differences induced by MT errors. In this paper we pres...