Machine Translation Evaluation Metric for Text Alignment
Abstract
As plagiarisers become cleverer, plagiarism detection becomes harder. Plagiarisers keep finding new ways to obfuscate plagiarised passages so that neither humans nor automatic plagiarism detectors can identify them. A plagiarism detection system therefore needs to be robust enough to detect plagiarism regardless of the obfuscation technique applied. Our system aims for this robustness by combining two methods: a stricter one to catch mildly obfuscated passages and a more lenient one to catch the more difficult ones. We use a machine translation evaluation metric and n-gram matching to detect overlaps between the source and suspicious documents. On the PAN’14 corpus, which contains data with various types of obfuscation to mimic human plagiarisers, we obtained plagdet scores of 0.84404 and 0.86806 on the two datasets.
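To make the two kinds of overlap signal concrete, here is a minimal Python sketch of word n-gram matching alongside a unigram F-score of the kind used in machine translation evaluation. The abstract does not name the metric, the n-gram size, or any thresholds, and it does not say which signal plays the stricter role, so everything below is an illustrative assumption: the function names, `n=3`, `alpha=0.5`, and the whitespace tokenisation are hypothetical and are not the authors' implementation.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the lowercase word n-grams of a text as a list of tuples."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenisation
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shared_ngrams(source, suspicious, n=3):
    """Return the set of word n-grams that occur in both texts."""
    return set(word_ngrams(source, n)) & set(word_ngrams(suspicious, n))


def unigram_fscore(source, suspicious, alpha=0.5):
    """Weighted harmonic mean of unigram precision and recall.

    This is a simplified stand-in for an MT evaluation metric: it ignores
    word order, stemming, and synonyms. With alpha=0.5 it equals F1.
    """
    src = Counter(source.lower().split())
    sus = Counter(suspicious.lower().split())
    matches = sum((src & sus).values())  # clipped unigram matches
    if matches == 0:
        return 0.0
    precision = matches / sum(sus.values())
    recall = matches / sum(src.values())
    return precision * recall / (alpha * precision + (1 - alpha) * recall)


source = "the quick brown fox jumps over the lazy dog"
suspicious = "a quick brown fox leaps over the lazy dog"
print(sorted(shared_ngrams(source, suspicious)))
print(round(unigram_fscore(source, suspicious), 4))
```

Exact n-gram matching breaks as soon as a single word inside the n-gram is replaced, whereas the unigram score degrades gradually. That difference is one plausible reason to combine two such signals, though the actual combination rule used by the system is not described in the abstract.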
Similar resources
HEVAL: Yet Another Human Evaluation Metric
Machine translation evaluation is a very important activity in machine translation development. Automatic evaluation metrics proposed in literature are inadequate as they require one or more human reference translations to compare them with output produced by machine translation. This does not always give accurate results as a text can have several different translations. Human evaluation metri...
Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems
This paper describes Meteor 1.3, our submission to the 2011 EMNLP Workshop on Statistical Machine Translation automatic evaluation metric tasks. New metric features include improved text normalization, higher-precision paraphrase matching, and discrimination between content and function words. We include Ranking and Adequacy versions of the metric shown to have high correlation with human judgm...
Using Tectogrammatical Alignment in Phrase-Based Machine Translation
In this paper, we describe an experiment whose goal is to improve the quality of machine translation. Phrase-based machine translation, which is the state-of-the-art in the field of statistical machine translation, learns its phrase tables from large parallel corpora, which have to be aligned on the word level. The most common word-alignment tool is GIZA++. It is very universal and language ind...
Assessing the Accuracy of Discourse Connective Translations: Validation of an Automatic Metric
Automatic metrics for the evaluation of machine translation (MT) compute scores that characterize globally certain aspects of MT quality such as adequacy and fluency. This paper introduces a reference-based metric that is focused on a particular class of function words, namely discourse connectives, of particular importance for text structuring, and rather challenging for MT. To measure the acc...
Document-Level Machine Translation Evaluation with Gist Consistency and Text Cohesion
Current Statistical Machine Translation (SMT) is significantly affected by Machine Translation (MT) evaluation metric. Nowadays the emergence of document-level MT research increases the demand for corresponding evaluation metric. This paper proposes two superior yet low-cost quantitative objective methods to enhance traditional MT metric by modeling document-level phenomena from the perspective...