Evaluation of Text Reuse Corpora for Text Alignment Task of plagiarism Detection

نویسندگان

  • Vahid Zarrabi
  • Javad Rafiei
  • Khadijeh Khoshnava
  • Habibollah Asghari
  • Salar Mohtaj
چکیده

This paper addresses the text alignment task of 7th International competition on plagiarism detection; PAN 2015. We investigate five submitted corpora and evaluate them based on their characteristics in two ways: manual and automatic evaluation. The results of evaluation show that the most of plagiarism cases in prepared corporahavea rather high quality in term of “rate of obfuscation” alongside “preserving the concepts”.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Algorithms and Corpora for Persian Plagiarism Detection: Overview of PAN at FIRE 2016

The task of plagiarism detection is to find passages of text-reuse in a suspicious document. This task is of increasing relevance, since scholars around the world take advantage of the fact that information about nearly any subject can be found on the World Wide Web by reusing existing text instead of writing their own. We organized the Persian PlagDet shared task at PAN 2016 in an effort to pr...

متن کامل

PAN 2015 Shared Task on Plagiarism Detection: Evaluation of Corpora for Text Alignment: Notebook for PAN at CLEF 2015

In this paper we describe and evaluate the corpora submitted to the PAN 2015 shared task on plagiarism detection for text alignment. We received monoand cross-language corpora in the following languages (pairs): English, Persian, Chinese, and Urdu-English, English-Persian. We present an independent section for each submitted corpus including statistics, discussion of the obfuscation techniques ...

متن کامل

Developing Bilingual Plagiarism Detection Corpus Using Sentence Aligned Parallel Corpus: Notebook for PAN at CLEF 2015

Plagiarism detection is the process of locating text reuse within a suspicious document. The plagiarism detection corpora are used for evaluating plagiarism detection systems. In this paper, we present a bilingual PersianEnglish plagiarism detection corpus. We provide our corpus for the task of text alignment corpus construction in the PAN 2015 competition. Our approach is based on parallel cor...

متن کامل

Adaptive Algorithm for Plagiarism Detection: The Best-Performing Approach at PAN 2014 Text Alignment Competition

The task of (monolingual) text alignment consists in finding similar text fragments between two given documents. It has applications in plagiarism detection, detection of text reuse, author identification, authoring aid, and information retrieval, to mention only a few. We describe our approach to the text alignment subtask of the plagiarism detection competition at PAN 2014, which resulted in ...

متن کامل

Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

The task of text alignment corpus construction at PAN 2015 competition consists of preparing a plagiarism corpus so that it can provide various obfuscation types and versatile obfuscation degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015