Plagiarism Alignment Detection by Merging Context Seeds

نویسندگان

Philipp Gross

Pashutan Modaresi

چکیده

We describe our submitted algorithm to the text alignment sub-task of the plagiarism detection task in the PAN2014 challenge that achieved a plagdet score 0.855. By extracting contextual features for each document character and grouping those that are relevant for a given pair of documents, we generate seeds of atomic plagiarism cases. These are then merged by an agglomerative singlelinkage strategy using a defined distance measure.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013

In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In sub-task of Source Retrieval, a method combined TF-IDF, PatTree and Weighted TF-IDF to extract the keywords of suspicious documents as queries to retrieve the plagiarism source document is proposed. In sub-task of Text Alignment, a method based on sentence similarity is presented. Our text alignment...

متن کامل

Detecting High Obfuscation Plagiarism: Exploring Multi-Features Fusion via Machine Learning

Providing effective methods of identification of high-obfuscation plagiarism seeds presents a significant research problem in the field of plagiarism detection. The conventional methods of plagiarism detection are based on single type of features to capture plagiarism seeds. But for high-obfuscation plagiarism detection, these single type features are not sufficient for identifying the plagiari...

متن کامل

Optimized Fuzzy Text Alignment for Plagiarism Detection

This paper describes a method for plagiarism detection based on a fuzzy alignment between a given pair of documents. The proposed method assigns a weight to each word of the suspicious document according to the straightness of its alignment to the source document; this weight is used as a kind of plagiarism probability measure for each word of the suspicious document. The paper also presents a ...

متن کامل

A Text Alignment Corpus for Persian Plagiarism Detection

This paper describes how a Persian text alignment corpus was constructed to evaluate plagiarism detection systems. This corpus is in PAN format and contains 11,089 documents and more than 11,603 plagiarism cases. Efforts were made to simulate various types of plagiarism manually, semi-automatically, or automatically in this large-scale corpus.

متن کامل

Overview of the 6th International Competition on Plagiarism Detection

This paper overviews 17 plagiarism detectors that have been evaluated within the sixth international competition on plagiarism detection at PAN 2014. We report on their performances for the two tasks source retrieval and text alignment of external plagiarism detection. For the third year in a row, we invite software submissions instead of run submissions for this task, which allows for cross-ye...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2014

Plagiarism Alignment Detection by Merging Context Seeds

نویسندگان

چکیده

منابع مشابه

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013

Detecting High Obfuscation Plagiarism: Exploring Multi-Features Fusion via Machine Learning

Optimized Fuzzy Text Alignment for Plagiarism Detection

A Text Alignment Corpus for Persian Plagiarism Detection

Overview of the 6th International Competition on Plagiarism Detection

عنوان ژورنال:

اشتراک گذاری