Similarity Overlap Metric and Greedy String Tiling at PAN 2012: Plagiarism Detection

Author

  • Arun kumar Jayapal
Abstract

This paper reports the best-performing approach we followed for the candidate document retrieval task and the approach we used for the detailed comparison task of the plagiarism detection track at PAN 2012. The aim of participating was to understand a few of the computer-assisted approaches used for plagiarism detection. Plagiarism detection depends on two broad tasks: (1) candidate document retrieval and (2) detailed comparison. The n-gram similarity overlap metric was used for the candidate document retrieval task, and the greedy string tiling algorithm for the detailed comparison task. The evaluation results suggested that the approach used for candidate document retrieval was highly competitive, but that the approach used for detailed comparison needs much more improvement.
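
The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not define the metric, so the sketch assumes the overlap coefficient |A ∩ B| / min(|A|, |B|) over sets of word n-grams, and it uses the basic (quadratic) form of greedy string tiling without the Karp-Rabin speed-up. The function names, the whitespace tokenisation and the parameter values are all hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_similarity(a_tokens, b_tokens, n=3):
    """Overlap coefficient of two n-gram sets (assumed form of the metric)."""
    a, b = ngrams(a_tokens, n), ngrams(b_tokens, n)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def greedy_string_tiling(a, b, min_match=3):
    """Basic greedy string tiling.

    Returns a list of tiles (start_in_a, start_in_b, length). Each token
    is covered by at most one tile, and longer matches are tiled first.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Scan for the longest common runs of unmarked tokens.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k == max_match:
                    matches.append((i, j, k))
                elif k > max_match:
                    matches = [(i, j, k)]
                    max_match = k
        # Turn every maximal match that is still unoccluded into a tile.
        for i, j, k in matches:
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                marked_a[i:i + k] = [True] * k
                marked_b[j:j + k] = [True] * k
                tiles.append((i, j, k))
        # Stop once no match longer than the minimum length was found.
        if max_match <= min_match:
            break
    return tiles


suspicious = "the quick brown fox jumps over the lazy dog".split()
source = "a lazy dog saw the quick brown fox".split()

score = overlap_similarity(suspicious, source, n=2)
tiles = greedy_string_tiling(suspicious, source, min_match=2)
```

In a two-stage pipeline of the kind the abstract describes, a score such as `score` would rank source documents as candidates, and the tiles would then locate the matching passages within each retrieved candidate.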

Similar resources

University of Sheffield - Lab Report for PAN at CLEF 2010

This paper describes the University of Sheffield entry for the 2nd international plagiarism detection competition (PAN 2010). Our system attempts to identify extrinsic plagiarism. A three-stage approach is used: pre-processing, candidate document selection (using word n-grams) and detailed analysis (using the Running Karp-Rabin Greedy String Tiling string matching algorithm). This approach achi...

External Plagiarism Detection using Information Retrieval and Sequence Alignment - Notebook for PAN at CLEF 2011

This paper describes the University of Sheffield entry for the 3rd International Competition on Plagiarism Detection which attempted the monolingual external plagiarism detection task. A three stage framework was used: preprocessing and indexing, candidate document selection (using an Information Retrieval based approach) and detailed analysis (using the Running Karp-Rabin Greedy String Tiling ...

Fuzzy Semantic-Based String Similarity for Extrinsic Plagiarism Detection - Lab Report for PAN at CLEF 2010

This report explains our plagiarism detection method, which uses a fuzzy semantic-based string similarity approach. The algorithm was developed through four main stages. The first is pre-processing, which includes tokenisation, stemming and stop-word removal. The second is retrieving a list of candidate documents for each suspicious document using shingling and the Jaccard coefficient. Suspicious documents are th...

A Textual-Based Similarity Approach for Efficient and Scalable External Plagiarism Analysis - Lab Report for PAN at CLEF 2010

In this paper we present an approach to detect external plagiarism based on textual similarity. This is an efficient and precise method that can be applied over large sets of documents. The system that we have developed contains a first phase of document selection that uses a variant of tf-idf applied over the terms that appear in the two documents of the pair being compared. After this is don...

Approaches for Candidate Document Retrieval and Detailed Comparison of Plagiarism Detection

In this paper we report on our plagiarism detection system, which is used to process the PAN plagiarism corpus for the tasks of Candidate Document Retrieval and Detailed Comparison. To retrieve plagiarism candidate documents using the ChatNoir API, we propose a method based on tf*idf that extracts keywords from suspicious documents to use as queries. A Lucene ranking method is used for plagiarism ca...

Publication date: 2012