Diverse Queries and Feature Type Selection for Plagiarism Discovery Notebook for PAN at CLEF 2013

نویسندگان

  • Simon Suchomel
  • Jan Kasprzak
  • Michal Brandejs
چکیده

This paper describes approaches used for the Plagiarism Detection task in PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present modified three-way search methodology for Source Retrieval subtask and analyse snippet similarity performance. The results show, that presented approach is adaptable in real-world plagiarism situations. For the Detailed Comparison task, we discuss feature type selection and global postprocessing. Resulting performance is significantly better with the described modifications, and further improvement is still possible.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Synoptic Quering for Source Retrieval: Notebook for PAN at CLEF 2015

Source retrieval is a part of a plagiarism discovery process, where only a selected set of candidate documents is retrieved from a large corpus of potential source documents and passed for detailed document comparison in order to highlight potential plagiarism. This paper describes a used methodology and the architecture of a source retrieval system, developed for PAN 2015 lab on uncovering pla...

متن کامل

Using Statistic and Semantic Analysis to Detect Plagiarism Notebook for PAN at CLEF 2013

This paper describes an approach submitted to the 2013 PAN competiton for the source retrieval sub-task. Three different methods for extracting queries were used, which employed tf-idf, noun phrases and named entities, in order to submit very different queries and maximize recall.

متن کامل

Plagiarism Candidate Retrieval Using Selective Query Formulation and Discriminative Query Scoring Notebook for PAN at CLEF 2013

This paper details the approach of implementing an English plagiarism source retrieval system to be presented at PAN 2013. The system uses the TextTiling algorithm to break a given document into segments that are centered around certain topics within the document. From these segments, keyphrases are generated using the KPMiner keyphrase extraction system. These keyphrases and segments are then ...

متن کامل

Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013

In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In sub-task of Source Retrieval, a method combined TF-IDF, PatTree and Weighted TF-IDF to extract the keywords of suspicious documents as queries to retrieve the plagiarism source document is proposed. In sub-task of Text Alignment, a method based on sentence similarity is presented. Our text alignment...

متن کامل

Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

The task of text alignment corpus construction at PAN 2015 competition consists of preparing a plagiarism corpus so that it can provide various obfuscation types and versatile obfuscation degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013