Unsupervised Ranking for Plagiarism Source Retrieval Notebook for PAN at CLEF 2013
نویسندگان
چکیده
The source retrieval task for plagiarism detection involves the use of a search engine to retrieve candidate sources of plagiarism for a suspicious document and provides a way to efficiently identify candidate documents so that more accurate comparisons can take place. We describe a strategy for source retrieval that makes use of an unsupervised ranking method to rank the results returned by a search engine by their similarity with the query document and that only retrieves documents that are likely to be sources of plagiarism. Evaluation shows the performance of our approach, which achieved the highest F1 score (0.47) among all task participants.
منابع مشابه
Diverse Queries and Feature Type Selection for Plagiarism Discovery Notebook for PAN at CLEF 2013
This paper describes approaches used for the Plagiarism Detection task in PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present modified three-way search methodology for Source Retrieval subtask and analyse snippet similarity performance. The results show, that presented approach is adaptable in real-world plagiarism situations. For the ...
متن کاملUsing Statistic and Semantic Analysis to Detect Plagiarism Notebook for PAN at CLEF 2013
This paper describes an approach submitted to the 2013 PAN competiton for the source retrieval sub-task. Three different methods for extracting queries were used, which employed tf-idf, noun phrases and named entities, in order to submit very different queries and maximize recall.
متن کاملApproaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013
In this paper, we describe our approach at the PAN@CLEF2013 plagiarism detection competition. In sub-task of Source Retrieval, a method combined TF-IDF, PatTree and Weighted TF-IDF to extract the keywords of suspicious documents as queries to retrieve the plagiarism source document is proposed. In sub-task of Text Alignment, a method based on sentence similarity is presented. Our text alignment...
متن کاملImproving Synoptic Quering for Source Retrieval: Notebook for PAN at CLEF 2015
Source retrieval is a part of a plagiarism discovery process, where only a selected set of candidate documents is retrieved from a large corpus of potential source documents and passed for detailed document comparison in order to highlight potential plagiarism. This paper describes a used methodology and the architecture of a source retrieval system, developed for PAN 2015 lab on uncovering pla...
متن کاملGuess Again and See if They Line up: Surrey's Runs at Plagiarism Detection Notebook for PAN at CLEF 2013
This paper briefly describes the approaches taken to the two subtasks of Source Retrieval and Text Alignment, in the Plagiarism Detection track at PAN 13. For the first of these, we reuse our PAN12 approach – which combines frequency and a contrastive corpus measure to select keywords for querying the ChatNoir search system; for the second, we reuse software that had previously featured in PAN1...
متن کامل