Encoplot - Performance in the Second International Plagiarism Detection Challenge - Lab Report for PAN at CLEF 2010
Abstract
Our submission this year was generated by the same method, Encoplot, that we developed for last year's competition. There is a single improvement: in addition, we compare each suspicious document with every other suspicious document and flag the passages most likely to correspond as intrinsic plagiarism.
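The all-pairs comparison of suspicious documents described above can be illustrated with a minimal sketch. It is not the Encoplot implementation: the abstract does not say how passages are matched, so a plain count of shared character n-grams stands in as an assumed similarity score. The function names, the `docs` dictionary layout, and the default n-gram length of 16 are all hypothetical choices made for illustration.

```python
from itertools import combinations


def char_ngrams(text, n=16):
    """Return the set of character n-grams in text (assumed stand-in feature)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def shared_ngram_count(a, b, n=16):
    """Count distinct character n-grams that occur in both texts."""
    return len(char_ngrams(a, n) & char_ngrams(b, n))


def rank_suspicious_pairs(docs, n=16):
    """Compare every suspicious document with every other one.

    docs maps a document name to its text. Returns (count, name_a, name_b)
    tuples for pairs sharing at least one n-gram, highest count first.
    """
    scored = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(docs.items()), 2):
        count = shared_ngram_count(text_a, text_b, n)
        if count > 0:
            scored.append((count, name_a, name_b))
    # Sort by descending overlap, then by names for a stable order.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored


if __name__ == "__main__":
    sample = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "xx the quick brown fox jumps yy",
        "c": "completely unrelated",
    }
    for count, first, second in rank_suspicious_pairs(sample, n=8):
        print(first, second, count)
```

The top-ranked pairs would then be the candidates for a detailed passage-level comparison. The number of pairs grows quadratically with the number of suspicious documents, so each individual comparison needs to be cheap.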
Similar resources
Encoplot - Tuned for High Recall (also Proposing a New Plagiarism Detection Score)
This article describes the latest changes to our plagiarism detection system Encoplot. We submitted the modified system to the PAN@CLEF 2012 automatic plagiarism detection challenge, where it ranked 2nd by F-measure and 3rd by the "plagdet" scoring method, which we had previously shown to be flawed to some extent. The main changes were made to the heuristic that tries to recognize the...
Improving the Reliability of the Plagiarism Detection System - Lab Report for PAN at CLEF 2010
In this paper we describe our approach at the PAN 2010 plagiarism detection competition. We refer to the system we have used in PAN’09. We then present the improvements we have tried since the PAN’09 competition, and their impact on the results on the development corpus. We describe our experiments with intrinsic plagiarism detection and evaluate them. We then discuss the computational cost of ...
A Cluster-Based Plagiarism Detection Method - Lab Report for PAN at CLEF 2010
In this paper we describe a cluster-based plagiarism detection method, which we have used in the learning management system of SCUT to detect plagiarism in network engineering courses. We also used this method to detect external plagiarism in the PAN-10 competition. The method is divided into three steps: the first step, called pre-selecting, is to narrow the scope of detection ...
FastDocode: Finding Approximated Segments of N-Grams for Document Copy Detection - Lab Report for PAN at CLEF 2010
Plagiarism has become one of the main problems that the information technology revolution has brought to our society; pattern matching algorithms and intelligent data analysis approaches could be used to identify these practices. Furthermore, a fast document copy detection algorithm could be used in large scale applications for plagiarism detection in academia, s...
The Encoplot Similarity Measure for Automatic Detection of Plagiarism - Notebook for PAN at CLEF 2011
This paper describes the evolution of our method Encoplot for automatic plagiarism detection and the results of our participation in the PAN'11 competition. The main novelties are a new similarity measure and a new ranking method, which together rank the source–suspicious document pairs much better when selecting the candidates for the detailed analysis phase. We hav...