RDI System for Extrinsic Plagiarism Detection (RDI_RED), Working Notes for PANAraPlagDet at FIRE 2015

Authors

  • Ahmed Magooda
  • Ashraf Y. Mahgoub
  • Mohsen Rashwan
  • Magda B. Fayek
  • Hazem M. Raafat
Abstract

Extrinsic plagiarism detection has attracted the attention of many researchers lately. Plagiarism has become increasingly difficult to detect because of sophisticated approaches that go beyond direct copy and paste, such as phrase rephrasing, word shuffling, and semantic substitution. In this paper, we present the RDI system for extrinsic plagiarism detection (RDI_RED). RDI_RED performs remarkably well across a wide spectrum of plagiarism techniques, from simple copy-paste to word shuffling and complete sentence rephrasing. RDI_RED took the first three positions in the Arabic language plagiarism detection competition with a PlagDet (Plagiarism Detection score) of 80%, which is 20% higher than the baseline and 18% higher than the second-best competing system.
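The abstract reports a PlagDet score without defining it. As a minimal sketch, assuming the usual PAN definition (the F1 of precision and recall, divided by log2(1 + granularity)), the score can be computed as follows. The function name and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Combine precision, recall and granularity into one score.

    Assumes the usual PAN definition: F1 / log2(1 + granularity).
    Granularity is 1 when every plagiarised passage is reported
    exactly once, and grows when a passage is reported in pieces.
    """
    if granularity < 1:
        raise ValueError("granularity is at least 1 by definition")
    if precision + recall == 0:
        return 0.0  # no correct detections at all
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


# Hypothetical inputs, not figures reported by the authors.
score = plagdet(precision=0.9, recall=0.8, granularity=1.0)
print(f"{score:.3f}")
```

With a granularity of 1 the denominator is exactly 1, so the score reduces to the plain F1; a detector that splits each passage into several fragments is penalised even if its precision and recall are unchanged.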


Similar resources

RDI System for Intrinsic Plagiarism Detection (RDI_RID), Working Notes for PANAraPlagDet at FIRE 2015

Many researchers have been investigating the task of plagiarism detection lately. In this paper we present the RDI system for intrinsic plagiarism detection (RDI_RID). RDI_RID was the only system that participated in the intrinsic track of the Arabic language plagiarism detection competition. RDI_RID achieved a PlagDet (Plagiarism Detection score) of 19% compared to 38% achieved by the ba...


Arabic Plagiarism Detection Using Word Correlation in N-Grams with K-Overlapping Approach, Working Notes for PAN-AraPlagDet at FIRE 2015

This report describes the Arabic plagiarism detection system we used to submit our run to the AraPlagDetect competition at FIRE 2015. The system was constructed in four main stages. The first is pre-processing, which includes tokenisation and stop-word removal. The second is retrieving a list of candidate documents for each suspicious document using K-gram fingerprinting and the Jaccard coefficient....


Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

The task of text alignment corpus construction at PAN 2015 competition consists of preparing a plagiarism corpus so that it can provide various obfuscation types and versatile obfuscation degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...


Overview of the AraPlagDet PAN@FIRE2015 Shared Task on Arabic Plagiarism Detection

AraPlagDet is the first shared task that addresses the evaluation of plagiarism detection methods for Arabic texts. It has two subtasks, namely external plagiarism detection and intrinsic plagiarism detection. A total of 8 runs have been submitted and tested on the standardized corpora developed for the track. This overview paper describes these evaluation corpora, discusses the participants’ m...


Normalization based Stop-Word approach to Source Code Plagiarism Detection

This paper is a report of PES Institute of Technology’s participation in the Cross Language Detection of Source Code Reuse (CL-SOCO) task at FIRE 2015 [1]. We approach this task as a text document plagiarism task, without considering the formal grammatical structure of the programming language. We use normalization of commonly used identifiers to detect pairs of programs which have the same objective. We als...




Publication date: 2015