Design a Persian Automated Plagiarism Detector (AMZPPD)

Authors

  • Maryam Mahmoodi
  • Mohammad Mahmoodi Varnamkhasti
Abstract

Many plagiarism detection approaches exist, but few of them have been implemented and adapted for the Persian language. In this paper, we describe the design and implementation of a plagiarism detection system based on preprocessing and NLP techniques, and we present the results of testing it on a corpus.

Keywords: external plagiarism, plagiarism, copy detection, natural language processing, artificial intelligence, Persian language.

Similar resources

A Deep Learning Approach to Persian Plagiarism Detection

Plagiarism detection is defined as automatic identification of reused text materials. General availability of the internet and easy access to textual information enhances the need for automated plagiarism detection. In this regard, different algorithms have been proposed to perform the task of plagiarism detection in text documents. Due to drawbacks and inefficiency of traditional methods and l...

Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

The task of text alignment corpus construction at PAN 2015 competition consists of preparing a plagiarism corpus so that it can provide various obfuscation types and versatile obfuscation degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...

A theoretical basis to the automated detection of copying between texts, and its practical implementation in the Ferret plagiarism and collusion detector

The theoretical background to the automated detection of plagiarism and collusion is investigated in this paper. We examine the underlying concepts, and see how features of language can be exploited to produce an effective system. Independently written texts have markedly different characteristics to those that include passages that have been fully or partially copied, and they can be effective...

A Text Alignment Corpus for Persian Plagiarism Detection

This paper describes how a Persian text alignment corpus was constructed to evaluate plagiarism detection systems. This corpus is in PAN format and contains 11,089 documents and more than 11,603 plagiarism cases. Efforts were made to simulate various types of plagiarism manually, semi-automatically, or automatically in this large-scale corpus.

Mahak Samim: A Corpus of Persian Academic Texts for Evaluating Plagiarism Detection Systems

In this paper we introduce Mahak Samim, a plagiarism detection corpus that consists of Persian academic texts in which plagiarism cases are embedded. This corpus, which can be used for evaluating plagiarism detection systems, consists of more than five thousand artificial plagiarism cases with various lengths and diverse degrees of obfuscation. The development process and the features of the co...

Journal:
  • CoRR

Volume: abs/1403.1618

Publication date: 2014