External and Intrinsic Plagiarism Detection Using Vector Space Models

Authors

  • Mario Zechner
  • Markus Muhr
  • Roman Kern
  • Michael Granitzer
Abstract

Plagiarism detection can be divided into external and intrinsic methods. Naive external plagiarism analysis suffers from computationally demanding full nearest-neighbor searches within a reference corpus. We present a conceptually simple space-partitioning approach that achieves search times sublinear in the number of reference documents, trading precision for speed. We focus on full-duplicate searches while achieving acceptable results in the near-duplicate case. Intrinsic plagiarism analysis tries to find plagiarized passages within a document without any external knowledge. We use several topic-independent stylometric features, from which a vector space model is constructed for each sentence of a suspicious document. Plagiarized passages are detected by an outlier analysis relative to the document mean vector. Our system was created for the first PAN competition on plagiarism detection in 2009. We report evaluation results on the challenge’s development and competition corpora.
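
The abstract does not say how the space is partitioned. The following is a minimal sketch, assuming L2-normalized term-frequency vectors compared by cosine similarity and a simple pivot-based partitioning (each reference vector is assigned to the most similar of a few randomly chosen pivots), which stands in for whatever scheme the authors actually used. A query is compared with the pivots and then only with the members of the closest partition(s); with roughly √N pivots and balanced partitions, a search touches on the order of √N vectors instead of all N. The helpers `tf_vector`, `cosine`, `build_partitions`, and `approximate_nn` and the parameter `n_probe` are hypothetical.

```python
import math
import random
from collections import Counter


def tf_vector(text):
    """Term-frequency vector (bag of lowercase words), L2-normalized."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(u, v):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter sparse vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def build_partitions(ref_vectors, n_partitions, seed=0):
    """Assign each reference vector to its most similar randomly chosen pivot.

    Hypothetical scheme: random pivots stand in for the original system's
    partitioning, which the abstract does not describe.
    """
    rng = random.Random(seed)
    pivot_ids = rng.sample(range(len(ref_vectors)), n_partitions)
    pivots = [ref_vectors[i] for i in pivot_ids]
    buckets = [[] for _ in pivots]
    for idx, vec in enumerate(ref_vectors):
        best = max(range(len(pivots)), key=lambda p: cosine(vec, pivots[p]))
        buckets[best].append(idx)
    return pivots, buckets


def approximate_nn(query, ref_vectors, pivots, buckets, n_probe=1):
    """Search only the n_probe partitions whose pivots are most similar to the query."""
    order = sorted(range(len(pivots)),
                   key=lambda p: cosine(query, pivots[p]), reverse=True)
    best_idx, best_sim = None, -1.0
    for p in order[:n_probe]:
        for idx in buckets[p]:
            sim = cosine(query, ref_vectors[idx])
            if sim > best_sim:
                best_idx, best_sim = idx, sim
    return best_idx, best_sim
```

An exact duplicate of a reference text ranks the pivots exactly as its original did, so it is always searched against the partition that holds the original. A near duplicate may rank a different pivot first and be missed, which is the precision-for-speed trade-off the abstract describes. Raising `n_probe` buys back precision at the cost of more comparisons.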
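
A minimal sketch of the intrinsic step follows. It assumes five simple, topic-independent features (mean word length, type-token ratio, punctuation rate, function-word rate, sentence length) and a z-score threshold. The paper's actual feature set and decision rule are not given in this abstract, so `FUNCTION_WORDS`, `sentence_features`, `outlier_sentences`, and the parameter `k` are hypothetical. Each sentence becomes a feature vector, and features are standardized across the document. Sentences whose distance from the document mean vector exceeds the mean distance by more than `k` standard deviations are flagged.

```python
import math
import re
import statistics

# Hypothetical function-word list; the original feature set is not specified here.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "is", "it", "that"}


def sentence_features(sentence):
    """Hypothetical topic-independent stylometric feature vector for one sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    n = len(words) or 1
    punct = sum(not c.isalnum() and not c.isspace() for c in sentence)
    return [
        sum(len(w) for w in words) / n,                        # mean word length
        len({w.lower() for w in words}) / n,                   # type-token ratio
        punct / max(len(sentence), 1),                         # punctuation rate
        sum(w.lower() in FUNCTION_WORDS for w in words) / n,   # function-word rate
        float(len(words)),                                     # sentence length in words
    ]


def outlier_sentences(sentences, k=2.0):
    """Return indices of sentences far from the document mean feature vector."""
    vectors = [sentence_features(s) for s in sentences]
    dims = len(vectors[0])
    means = [statistics.fmean(v[d] for v in vectors) for d in range(dims)]
    # Constant features get a unit scale so they contribute zero distance.
    stds = [statistics.pstdev([v[d] for v in vectors]) or 1.0 for d in range(dims)]
    # Euclidean distance of each standardized sentence vector from the mean vector.
    dists = [math.sqrt(sum(((v[d] - means[d]) / stds[d]) ** 2 for d in range(dims)))
             for v in vectors]
    mu, sigma = statistics.fmean(dists), statistics.pstdev(dists)
    return [i for i, dist in enumerate(dists) if dist > mu + k * sigma]
```

A population z-score is at most √(n − 1), so with `k = 2` very short documents (around five sentences or fewer) yield essentially no flags.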

Similar resources

External & Intrinsic Plagiarism Detection: VSM & Discourse Markers based Approach - Notebook for PAN at CLEF 2011

This paper explains the performance of a plagiarism detection system that can detect external as well as intrinsic plagiarism in text. It reports results on the PAN-PC-2011 test corpus. We investigated Vector Space Model based techniques for detecting external plagiarism cases and discourse-marker-based features for detecting intrinsic plagiarism cases.

Approaches for Intrinsic and External Plagiarism Detection - Notebook for PAN at CLEF 2011

Plagiarism detection has been considered a classification problem that can be approached with intrinsic strategies, which use only information from the given document itself, and external strategies, which compare a suspicious document against different sources. In this work, both intrinsic and external approaches to plagiarism detection are presented. First, the m...

Paragraph Clustering for Intrinsic Plagiarism Detection using a Stylistic Vector Space Model with Extrinsic Features

Our approach to the task of intrinsic plagiarism detection uses a vector space model that eschews surface features in favor of richer extrinsic features, including those based on latent semantic analysis in a larger external corpus. We posit that the popularity and success of surface n-gram features is mostly due to the topic-biased nature of current artificial evaluations, a problem which unfo...

Detection of Paraphrastic Cases of Mono-lingual and Cross-lingual Plagiarism

External plagiarism detection is a unique retrieval process in which the algorithm has to provide evidence of plagiarism, if any, for a suspicious section from the pool of available source documents. This paper focuses on the paraphrasing involved in plagiarism, from both the monolingual and the cross-lingual aspect. In order to investigate the challenges in detection, we further analyse the perf...

External Plagiarism Detection

Here we describe our algorithm for detecting external plagiarism in the PAN-10 competition. The algorithm has two steps: (1) identifying the source documents similar to a suspicious document, together with the plagiarized section, using a Vector Space Model (VSM) and the cosine similarity measure; and (2) identifying the plagiarized area in the suspicious document using the chunk ratio.

Publication date: 2009