English-Persian Plagiarism Detection based on a Semantic Approach
Abstract:
Plagiarism, defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them,” poses a major challenge to the dissemination and publication of knowledge. Plagiarism falls into four categories: direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translation plagiarism, sometimes referred to as cross-lingual plagiarism, in which writers meld a translation with their own words and ideas. Building on monolingual plagiarism detection methods, the paper aims to find a way to detect cross-lingual plagiarism. It presents a framework called Multi-Lingual Plagiarism Detection (MLPD) for cross-lingual plagiarism analysis, with the ultimate objective of detecting plagiarism cases. English is the reference language, and Persian materials are back-translated using translation tools. The data for evaluating MLPD were obtained from the English-Persian Mizan parallel corpus. Apache Solr was used to crawl and index the documents. The mean accuracy of the proposed method was 98.82% when highly accurate translation tools were employed, which indicates the high accuracy of the method. With the Google translation service, the mean accuracy was 56.9%. These tests demonstrate that improved translation tools enhance the accuracy of the proposed method.
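The back-translation idea in the abstract can be illustrated with a minimal sketch. It is not the MLPD implementation: the bag-of-words cosine score stands in for the paper's semantic comparison, an in-memory dictionary stands in for the Solr index, and `toy_translate`, `TOY_DICTIONARY`, `detect`, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(
    suspicious_fa: str,
    english_sources: Dict[str, str],
    translate: Callable[[str], str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Back-translate a Persian passage into English and rank the
    English source documents whose similarity reaches the threshold."""
    back_translated = Counter(tokenize(translate(suspicious_fa)))
    hits = []
    for doc_id, text in english_sources.items():
        score = cosine(back_translated, Counter(tokenize(text)))
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical stand-in for a translation tool: an exact-match lookup table.
TOY_DICTIONARY = {
    "سرقت ادبی یک چالش بزرگ است": "plagiarism is a major challenge",
}


def toy_translate(text: str) -> str:
    return TOY_DICTIONARY.get(text, "")


# Hypothetical English reference collection (a real system would query an index).
sources = {
    "doc1": "Plagiarism is a major challenge for publication.",
    "doc2": "The weather was pleasant in the mountains.",
}

results = detect("سرقت ادبی یک چالش بزرگ است", sources, toy_translate)
print(results)
```

Because every score is computed on the back-translated text, a poor translation lowers the overlap with the true source, which is consistent with the accuracy gap the abstract reports between translation tools.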
similar resources
A Plagiarism Detection Approach Based on SVM for Persian Texts
Plagiarism is defined as an unauthorized act of using or adapting others’ works and ideas without referring to them. Numerous methods have been proposed to detect plagiarism in different languages; however, not a lot has been accomplished in Persian. The present study has utilized statistical and semantic features to determine the functionality of Support Vector Machines (SVMs) in detecting act...
A Deep Learning Approach to Persian Plagiarism Detection
Plagiarism detection is defined as the automatic identification of reused text materials. The general availability of the internet and easy access to textual information increase the need for automated plagiarism detection. In this regard, different algorithms have been proposed to perform the task of plagiarism detection in text documents. Due to drawbacks and inefficiency of traditional methods and l...
External Plagiarism Detection based on Human Behaviors in Producing Paraphrases of Sentences in English and Persian Languages
With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. One plagiarism detection technique uses search engines, converting plagiarism patterns into search queries. Generating suitable queries is the heart of this technique, and existing methods suffer from a lack of accurate queries, precision, and speed of retrieved result...
A Novel Approach for Plagiarism Detection in English Text
Digitization has made text in many academic areas easily available on the web, which has become a serious problem for academic enterprises or institutes. This paper presents a plagiarism detection system for the English language...
Plagiarism Detection Based on a Novel Trie-based Approach
Plagiarism detection has become one of the major problems in the text mining field. New technologies have made plagiarism easier and more feasible. Therefore, it is vital to develop automatic systems to detect plagiarism in different kinds of content. In this paper, we propose using a trie to compare source and suspicious text documents. We use PersianPlagDet text documents as a case study....
Persian Plagiarism Detection Using Sentence Correlations
This report describes the Persian plagiarism detection system with which we submitted our run to the Persian PlagDet competition at FIRE 2016. The system was constructed in four main stages. The first is pre-processing and tokenization. The second is constructing a corpus of sentences from the combination of a source and suspicious document pair. Each sentence is considered to be a document and represented as a...
volume 5 issue 2
pages 275-284
publication date 2017-07-01