Counting Co-occurrences in Citations to Identify Plagiarised Text Fragments
Authors
Abstract
Research in external plagiarism detection is mainly concerned with the comparison of the textual contents of a suspicious document against the contents of a collection of original documents. More recently, methods that try to detect plagiarism based on citation patterns have been proposed. These methods are particularly useful for detecting plagiarism in scientific publications. In this work, we assess the value of identifying co-occurrences in citations by checking whether this method can identify cases of plagiarism in a dataset of scientific papers. Our results show that most of the cases in which co-occurrences were found indeed correspond to plagiarised passages.
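The abstract does not say how co-occurrences are computed, so the following is only a minimal sketch of one plausible reading: two references "co-occur" when they are cited close together, and a suspicious document is compared with a source by counting the co-occurring pairs they share. The function names, the `window` parameter, and the representation of a document as an ordered list of reference identifiers are all assumptions made for illustration, not the authors' method.

```python
def cooccurring_pairs(citations, window=3):
    """Collect unordered pairs of distinct references that fall inside a
    sliding window of `window` consecutive citations (hypothetical definition)."""
    pairs = set()
    for i, ref in enumerate(citations):
        # Look at the next window - 1 citations after position i.
        for other in citations[i + 1 : i + window]:
            if other != ref:
                pairs.add(frozenset((ref, other)))
    return pairs


def shared_cooccurrences(suspicious, source, window=3):
    """Return the co-occurring pairs found in both citation sequences."""
    return cooccurring_pairs(suspicious, window) & cooccurring_pairs(source, window)


# Toy citation sequences; the identifiers are made up for the example.
suspicious_doc = ["A", "B", "C", "D"]
source_doc = ["X", "B", "A", "C", "Y"]
shared = shared_cooccurrences(suspicious_doc, source_doc)
print(len(shared))
```

Under this reading, a high count of shared pairs would flag a passage for manual inspection; it would not establish plagiarism on its own.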
Similar resources
On Automatic Plagiarism Detection Based on n-Grams Comparison
When automatic plagiarism detection is carried out considering a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is to locate plagiarised fragments that have been modified (by rewording, insertion or deletion, for example) from the source text....
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...
Hierarchical Topic Structuring: From Dense Segmentation to Topically Focused Fragments via Burst Analysis
Topic segmentation traditionally relies on lexical cohesion measured through word re-occurrences to output a dense segmentation, either linear or hierarchical. In this paper, a novel organization of the topical structure of textual content is proposed. Rather than searching for topic shifts to yield dense segmentation, we propose an algorithm to extract topically focused fragments organized in ...
Distilling Conceptual Connections from MeSH Co-Occurrences
Our aim is to contribute to biomedical text extraction and mining research. In this paper we present exploratory research on the MeSH terms assigned to MEDLINE citations. We analyze MeSH based co-occurrences and identify the interesting ones, i.e., those that are likely to be semantically meaningful. For each selected co-occurring pair we derive a weighted vector representation that emphasizes ...
The complexity of counting poset and permutation patterns
We introduce a notion of pattern occurrence that generalizes both classical permutation patterns as well as poset containment. Many questions about pattern statistics and avoidance generalize naturally to this setting, and we focus on functional complexity problems – particularly those that arise by constraining the order dimensions of the pattern and text posets. We show that counting the numb...