An efficient and scalable plagiarism checking system using Bloom filters
Authors
Abstract
With easy access to the huge volume of articles available on the Internet, plagiarism is becoming an increasingly serious problem. Most recent approaches to this problem focus on improving the accuracy of similarity detection. However, in some real applications plagiarized content must be detected without revealing any information. Moreover, in such web-based applications, running time, memory consumption, and communication and computational complexity should also be taken into account. In this paper, we propose a similar-document detection system based on the matrix Bloom filter, a new extension of the standard Bloom filter. Experiments on a real dataset show that the system can achieve 98% accuracy. We also compare our approach with a method recently proposed for the same purpose. The comparison shows that the Bloom filter-based approach performs much better than the other method in terms of the aforementioned factors. © 2014 Elsevier Ltd. All rights reserved.
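As a rough illustration of how Bloom filters can support similar-document detection, the sketch below builds one standard Bloom filter per document and compares the filters bit by bit. It is not the matrix Bloom filter proposed in the paper, whose construction the abstract does not describe. The three-word shingling, the filter size `m`, the hash count `k`, and the names `BloomFilter`, `shingles`, `document_filter`, and `bit_overlap` are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit array with k hash positions per item."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int serves as the m-bit array

    def _positions(self, item):
        # Derive k positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True for every added item; may also be True for items never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def shingles(text, size=3):
    """Split text into overlapping word n-grams (an assumed preprocessing step)."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def document_filter(text, m=1024, k=3):
    """Build a Bloom filter holding all shingles of one document."""
    bf = BloomFilter(m, k)
    for shingle in shingles(text):
        bf.add(shingle)
    return bf


def bit_overlap(a, b):
    """Ratio of bits set in both filters to bits set in either filter."""
    union = bin(a.bits | b.bits).count("1")
    if union == 0:
        return 0.0
    return bin(a.bits & b.bits).count("1") / union


if __name__ == "__main__":
    original = document_filter("the quick brown fox jumps over the lazy dog")
    suspect = document_filter("the quick brown fox leaps over the lazy dog")
    print(bit_overlap(original, suspect))
```

Because only the filters are compared, the raw text does not have to be exchanged, although this sketch makes no privacy guarantee. The overlap score is only a heuristic: hash collisions can inflate it, and both filters must share the same `m` and `k` for the comparison to be meaningful.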
Similar resources
A Cuckoo Filter Modification Inspired by Bloom Filter
Probabilistic data structures are popular in membership queries, network applications, and so on. Bloom Filter and Cuckoo Filter are two popular space-efficient models that are incorporated in the set-membership checking part of many important protocols. They are compact representations of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...
Bloom Filters in Probabilistic Verification
Probabilistic techniques for verification of finite-state transition systems offer huge memory savings over deterministic techniques. The two leading probabilistic schemes are hash compaction and the bitstate method, which stores states in a Bloom filter. Bloom filters have been criticized for being slow, inaccurate, and memory-inefficient, but in this paper, we show how to obtain Bloom filters...
Bloom Filters & Their Applications
A Bloom Filter (BF) is a data structure suitable for performing set membership queries very efficiently. A Standard Bloom Filter representing a set of n elements is generated by an array of m bits and uses k independent hash functions. Bloom Filters have some attractive properties including low storage requirement, fast membership checking and no false negatives. False positives are possible bu...
Reducing False Positives of a Bloom Filter using Cross-Checking Bloom Filters
A Bloom filter is a compact data structure that supports membership queries on a set, allowing false positives. The simplicity and the excellent performance of a Bloom filter make it a standard data structure of great use in many network applications. In reducing the false positive rate of a Bloom filter, it is well known that the size of a Bloom filter and accordingly the number of hash indice...
Scalable Bloom Filters
Bloom Filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, being impossible to store extra elements without increasing the false positive probability. This leads typically to a conservative assumpti...
Journal title:
- Computers & Electrical Engineering
Volume 40, Issue
Pages -
Publication date 2014