Bloom filters

Author

  • Arpita Korwar

Abstract

Bloom filters are used to answer set-membership queries. In this data structure, the whole element is not stored at the hashed address; only a few bits are set in an array. Given a set $S$ of cardinality $n$, we store it in an array of $m$ bits using $k$ hash functions $h_1(\cdot), \dots, h_k(\cdot)$. Initially, all cells of the array are set to 0. Then, for each element $x \in S$ and each $1 \le i \le k$, we set the bit at position $h_i(x)$ to 1. A bit that is already 1 is left unchanged. To look up an element $z$, the $k$ locations $h_1(z), h_2(z), \dots, h_k(z)$ are checked. If at least one of them holds a 0, then $z$ does not belong to the set and false is returned. If all of them hold a 1, true is returned, but $z$ may still not belong to $S$. This situation is known as a false positive.

What about the universality of the hash functions? The hash functions need to be $(n+1)k$-universal: the hash addresses of the $n$ elements in the set and of the one element being looked up should all be independent. Given a constant bits-per-element ratio $m/n = c$, we would like to minimize the probability of a false positive. It can be shown that this minimum occurs when the fraction of 0s in the array is $1/2$ and the number of hash functions is $k = \frac{m}{n}\ln 2$. The probability of a false positive is then $(0.6185)^{m/n}$. The analysis can be found in the book by Mitzenmacher and Upfal [MU05, Section 5.5.3]. This allows a tradeoff between space efficiency and the probability of a false positive.
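A short calculation shows where the optimal $k$ and the constant 0.6185 come from. This is a sketch of the standard argument, not a reproduction of [MU05], and it makes the usual simplifying assumption that every hash position is independent and uniform over the $m$ cells. After all $n$ elements are inserted ($kn$ cell writes), a given cell is still 0 with probability $p_0$. A non-member is a false positive only if all $k$ of its cells are 1:

$$
p_0 = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m},
\qquad
e^{-kn/m} = \frac{1}{2} \iff k = \frac{m}{n}\ln 2,
$$

$$
\Pr[\text{false positive}] \approx (1 - p_0)^{k}
= \left(\frac{1}{2}\right)^{\frac{m}{n}\ln 2}
= \left(e^{-(\ln 2)^2}\right)^{m/n}
\approx (0.6185)^{m/n}.
$$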

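The insert and lookup procedures described above, together with the choice $k = \frac{m}{n}\ln 2$, can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. The names `BloomFilter`, `optimal_k`, and `predicted_fp_rate` are hypothetical. The $k$ hash functions are simulated by salting SHA-256 with the index $i$, which is a convenient stand-in and **not** the $(n+1)k$-universal family that the analysis assumes.

```python
import hashlib
import math


class BloomFilter:
    """An m-cell Bloom filter probed by k hash functions h_1, ..., h_k."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        # One byte per cell for readability; every cell starts at 0.
        self.bits = bytearray(m)

    def _positions(self, item: str) -> list[int]:
        # ASSUMPTION: h_i is simulated by salting SHA-256 with the index i.
        # This is a practical stand-in, not a provably (n+1)k-universal family.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        # Set cell h_i(x) to 1 for every i; cells already at 1 are unchanged.
        for p in self._positions(item):
            self.bits[p] = 1

    def contains(self, item: str) -> bool:
        # Any 0 proves absence; all 1s means "member or false positive".
        return all(self.bits[p] == 1 for p in self._positions(item))

    def fraction_of_zeros(self) -> float:
        return self.bits.count(0) / self.m


def optimal_k(m: int, n: int) -> int:
    # k = (m/n) ln 2, rounded to the nearest positive integer.
    return max(1, round(m / n * math.log(2)))


def predicted_fp_rate(m: int, n: int) -> float:
    # False-positive probability at the optimum: (0.6185)^(m/n).
    return 0.6185 ** (m / n)


# Usage: c = m/n = 10 bits per element.
m, n = 10_000, 1_000
k = optimal_k(m, n)  # 10 * ln 2 ≈ 6.93, rounded to 7
bf = BloomFilter(m, k)
members = [f"member-{j}" for j in range(n)]
for x in members:
    bf.add(x)

non_members = [f"other-{j}" for j in range(10_000)]
fp_rate = sum(bf.contains(q) for q in non_members) / len(non_members)
print(f"k={k}, zeros={bf.fraction_of_zeros():.3f}, "
      f"measured FP={fp_rate:.4f}, predicted FP={predicted_fp_rate(m, n):.4f}")
```

Every inserted element sets all of its own $k$ cells, so `contains` never returns false for a member. Only non-members can be misreported. With $m/n = 10$, the measured fraction of zeros should be close to $1/2$, and the measured false-positive rate should be close to $(0.6185)^{10} \approx 0.008$. The exact figures depend on the hash stand-in and on rounding $k$ to an integer. A production filter would pack cells into individual bits rather than bytes.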
Similar resources

A Cuckoo Filter Modification Inspired by Bloom Filter

Probabilistic data structures are so popular in membership queries, network applications, and so on. Bloom Filter and Cuckoo Filter are two popular space efficient models that incorporate in set membership checking part of many important protocols. They are compact representation of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...

An Approximate Duplicate-Elimination in RFID Data Streams Based on d-Left Time Bloom Filter

Article history: Received 6 March 2010 Received in revised form 16 July 2011 Accepted 18 July 2011 Available online 31 July 2011 The RFID technology has been applied to a wide range of areas since it does not require contact in detecting RFID tags. However, due to the multiple readings in many cases in detecting an RFID tag and the deployment of multiple readers, RFID data contains many duplica...

Bloofi: Multidimensional Bloom Filters

Bloom filters are probabilistic data structures commonly used for approximate membership problems in many areas of Computer Science (networking, distributed systems, databases, etc.). With the increase in data size and distribution of data, problems arise where a large number of Bloom filters are available, and all them need to be searched for potential matches. As an example, in a federated cl...

Reducing False Positives of a Bloom Filter using Cross-Checking Bloom Filters

A Bloom filter is a compact data structure that supports membership queries on a set, allowing false positives. The simplicity and the excellent performance of a Bloom filter make it a standard data structure of great use in many network applications. In reducing the false positive rate of a Bloom filter, it is well known that the size of a Bloom filter and accordingly the number of hash indice...

Optimizing Learned Bloom Filters by Sandwiching

We provide a simple method for improving the performance of the recently introduced learned Bloom filters, by showing that they perform better when the learned function is sandwiched between two Bloom filters.

A Model for Learned Bloom Filters and Related Structures

Recent work has suggested enhancing Bloom filters by using a pre-filter, based on applying machine learning to model the data set the Bloom filter is meant to represent. Here we model such learned Bloom filters, clarifying what guarantees can and cannot be associated with such a structure.

Publication date: 2010