A Bloom filter based semi-index on q-grams
Authors
Abstract
We present a simple q-gram based semi-index that typically needs to search for a pattern in only a small fraction of the text blocks. Several space-time tradeoffs are presented. Experiments on Pizza & Chili datasets show that our solution is up to three orders of magnitude faster than the Claude et al. [4] semi-index at comparable space usage.
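The abstract does not spell out the data structure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the text is cut into fixed-size blocks, that each block gets its own Bloom filter holding the q-grams of that block (extended by a small overlap so that matches crossing a block boundary are not lost), and that a query scans only the blocks whose filter reports every q-gram of the pattern. The names `BloomFilter`, `QGramSemiIndex`, `block_size`, `max_pattern_len`, `bits_per_block` and `num_hashes` are hypothetical, chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; bit positions come from double hashing (an assumption)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step for double hashing
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


class QGramSemiIndex:
    """Hypothetical block-wise semi-index: one Bloom filter of q-grams per block."""

    def __init__(self, text, q=4, block_size=1024, max_pattern_len=64,
                 bits_per_block=8192, num_hashes=3):
        self.text = text
        self.q = q
        self.block_size = block_size
        self.max_pattern_len = max_pattern_len
        self.filters = []
        for start in range(0, len(text), block_size):
            # Extend each block by max_pattern_len - 1 bytes so that any match
            # starting inside the block has all of its q-grams in this filter.
            window = text[start:start + block_size + max_pattern_len - 1]
            bf = BloomFilter(bits_per_block, num_hashes)
            for i in range(len(window) - q + 1):
                bf.add(window[i:i + q])
            self.filters.append(bf)

    def candidate_blocks(self, pattern):
        """Blocks whose filter contains every q-gram of the pattern."""
        grams = {pattern[i:i + self.q] for i in range(len(pattern) - self.q + 1)}
        return [b for b, bf in enumerate(self.filters) if all(g in bf for g in grams)]

    def find_all(self, pattern):
        """Start positions of all occurrences; only candidate blocks are scanned."""
        if not self.q <= len(pattern) <= self.max_pattern_len:
            raise ValueError("pattern length must be in [q, max_pattern_len]")
        hits = []
        for b in self.candidate_blocks(pattern):
            start = b * self.block_size
            # Matches must start inside this block, so they end before `end`.
            end = start + self.block_size + len(pattern) - 1
            pos = self.text.find(pattern, start, end)
            while pos != -1:
                hits.append(pos)
                pos = self.text.find(pattern, pos + 1, end)
        return hits


if __name__ == "__main__":
    text = b"abracadabra" * 50
    index = QGramSemiIndex(text, q=3, block_size=32, max_pattern_len=16,
                           bits_per_block=1024, num_hashes=3)
    print(index.find_all(b"cadab"))
```

Because a Bloom filter has no false negatives, a block containing a match is never skipped; false positives only cost an unnecessary scan of that block. In this sketch the space-time tradeoff is steered by `q`, `block_size` and `bits_per_block`: smaller blocks and larger filters reduce the amount of text scanned per query at the price of a larger index.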
Similar resources
Automated Methods for Estimating Baseflow from Streamflow Records in a Semi Arid Watershed
Understanding of the runoff generation processes is important in understanding the magnitude and dynamics of groundwater discharge. However, these processes continue to be difficult to quantify and conceptualize. In this study, two digital filter based separation modules, the Recursive filtering method (RDF) and a generalization of the recursive digital filter (GRDF) were [...] 1991–2002 in the Hableh Ro...
Anagram: A Content Anomaly Detector Resistant to Mimicry Attack
In this paper, we present Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) designed to detect anomalous and “suspicious” network packet payloads. By using higher-order n-grams, Anagram can detect significant anomalous byte sequences and generate robust signatures of validated malicious packet content. The Anagram content models are implemented using highly...
Private record linkage with Bloom filters
In many record linkage applications, identifiers have to be encrypted to preserve privacy. Therefore, a method for approximate string comparison in private record linkage is needed. We describe a new method of approximate string comparison in private record linkage. The main idea is to store q-gram sets derived from identifier values in Bloom filters and compare them bitwise across databases. ...
Privacy-preserving record linkage using Bloom filters
BACKGROUND Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns. METHOD...
Journal title:
- Softw., Pract. Exper.
Volume 47, Issue
Pages -
Publication date 2017