Hashing and Indexing: Succinct Data Structures and Smoothed Analysis
Authors
Abstract
We consider the problem of indexing a text T (of length n) with a light data structure that supports efficient search of patterns P (of length m) allowing errors under the Hamming distance. We propose a hash-based strategy that employs two classes of hash functions, dubbed Hamming-aware and de Bruijn, to drastically reduce the search space and the memory footprint of the index, respectively. We use our succinct hash data structure to solve the k-mismatch search problem in 2n log σ + o(n log σ) bits of space with a randomized algorithm having smoothed complexity O((2σ)^k (log n)^k (log m + ξ) + (occ + 1) · m), where σ is the alphabet size, occ is the number of occurrences, and ξ is a term depending on m, n, and on the amplitude ε of the noise perturbing text and pattern. Significantly, we obtain that for any ε > 0, for m large enough, ξ ∈ O(log m): our results improve upon previous linear-space solutions of the k-mismatch problem.
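To make the problem statement concrete, the following is a minimal brute-force sketch of k-mismatch search under the Hamming distance. It is not the hash-based index described in the abstract: it scans every alignment in O(n · m) time and uses no succinct data structure. The function names `hamming_distance` and `k_mismatch_search` are illustrative assumptions, not taken from the paper.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_search(text: str, pattern: str, k: int) -> List[int]:
    """Brute-force baseline (hypothetical helper, not the paper's algorithm).

    Returns every start position i such that text[i:i+m] differs from
    pattern in at most k positions, where m = len(pattern).
    """
    n, m = len(text), len(pattern)
    # If the pattern is longer than the text, the range is empty.
    return [
        i
        for i in range(n - m + 1)
        if hamming_distance(text[i:i + m], pattern) <= k
    ]


if __name__ == "__main__":
    # "adra" never occurs exactly, but it is one substitution away from "abra".
    print(k_mismatch_search("abracadabra", "adra", 1))
```

Such a baseline is mainly useful as a reference for checking the output of a faster index on small inputs.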
Related resources
Indexing Algorithm Based on Improved Sparse Local Sensitive Hashing
In this article, we propose a new semantic hashing algorithm to address newly emerging problems such as the difficulty of similarity measurement caused by high-dimensional data. Based on local sensitive hashing and spectral hashing, we introduce sparse principal component analysis (SPCA) to reduce the dimension of the data set, which excludes the redundancy in the parameter list, and thus make h...
Comparison Of Modified Dual Ternary Indexing And Multi-Key Hashing Algorithms For Music Information Retrieval
In this work we have compared two indexing algorithms that have been used to index and retrieve Carnatic music songs. We have compared a modified algorithm of the Dual ternary indexing algorithm for music indexing and retrieval with the multi-key hashing indexing algorithm proposed by us. The modification in the dual ternary algorithm was essential to handle variable length query phrase and to ...
Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization
The recent proliferation of smart devices necessitates methods to learn small-sized models. This paper demonstrates that if there are m features in total but only n = o(√m) features are required to distinguish examples, with Ω(log m) training examples and reasonable settings, it is possible to obtain a good model in a succinct representation using n log₂(m/n) + o(m) bits, by using a pipeline of exi...
Image authentication using LBP-based perceptual image hashing
Feature extraction is a main step in all perceptual image hashing schemes, in which robust features lead to better results in perceptual robustness. Simplicity, discriminative power, computational efficiency and robustness to illumination changes are counted as distinguishing properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for percep...
An adaptive hashing technique for indexing moving objects
Although hashing techniques are widely used for indexing moving objects, they cannot handle the dynamic workload, e.g. the traffic at peak hour vs. that in the night. This paper proposes an adaptive hashing technique to support the dynamic workload efficiently. The proposed technique maintains two levels of the hashes, one for fast moving objects and the other for quasi-static objects. A moving...