Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
Abstract
The proliferation of online text, such as on the World Wide Web and in databases, motivates the need for space-efficient index methods that support fast search. Consider a text T of n binary symbols to index. Given any query pattern P of m binary symbols, the goal is to search for P in T quickly, with T being fully scanned only once, namely, when the index is created. All indexing schemes published in the last thirty years support searching in Θ(m) worst-case time and require Θ(n) memory words (or Θ(n log n) bits), which is significantly larger than the text itself. In this paper we provide a breakthrough both in searching time and index space under the same model of computation as the one adopted in previous work. Based upon new compressed representations of suffix arrays and suffix trees, we construct an index structure that occupies only O(n) bits and compares favorably with inverted lists in space. We can search any binary pattern P, stored in O(m / log n) words, in only o(m) time. Specifically, searching takes O(1) time for m = o(log n), and O(m / log n + log^ε n) = o(m) time for m = Ω(log n) and any fixed 0 < ε < 1. That is, we achieve optimal O(m / log n) search time for sufficiently large m = Ω(log^(1+ε) n). We can list all the occ pattern occurrences in optimal O(occ) additional time when m = Ω(polylog(n)) or when occ = Ω(n^ε); otherwise, listing takes O(occ log^ε n) additional time.
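To make the space gap concrete, the following minimal sketch compares the size of a binary text with the size of a plain, uncompressed suffix array over it. It illustrates only the Θ(n log n)-bit baseline that the abstract mentions, not the compressed structure of the paper. The function name and the sample value of n are assumptions chosen for illustration.

```python
def plain_suffix_array_bits(n: int) -> int:
    """Bits used by an uncompressed suffix array over a text of n symbols.

    Assumes each of the n entries is a pointer stored in ceil(log2 n) bits.
    """
    pointer_bits = max(1, (n - 1).bit_length())  # ceil(log2 n) for n >= 2
    return n * pointer_bits


n = 2**30                      # hypothetical binary text of 2^30 symbols
text_bits = n                  # one bit per binary symbol
index_bits = plain_suffix_array_bits(n)

# The baseline index is a factor of about log2(n) larger than the text.
print(index_bits // text_bits)
```

Under these assumptions the pointer array alone is about log n times the size of the text, which is the overhead that an O(n)-bit index removes.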
Similar resources
Indexing Compressed Text
We present a technique to build an index based on suffix arrays for compressed texts. We also propose a compression scheme for textual databases based on words that generates a compression code that preserves the lexicographical ordering of the text words. As a consequence it permits the sorting of the compressed strings to generate the suffix array without decompressing. As the compressed text is ...
Suffix Binary Search Trees and Suffix Arrays
Suffix arrays and suffix binary search trees are two data structures that have been proposed as alternatives to the classical suffix tree to facilitate efficient on-line string searching. Here, we explore the relationship between these two structures. In particular, we present an alternative view of a suffix array, with its auxiliary information, as a perfectly balanced suffix binary search tree, and descr...
Space Efficient Suffix Trees
We give the first representation of a suffix tree that uses n lg n + O(n) bits of space and supports searching for a pattern string in the given text (from a fixed size alphabet) in O(m) time, where n is the size of the text and m is the length of the pattern. The structure is quite simple and answers a question raised by Muthukrishnan in [22]. Previous compact representations of suffix trees had either...
Trade-off between Compression and Search Times in Compact Suffix Array
Suffix array is a widely used full-text index that allows fast searches on the text. It is constructed by sorting all suffixes of the text in the lexicographic order and storing pointers to the suffixes in this order. Binary search is used for fast searches on the suffix array. Compact suffix array is a compressed form of the suffix array that still allows binary searches, but the search times are also dep...
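The construction and search described in this snippet can be sketched as follows. This is a minimal, uncompressed illustration with a naive sort, not the compact suffix array of the paper. The function names `build_suffix_array` and `find_occurrences` and the sample text are assumptions introduced here.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes in lexicographic order.

    Naive construction: sorts positions by comparing whole suffixes,
    which is simple but slow for long texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of pattern in text via two binary searches."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


text = "0110101"                      # hypothetical binary text
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "01"))
```

Each binary search step compares up to m symbols, so a query in this sketch costs O(m log n) symbol comparisons on top of the time to report the matches.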
An experimental study of SB-trees
In a previous work of ours [13], we proposed a text indexing data structure for external memory, which we called SB-tree, that combines the best B-tree and suffix array qualities to overcome the limitations of inverted files, suffix arrays, suffix trees, and prefix B-trees. In this paper, we study the performance of SB-trees in a practical setting by running a large number of searching and updating exper...