Small-space encoding LCE data structure with constant-time queries
Abstract
The longest common extension (LCE) problem is to preprocess a given string w of length n so that the length of the longest common prefix of the suffixes of w starting at any two given positions can be answered quickly. In this paper, we present a data structure of O(zτ + n/τ) words of space which answers LCE queries in O(1) time and can be built in O(n log σ) time, where 1 ≤ τ ≤ √n is a parameter, z is the size of the Lempel-Ziv 77 factorization of w, and σ is the alphabet size. This is an encoding data structure, i.e., it does not access the input string w when answering queries, and thus w can be deleted after preprocessing. On top of this main result, we obtain further results using (variants of) our LCE data structure, which include the following:
• For highly repetitive strings where the zτ term is dominated by n/τ, we obtain a constant-time and sub-linear space LCE query data structure.
• Even when the input string is not well compressible via Lempel-Ziv 77 factorization, we can still obtain a constant-time and sub-linear space LCE data structure for suitable τ and for σ ≤ 2^{o(log n)}.
• The time-space trade-off lower bounds for the LCE problem by Bille et al. [J. Discrete Algorithms, 25:42-50, 2014] and by Kosolobov [CoRR, abs/1611.02891, 2016] can be “surpassed” in some cases with our LCE data structure.
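To make the query itself concrete, here is a minimal sketch of the naive baseline that directly follows the definition above. It is not the data structure from the paper: it keeps the string w, does no preprocessing, and takes time proportional to the answer rather than O(1). The function name `lce_naive` and the use of 0-based positions are assumptions made for illustration only.

```python
def lce_naive(w: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of w[i:] and w[j:].

    Naive baseline: scans both suffixes character by character.
    Positions i and j are 0-based (an assumption of this sketch).
    """
    n = len(w)
    length = 0
    # Extend the match while both positions stay inside w and the characters agree.
    while i + length < n and j + length < n and w[i + length] == w[j + length]:
        length += 1
    return length


if __name__ == "__main__":
    w = "abracadabra"
    # Suffixes "abracadabra" (position 0) and "abra" (position 7) share "abra".
    print(lce_naive(w, 0, 7))
```

An encoding data structure as described in the abstract would answer the same query without reading w at all, which is why w can be discarded after preprocessing.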
Similar resources
Longest Common Extensions in Trees
The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries. In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and ...
Deterministic Sub-Linear Space LCE Data Structures With Efficient Construction
Given a string S of n symbols, a longest common extension query LCE(i, j) asks for the length of the longest common prefix of the ith and jth suffixes of S. LCE queries have several important applications in string processing, perhaps most notably to suffix sorting. Recently, Bille et al. (J. Discrete Algorithms 25:42–50, 2014, Proc. CPM 2015:65–76) described several data structures for answeri...
Time-Space Trade-Offs for Longest Common Extensions
We revisit the longest common extension (LCE) problem, that is, preprocess a string T into a compact data structure that supports fast LCE queries. An LCE query takes a pair (i, j) of indices in T and returns the length of the longest common prefix of the suffixes of T starting at positions i and j. We study the time-space trade-offs for the problem, that is, the space used for the data structu...
Fast Longest Common Extensions in Small Space
In this paper we address the longest common extension (LCE) problem: to compute the length l of the longest common prefix between any two suffixes of T ∈ Σ^n with Σ = {0, ..., σ − 1}. We present two fast and space-efficient solutions based on (Karp-Rabin) fingerprinting and sampling. Our first data structure exploits properties of Mersenne prime numbers when used as moduli of the Karp-Rabin hash ...
Fully Dynamic Data Structure for LCE Queries in Compressed Space
A Longest Common Extension (LCE) query on a text T of length N asks for the length of the longest common prefix of the suffixes starting at two given positions. We show that the signature encoding G of size w = O(min(z log N log* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of T, which can be seen as a compressed representation of T, has the capability to support LCE queries in O(log N ...
Journal: CoRR
Volume: abs/1702.07458
Publication date: 2017