Space-Efficient String Indexing for Wildcard Pattern Matching
Abstract
In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant-size alphabet, our data structure uses O(n log^ε n) bits for any ε > 0 and reports all occ occurrences of a wildcard string in O(m + σ^g · μ(n) + occ) time, where μ(n) = o(log log log n), σ is the alphabet size, m is the number of alphabet symbols, and g is the number of wildcard symbols in the query string. We also present an O(n)-bit index with O((m + σ^g + occ) log^ε n) query time and an O(n (log log n)^2)-bit index with O((m + σ^g + occ) log log n) query time. These are the first non-trivial data structures for this problem that need o(n log n) bits of space.
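To make the query semantics concrete, the following is a minimal sketch of what a wildcard query is expected to report. It is a naive scan over the text, not the compressed index described in the abstract. The function name `wildcard_occurrences` and the choice of `?` as the wildcard symbol are assumptions made only for illustration.

```python
def wildcard_occurrences(text, pattern, wildcard="?"):
    """Return every start position where `pattern` matches `text`.

    Illustrative baseline only: a wildcard symbol matches any single
    character, and every alignment is checked directly. This takes time
    proportional to len(text) * len(pattern) and builds no index.
    """
    tlen, plen = len(text), len(pattern)
    hits = []
    for start in range(tlen - plen + 1):
        # An alignment matches if each pattern symbol is a wildcard
        # or equals the text symbol it is aligned with.
        if all(p == wildcard or p == text[start + j]
               for j, p in enumerate(pattern)):
            hits.append(start)
    return hits
```

For example, `wildcard_occurrences("abracadabra", "a?a")` returns the start positions of `aca` and `ada`. An index for this problem answers the same query without scanning the whole text, at the cost of the space and time bounds stated above.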
Similar resources
Randomization in Parallel Stringology
In this abstract, we provide an overview of our survey of randomized techniques for exploiting the parallelism in string matching problems. Broadly, the study of string matching falls into two categories: standard stringology and nonstandard stringology. Standard Stringology concerns the study of various exact matching problems. The fundamental problem here is the basic string matching problem ...
Cross-Document Pattern Matching
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a substring of another document. Several variants of this problem are considered, and efficient linear-space solutions are proposed with query time ...
Succincter Text Indexing with Wildcards
We study the problem of indexing text with wildcard positions, motivated by the challenge of aligning sequencing data to large genomes that contain millions of single nucleotide polymorphisms (SNPs)—positions known to differ between individuals. SNPs modeled as wildcards can lead to more informed and biologically relevant alignments. We improve the space complexity of previous approaches by giv...
SWiM: Secure Wildcard Pattern Matching From OT Extension
Suppose a server holds a long text string and a receiver holds a short pattern string. Secure pattern matching allows the receiver to learn the locations in the long text where the pattern appears, while leaking nothing else to either party besides the length of their inputs. In this work we consider secure wildcard pattern matching (WPM), where the receiver’s pattern is allowed to contain wild...
Simple deterministic wildcard matching
We present a simple and fast deterministic solution to the string matching with don’t cares problem. The task is to determine all positions in a text where a pattern occurs, allowing both pattern and text to contain single character wildcards. Our algorithm takes O(n log m) time for a text of length n and a pattern of length m and in our view the algorithm is conceptually simpler than previous a...