Indexing Variable Length Substrings for Exact and Approximate Matching
Abstract
We introduce two new index structures based on the q-gram index. The new structures index substrings of variable length instead of q-grams of fixed length. For both of the new indexes, we present a method based on the suffix tree to efficiently choose the indexed substrings so that each of them occurs almost equally frequently in the text. Our experiments show that the resulting indexes are up to 40% faster than the q-gram index when they use the same space.
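For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of a plain fixed-length q-gram index with exact matching by lookup and verification, not the variable-length structures the paper proposes. The function names `build_qgram_index` and `find_exact` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every length-q substring of text to the sorted list of its start positions."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def find_exact(text, index, q, pattern):
    """Return all start positions of pattern in text, using the q-gram index as a filter.

    Assumption for this sketch: the pattern is at least q characters long.
    """
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    # Candidate positions come from the pattern's first q-gram;
    # each candidate is then verified against the full pattern.
    candidates = index.get(pattern[:q], [])
    return [i for i in candidates if text.startswith(pattern, i)]


text = "abracadabra"
q = 3
index = build_qgram_index(text, q)
print(find_exact(text, index, q, "abra"))  # positions where "abra" occurs
```

In a fixed-length index like this one, some q-grams have long position lists while others have very short ones. The abstract's stated goal of choosing indexed substrings that occur almost equally frequently targets that imbalance.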
Similar resources
Abelian pattern matching in strings
Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...
Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm
A key approach in string processing algorithmics has been the labeling paradigm [KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given in [SV94], the labeling paradigm was considered not to be compe...
Approximate String Matching with Variable Length Don't Care
Searching for DNA or amino acid sequences similar to a given pattern string is very important in molecular biology. In fact, a lot of programs and algorithms have been developed. Most of them are based on alignment of strings or approximate string matching. However, they do not seem to be adequate in some cases. For example, the DNA pattern TATA (known as TATA box) is a common promoter that oft...
Filtration Algorithms for Approximate Order-Preserving Matching
The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string matching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching two strings match if they have the same relative order after removing up to k elements in the same p...
Efficient Algorithm for δ-Approximate Jumbled Pattern Matching
The Jumbled Pattern Matching problem consists of finding substrings which can be permuted to be equal to a given pattern. Similarly, the δ-Approximate Jumbled Pattern Matching problem asks for substrings equivalent to a permutation of the given pattern, but allowing a vector of possible errors δ. Here we provide a new efficient solution for the δ-Approximate Jumbled Pattern Matching problem usin...