Fast pattern-matching on indeterminate strings
نویسندگان
چکیده
In a string x on an alphabet Σ, a position i is said to be indeterminate iff x[i] may be any one of a specified subset {λ1, λ2, . . . , λj} of Σ, 2 ≤ j ≤ |Σ|. A string x containing indeterminate positions is therefore also said to be indeterminate. Indeterminate strings can arise in DNA and amino acid sequences as well as in cryptological applications and the analysis of musical texts. In this paper we describe fast algorithms for finding all occurrences of a pattern p = p[1..m] in a given text x = x[1..n], where either or both of p and x can be indeterminate. Our algorithms are based on the Sunday variant of the Boyer-Moore pattern-matching algorithm, one of the fastest exact pattern-matching algorithms known. The methodology we describe applies more generally to all variants of Boyer-Moore (such as Horspool’s, for example) that depend only on calculation of the δ (“rightmost shift”) array: our method therefore assumes that Σ is indexed (essentially, an integer alphabet), a requirement normally satisfied in practice.
منابع مشابه
An Adaptive Hybrid Pattern-Matching Algorithm on Indeterminate Strings
We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS [11] and its indeterminate successor [15]. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our...
متن کاملTruly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties
Strings with don’t care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still...
متن کاملEfficient pattern matching in degenerate strings with the Burrows-Wheeler transform
A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n, we present a new method based on the Burrows–Wheeler transform for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid patternmatching technique that works on both regular and dege...
متن کاملFlexible and Efficient Algorithms for Abelian Matching in Strings
The abelian pattern matching problem consists in finding all substrings of a text which are permutations of a given pattern. This problem finds application in many areas and can be solved in linear time by a näıve sliding window approach. In this short communication we present a new class of algorithms based on a new efficient fingerprint computation approach, called Heap-Counting, which turns ...
متن کاملConservative String Covering of Indeterminate Strings
We study the problem of finding local and global covers as well as seeds in conservative indeterminate strings. An indeterminate string is a sequence T = T [1]T [2] . . . T [n], where T [i] ⊆ Σ for each i, and Σ is a given alphabet of fixed size. A conservative indeterminate string, is an indeterminate string where the number of indeterminate symbols in the positions of the string, i.e the non-...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- J. Discrete Algorithms
دوره 6 شماره
صفحات -
تاریخ انتشار 2008