Approximate String Matching
Author
Abstract
We present a new indexing method for the approximate string matching problem. The method is based on a suffix tree combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the retrieval time is $O(n^{\lambda})$, for $0 < \lambda < 1$, whenever $\alpha < 1 - e/\sqrt{\sigma}$, where $\alpha$ is the error level tolerated and $\sigma$ is the alphabet size. We show experimentally that this index outperforms by far all other algorithms for indexed approximate searching; these are also the first experiments that compare the different existing schemes. Finally, we show how this index can be implemented using much less space.
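The abstract does not spell out how the pattern partitioning is used, so the following is only a minimal sketch under stated assumptions. It relies on the standard partitioning lemma: if a pattern is split into j pieces, any occurrence with at most k errors contains some piece with at most ⌊k/j⌋ errors. The function names (`sellers_end_positions`, `partition_search`) are hypothetical, the error level is assumed to correspond to k/m for a pattern of length m, and a plain dynamic-programming scan of the text stands in for the suffix tree index, so this illustrates the filtering idea rather than the indexed algorithm itself.

```python
def sellers_end_positions(pattern, text, k):
    """End positions (exclusive) in text where some substring matches
    pattern with at most k edits (insertions, deletions, substitutions).
    Classic column-wise dynamic programming; not an index."""
    m = len(pattern)
    col = list(range(m + 1))       # distances against the empty text prefix
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]         # value from the previous column, row i-1
        col[0] = 0                 # an occurrence may start at any text position
        for i in range(1, m + 1):
            cur = col[i]           # value from the previous column, row i
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost, cur + 1, col[i - 1] + 1)
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


def partition_search(pattern, text, k, j):
    """Assumed illustration of pattern partitioning: search each of the j
    pieces with k // j errors, then verify the full pattern only in a
    window around each piece match."""
    m = len(pattern)
    bounds = [i * m // j for i in range(j + 1)]
    piece_k = k // j
    ends = set()
    for a, b in zip(bounds, bounds[1:]):
        piece = pattern[a:b]
        for piece_end in sellers_end_positions(piece, text, piece_k):
            # Widest window that an occurrence containing this piece match can span.
            lo = max(0, piece_end - (b - a) - piece_k - a - k)
            hi = min(len(text), piece_end + (m - b) + k)
            for e in sellers_end_positions(pattern, text[lo:hi], k):
                ends.add(lo + e)
    return sorted(ends)


print(partition_search("survey", "surgery", k=2, j=2))
```

In an indexed setting the piece searches would be answered by the index instead of a scan, and only the verification step would touch the text; the choice of j trades the cost of searching pieces against the number of candidates to verify.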
Similar resources
Data structures and algorithms for approximate string matching
This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching.
Simulation of NFA in Approximate String and Sequence Matching
We present a detailed description of the simulation of nondeterministic finite automata (NFA) for approximate string matching. The simulation uses bit parallelism, and the algorithm used is called the Shift-Or algorithm. Using knowledge of the simulation of NFA by the Shift-Or algorithm, we design a modification of the Shift-Or algorithm for approximate string matching using generalized Levenshtein distance and a modification for...
Space Complexity of Linear Time Approximate String Matching
Approximate string matching is a sequential problem, and it is therefore possible to solve it using finite automata. Nondeterministic finite automata are constructed for string matching with k mismatches and k differences. The corresponding deterministic finite automata are the basis for approximate string matching in linear time. Then the space complexity of both types of deterministic automata is calculat...
Approximate Regular Expression Matching
We extend the definition of the Hamming and Levenshtein distances between two strings used in approximate string matching so that these two distances can also be used in approximate regular expression matching. Next, methods for constructing nondeterministic finite automata for approximate regular expression matching under both distances are presented.
Approximate String Matching with Variable Length Don't Care
Searching for DNA or amino acid sequences similar to a given pattern string is very important in molecular biology. In fact, a lot of programs and algorithms have been developed. Most of them are based on alignment of strings or approximate string matching. However, they do not seem to be adequate in some cases. For example, the DNA pattern TATA (known as TATA box) is a common promoter that oft...
Approximate String Matching with Don't Care Characters
This paper presents an $O(\sqrt{km}\, n\, \mathrm{polylog}(m))$ time algorithm for approximate string matching (the k-differences problem), in which don't care characters may appear in both the pattern string and the text string.