Optimal Packed String Matching
Authors
Abstract
In the packed string matching problem, each machine word accommodates α characters; thus an n-character text occupies n/α memory words. We extend the Crochemore-Perrin constant-space O(n)-time string matching algorithm to run in optimal O(n/α) time and even in real time, achieving a factor-α speedup over traditional algorithms that examine each character individually. Unlike prior theoretical work on packed string matching, our solution can be implemented efficiently. We adopt the standard RAM model and use only its AC0 instructions (i.e., no multiplication) plus two specialized AC0 packed string instructions. The main string-matching instruction is available in commodity processors (e.g., Intel’s SSE4.2 and AVX Advanced String Operations); the other, a maximal-suffix instruction, is required only during pattern preprocessing. In the absence of these two specialized instructions, we propose theoretically efficient emulations using integer multiplication (not AC0) and table lookup.
1998 ACM Subject Classification: F.2 Analysis of Algorithms and Problem Complexity; F.2.2 Nonnumerical Algorithms and Problems: Pattern Matching.
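To make the idea of one instruction examining α characters at once concrete, here is a small Python sketch that models a packed comparison in scalar code. The function `equal_ordered_mask` is a hypothetical stand-in loosely modeled on the equal-ordered mode of the SSE4.2 string instructions; it loops over characters, so it shows the semantics, not the speed. The helper `find_short_pattern` is likewise illustrative and only handles patterns of length at most α. It is not the Crochemore-Perrin-based algorithm described in the abstract, which is what extends packed matching to arbitrary pattern lengths.

```python
def equal_ordered_mask(needle: str, hay: str) -> int:
    """Scalar model of a packed "equal ordered" comparison (hypothetical stand-in).

    Bit i of the result is set when needle matches hay starting at position i,
    where positions past the end of hay count as matching, so a match that runs
    off the end of the word is reported as a candidate.
    """
    mask = 0
    for i in range(len(hay)):
        if all(i + j >= len(hay) or hay[i + j] == c for j, c in enumerate(needle)):
            mask |= 1 << i
    return mask


def find_short_pattern(pattern: str, text: str, alpha: int = 16) -> list[int]:
    """Report every occurrence of pattern (1 <= len(pattern) <= alpha) in text,
    issuing one simulated packed comparison per window of alpha characters."""
    m = len(pattern)
    if not 1 <= m <= alpha:
        raise ValueError("pattern length must be between 1 and alpha")
    # A window of alpha characters fully contains every match that starts in its
    # first alpha - m + 1 positions, so consecutive windows advance by that step
    # and each start position is examined in exactly one window.
    step = alpha - m + 1
    hits = []
    for s in range(0, len(text), step):
        window = text[s:s + alpha]
        mask = equal_ordered_mask(pattern, window)
        for i in range(min(step, len(window))):
            # Keep only complete matches; partial matches at the end of the text are dropped.
            if (mask >> i) & 1 and s + i + m <= len(text):
                hits.append(s + i)
    return hits
```

For example, `find_short_pattern("ab", "abcabab", alpha=4)` returns `[0, 3, 5]`: windows advance by α - m + 1 = 3 characters, so every match lies entirely inside exactly one window.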
Similar resources
Fast Searching in Packed Strings
Given strings P and Q, the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in ...
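For reference, the Knuth-Morris-Pratt baseline mentioned above can be sketched in Python as follows. It reads Q one character at a time and uses the failure (border) table of P to avoid re-reading text characters. This is a textbook version written for illustration; the function names are hypothetical, and P and Q are named after the strings in the summary.

```python
def prefix_function(p: str) -> list[int]:
    """fail[i] is the length of the longest proper border of p[:i + 1]."""
    fail = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        # Fall back through shorter borders until p[i] can extend one.
        while k > 0 and p[i] != p[k]:
            k = fail[k - 1]
        if p[i] == p[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(p: str, q: str) -> list[int]:
    """Return the start positions of all occurrences of p in q in O(|p| + |q|) time."""
    if not p:
        return list(range(len(q) + 1))  # the empty pattern matches everywhere
    fail = prefix_function(p)
    hits, k = [], 0  # k = length of the current matched prefix of p
    for i, c in enumerate(q):
        while k > 0 and c != p[k]:
            k = fail[k - 1]
        if c == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = fail[k - 1]  # continue, allowing overlapping occurrences
    return hits
```

For example, `kmp_search("aba", "ababa")` returns `[0, 2]`, including the overlapping occurrence.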
Towards optimal packed string matching
In the packed string matching problem, it is assumed that each machine word can accommodate up to α characters; thus an n-character string occupies n/α memory words. (a) We extend the Crochemore-Perrin constant-space O(n)-time string matching algorithm to run in optimal O(n/α) time and even in real-time, achieving a factor α speedup over traditional algorithms that examine each character individ...
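The pattern preprocessing of the Crochemore-Perrin algorithm rests on computing a maximal suffix of the pattern together with its period; this is the quantity the maximal-suffix instruction mentioned in the abstract is meant to speed up for packed strings. Below is a minimal scalar Python sketch of the classical constant-space maximal-suffix computation, assuming a nonempty pattern and the ordinary character order. The full two-way algorithm also runs it under the reversed order and uses whichever of the two suffixes starts later; that step is omitted here, and the function name is hypothetical.

```python
def maximal_suffix(x: str) -> tuple[int, int]:
    """Return (start, period): x[start:] is the lexicographically maximal suffix
    of a nonempty string x, and period is that suffix's smallest period."""
    # ms + 1 is the current candidate start, j + 1 the competing start,
    # k the offset being compared, and p the period of the candidate so far.
    ms, j, k, p = -1, 0, 1, 1
    while j + k < len(x):
        a, b = x[j + k], x[ms + k]
        if a < b:
            # The competitor loses; the candidate extends and its period grows.
            j += k
            k = 1
            p = j - ms
        elif a == b:
            if k != p:
                k += 1
            else:
                # A full period matched: skip ahead by one period.
                j += p
                k = 1
        else:
            # The competitor is larger: it becomes the new candidate.
            ms = j
            j = ms + 1
            k = p = 1
    return ms + 1, p
```

For example, `maximal_suffix("abcab")` returns `(2, 3)`: the maximal suffix is "cab", whose smallest period is 3. Only a constant number of integer variables is used, which is the source of the algorithm's constant space.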
Fast Packed String Matching for Short Patterns
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared of trying to exploit the power of the word RAM model to speed up the performance of classical string matching algorithms. In ...
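One classical way to exploit word-level parallelism on the word RAM is the Shift-And algorithm, a variant of Baeza-Yates and Gonnet's bit-parallel Shift-Or, sketched below in Python. It is not necessarily the method of the paper summarized above; it is included only to show how a single shift, OR and AND per text character update the match state of every pattern prefix at once. It assumes the pattern fits in one machine word (Python's unbounded integers hide that limit), and the function name is hypothetical.

```python
def shift_and(pattern: str, text: str) -> list[int]:
    """Bit-parallel exact matching: bit i of `state` is set when pattern[:i + 1]
    matches the text ending at the current position."""
    m = len(pattern)
    if m == 0:
        raise ValueError("empty pattern")
    # masks[c] has bit i set iff pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit for the full pattern
    state, hits = 0, []
    for j, c in enumerate(text):
        # One shift, OR and AND advance all m prefix states in parallel.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

For example, `shift_and("aba", "ababa")` returns `[0, 2]`. When m is at most the word size w, each text character costs O(1) word operations.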
Tighter Packed Bit-Parallel NFA for Approximate String Matching
We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [1]. Given a length-m pattern and an error threshold k, the original BPD uses (m − k)(k + 2) bits of space. We decrease this to (m − k)(k + 1), and also give a slightly more efficient simulation algorithm for the NFA. In experiments our modified NFA is often noticeably more efficient tha...
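The saving claimed above works out to exactly m − k bits. The worked arithmetic below makes this explicit; the example values m = 16 and k = 3 are illustrative and not taken from the paper. They show a case where, if the whole NFA had to fit in one 64-bit word, the original layout would just overflow it while the reduced one would fit.

```latex
(m-k)(k+2) - (m-k)(k+1) = (m-k)\bigl[(k+2) - (k+1)\bigr] = m-k

m = 16,\ k = 3:\qquad (16-3)(3+2) = 65 \text{ bits}, \qquad (16-3)(3+1) = 52 \text{ bits}
```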
Average-optimal string matching
The exact string matching problem is to find the occurrences of a pattern of length m in a text of n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads...