On Boyer-Moore Preprocessing
Author
Abstract
Probably the two best-known exact string matching algorithms are the linear-time algorithm of Knuth, Morris and Pratt (KMP) and the fast-on-average algorithm of Boyer and Moore (BM). The efficiency of these algorithms is based on using a suitable failure function. When a mismatch occurs at the currently inspected text position, the purpose of a failure function is to tell how many positions the pattern can be shifted forwards in the text without skipping over any occurrences. The BM algorithm uses two failure functions: one is based on a bad character rule, and the other on a good suffix rule. The classic linear-time preprocessing algorithm for the good suffix rule has been viewed as somewhat obscure [8]. A formal proof of the correctness of that algorithm was given recently by Stomp [14]. That proof is based on linear-time temporal logic, and is fairly technical and a posteriori in nature. In this paper we present a constructive and somewhat simpler discussion of the correctness of the classic preprocessing algorithm for the good suffix rule. We also highlight the close relationship between this preprocessing algorithm and the exact string matching algorithm of Morris and Pratt (a pre-version of KMP). For these reasons we believe that the present paper gives a better understanding of the ideas behind the preprocessing algorithm than the proof by Stomp. This paper is based on [9], and thus the discussion is originally roughly as old as the proof by Stomp.
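To make the good suffix rule concrete, the following is a minimal Python sketch of the preprocessing as it is commonly presented in textbooks and tutorials. It is not taken from the paper itself: the function names `good_suffix_shifts` and `bm_search`, the array names `shift` and `border`, and the indexing convention are assumptions chosen for illustration. The `border` array is filled from right to left in the same way a Morris-Pratt failure function is filled from left to right, which is the relationship the abstract alludes to. The search routine uses only the good suffix rule and omits the bad character rule for brevity.

```python
def good_suffix_shifts(pattern):
    """Return shift[0..m]; shift[j] is the safe shift when pattern[j:] matched
    and a mismatch occurred at index j - 1 (shift[0] applies after a full match)."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of the widest border of pattern[i:]

    # Case 1: the matched suffix reoccurs in the pattern, preceded by a
    # different character (strong good suffix rule).
    i, j = m, m + 1
    border[i] = j
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j

    # Case 2: only a suffix of the matched part is a prefix of the pattern.
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def bm_search(text, pattern):
    """Report all start positions of pattern in text, scanning each window
    right to left and shifting by the good suffix rule only."""
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    shift = good_suffix_shifts(pattern)
    hits = []
    s = 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += shift[0]
        else:
            s += shift[j + 1]
    return hits
```

For example, under these conventions `good_suffix_shifts("abbabab")` yields `[5, 5, 5, 5, 2, 5, 4, 1]`, and `bm_search("abaababbabab", "abbabab")` reports the single occurrence at index 5.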
Similar resources
A Mechanically Checked Proof of the Correctness of the Boyer-Moore Fast String Searching Algorithm
We describe a mechanically checked proof that the Boyer-Moore fast string searching algorithm is correct. This is done by expressing both the fast algorithm and the naïve (obviously correct) algorithm as functions in applicative Common Lisp and proving them equivalent with the ACL2 theorem prover. The algorithm verified differs from the original Boyer-Moore algorithm in one key way: the origina...
The Boyer-Moore-Galil String Searching Strategies Revisited
Based on the Boyer-Moore-Galil approach, a new algorithm is proposed which requires a number of character comparisons bounded by 2n, regardless of the number of occurrences of the pattern in the text string. Preprocessing is only slightly more involved and still requires a time linear in the pattern size.
A unifying look at the Apostolico-Giancarlo string-matching algorithm
String matching is the problem of finding all the occurrences of a pattern in a text. We present a new method to compute the combinatorial shift function (“matching shift”) of the well-known Boyer–Moore string matching algorithm. This method implies the computation of the length of the longest suffixes of the pattern ending at each position in this pattern. These values constituted an extra-pre...
W-Period Technique for Parallel String Matching
In this paper, we present a new approach to parallel string matching. Some known parallel string matching algorithms based on duels by witness are considered, with a focus on the strengths and weaknesses of the currently known methods. Parallel string matching has applications in string databases, information retrieval, and computational biology. The new ‘divide and conquer’ approach has been introduced for par...
Boyer-Moore Strategy to Efficient Approximate String Matching
We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which involves representing by a bit number the current state of the search and uses the ability of programming languages to handle bit words. State re...