A Mechanically Checked Proof of the Correctness of the Boyer-Moore Fast String Searching Algorithm
Abstract
We describe a mechanically checked proof that the Boyer-Moore fast string searching algorithm is correct. This is done by expressing both the fast algorithm and the naïve (obviously correct) algorithm as functions in applicative Common Lisp and proving them equivalent with the ACL2 theorem prover. The algorithm verified differs from the original Boyer-Moore algorithm in one key way: the original algorithm preprocessed the pattern into two arrays and skipped forward by the maximum of the skip distances recorded in those arrays; the algorithm verified uses one array that combines the two original arrays (and whose size is the product of that of the original arrays). The algorithm here skips at least as far as the original Boyer-Moore algorithm and often skips further, though we do not prove that mechanically. A key fact about the original algorithm is that preprocessing can be done in time linear in the length of the pattern, |pat|, and the size of the alphabet, |α|. Our implementation of the preprocessing here is unconcerned with efficiency and has complexity |α| × |pat|. Our mechanically checked proof includes a proof that our preprocessing is correct. We briefly describe a proof (shown in detail elsewhere) that an imperatively coded version of the fast algorithm implements the algorithm verified here.
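The idea of a single combined skip table can be illustrated with a minimal Python sketch. This is not the ACL2 code from the paper: the names `naive_search`, `shift`, `build_table`, and `fast_search` are hypothetical, and the preprocessing below is a deliberately simple version whose cost is not meant to match the complexity figure quoted above. The table is indexed by the pair (text character just read, pattern index where the mismatch occurred), so it has |α| × |pat| entries.

```python
def naive_search(pat, txt):
    """Obviously correct reference: leftmost index of pat in txt, or None."""
    for i in range(len(txt) - len(pat) + 1):
        if txt[i:i + len(pat)] == pat:
            return i
    return None


def shift(v, j, pat):
    """Smallest shift consistent with what a right-to-left scan has learned.

    Assumption of this sketch: the scan matched pat[j+1:] and then read
    character v at pattern index j. A shift s is acceptable when the pattern,
    moved right by s, agrees with every text character already known.
    """
    m = len(pat)
    known = {j: v}
    known.update({k: pat[k] for k in range(j + 1, m)})
    for s in range(1, m + 1):
        if all(pat[k - s] == c for k, c in known.items() if k - s >= 0):
            return s
    return m  # not reached: s == m always passes the check above


def build_table(pat, alphabet):
    """One combined table with len(alphabet) * len(pat) entries."""
    return {(v, j): shift(v, j, pat) for v in alphabet for j in range(len(pat))}


def fast_search(pat, txt):
    """Right-to-left scan that skips ahead using the combined table."""
    m, n = len(pat), len(txt)
    if m == 0:
        return 0
    # The alphabet is taken to be the characters that actually occur.
    table = build_table(pat, set(pat) | set(txt))
    i = 0
    while i + m <= n:
        j = m - 1
        while j >= 0 and pat[j] == txt[i + j]:
            j -= 1
        if j < 0:
            return i
        i += table[(txt[i + j], j)]
    return None
```

Because `shift` returns the smallest shift that is consistent with the characters already read, no alignment that could match is skipped, so `fast_search` should agree with `naive_search` on every input. The sketch only tests that agreement on examples; the paper establishes the corresponding equivalence by mechanical proof.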
Similar resources
On Boyer-Moore Preprocessing
Probably the two best-known exact string matching algorithms are the linear-time algorithm of Knuth, Morris and Pratt (KMP), and the fast on average algorithm of Boyer and Moore (BM). The efficiency of these algorithms is based on using a suitable failure function. When a mismatch occurs in the currently inspected text position, the purpose of a failure function is to tell how many positions th...
String Matching in the DNA Alphabet
Searching for occurrences of string patterns is a common problem in many applications. Various good solutions have been presented for string matching. The most efficient solutions in practice are based on the Boyer–Moore algorithm.1 A typical question in molecular biology is whether a given sequence has appeared elsewhere. In the following, we will concentrate on searching for exact occurrences...
Fast String Searching
Since the Boyer-Moore algorithm was described in 1977, it has been the standard benchmark for the practical string search literature. Yet this yardstick compares badly with current practice. We describe two algorithms that perform 47% fewer comparisons and are about 4.5 times faster across a wide range of architectures and compilers. These new variants are members of a family of algorithms base...
Mechanized Operational Semantics: The M1 Story
In this paper we explain how to formalize an “operational” or “state-transition” semantics of a von Neumann programming language in a functional programming language. By adopting an “interpretive” style, one can execute the model in the functional language to “run” programs in the von Neumann language. Given the ability to reason about the functional language, one can use the model to reason ab...
Boyer-Moore String Matching over Ziv -
We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The key idea is that, despite that we cannot exactly choose which text characters to inspect, we can still use the characters explicitly represented in those formats to shift the pattern in the text. We present a basic approach and more advanced ones. Despite that the theoretical average complexity does not ...