Average-optimal string matching

Authors

  • Kimmo Fredriksson
  • Szymon Grabowski
Abstract

The exact string matching problem is to find all occurrences of a pattern of length m in a text of n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads to very simple algorithms that are optimal on average. We then show how our basic method can be used to solve multiple string matching as well as several approximate matching problems in average-optimal time. The general method can be applied to many existing string matching algorithms. Our experimental results show that the algorithms perform very well in practice.
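The filtering idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: the "carefully chosen pattern subsequences" are taken to be the q subsequences formed by every q-th pattern symbol, one per starting offset, and only every q-th text symbol is inspected before a candidate is verified. The function name `sampled_search` and the parameter `q` are hypothetical. The multiple matching step here is a plain dictionary lookup on each sampled window, swapped in for simplicity, so the sketch shows why the filter is correct but does not reproduce the average-optimal running time.

```python
from collections import defaultdict


def sampled_search(text, pattern, q):
    """Return the sorted start positions of pattern in text.

    Illustrative filter (assumed reading of the abstract): match the q
    sampled subsequences of the pattern against every q-th text symbol,
    then verify each candidate position in full.
    """
    m = len(pattern)
    if not 1 <= q <= m:
        raise ValueError("q must satisfy 1 <= q <= len(pattern)")

    # All subsequences are cut to a common length so that one window
    # size works for every offset j.
    sub_len = m // q
    subsequences = defaultdict(list)
    for j in range(q):
        # Subsequence j is pattern[j], pattern[j+q], pattern[j+2q], ...
        subsequences[pattern[j::q][:sub_len]].append(j)

    # Only every q-th text symbol is looked at during filtering.
    sampled = text[::q]

    hits = []
    for s in range(len(sampled) - sub_len + 1):
        window = sampled[s:s + sub_len]
        for j in subsequences.get(window, ()):
            # Subsequence j aligned at sampled index s implies a
            # candidate occurrence starting at text position s*q - j.
            i = s * q - j
            if i >= 0 and text[i:i + m] == pattern:
                hits.append(i)
    return sorted(hits)
```

For any occurrence starting at position i, the first sampled text position at or after i lies at offset j = (-i) mod q inside the occurrence, so subsequence j must match the sampled text there. That is why no occurrence is missed, and why a larger q lets the filter skip more of the text at the cost of shorter, less selective subsequences.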

Similar resources

Average-Optimal Multiple Approximate String Matching

We present a new algorithm for multiple approximate string matching, based on an extension of the optimal (on average) single-pattern approximate string matching algorithm of Chang and Marr. Our algorithm inherits the optimality and is also competitive in practice. We present a second algorithm that is linear time and handles higher difference ratios. We show experimentally that our algorithms a...

Average-Case Optimal Approximate Circular String Matching

Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that are at a distance at most k from x or from any of its rotations. In this article, we present a new algorithm for approximate circular string matching under the...

Improved Single and Multiple Approximate String Matching

We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough ℓ-grams from text windows so as to prove that no occurrence can contain the part of the window read, and then shifting the window. Three variants of the algorithm are presented, which give different tradeoffs between how much they work in the window and how much they shift it. We show an...

On-Line Approximate String Matching with Bounded Errors

We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter ε so that the algorithm is allowed to miss occurrences with probability ε. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the pr...

Simple Compression Code Supporting Random Access and Fast String Matching

Given a sequence S of n symbols over some alphabet Σ, we develop a new compression method that is (i) very simple to implement; (ii) provides O(1) time random access to any symbol of the original sequence; (iii) allows efficient pattern matching over the compressed sequence. Our simplest solution uses at most 2h + o(h) bits of space, where h = n(H0(S) + 1), and H0(S) is the zeroth-order empiric...

Journal:
  • J. Discrete Algorithms

Volume 7

Publication date: 2009