On-Line Approximate String Matching with Bounded Errors

نویسندگان

Marcos A. Kiwi

Gonzalo Navarro

Claudio Telha

چکیده

We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter so that the algorithm is allowed to miss occurrences with probability . This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem allows us breaking the average-case optimal lower bound of the classical problem, achieving average case O(n logσm/m) time with any = poly(k/m), where n is the text size, m the pattern length, k the number of differences for edit distance, and σ the alphabet size. Our experimental results show the practicality of this novel and promising research direction. Finally, we extend the proposed approach to the multiple approximate string matching setting, where the approximate occurrence of r patterns are simultaneously sought. Again, we can break the average-case optimal lower bound of the classical problem, achieving average case O(n logσ(rm)/m) time with any = poly(k/m).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents, produce imperfect records in real world databases. These errors generate distinct records which belong to the same entity. The aim of Approximate Record Matching is to find multiple records which belong to an entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

متن کامل

Self-Bounded Prediction Suffix Tree via Approximate String Matching

Prediction suffix trees (PST) provide an effective tool for sequence modelling and prediction. Current prediction techniques for PSTs rely on exact matching between the suffix of the current sequence and the previously observed sequence. We present a provably correct algorithm for learning a PST with approximate suffix matching by relaxing the exact matching condition. We then present a self-bo...

متن کامل

Fast Approximate String Matching in a Dictionary

A successful technique to search large textual databases allowing errors relies on an online search in the vocabulary of the text. To reduce the time of that on-line search, we index the vocabulary as a metric space. We show that with reasonable space overhead we can improve by a factor of two over the fastest online algorithms , when the tolerated error level is low (which is reasonable in tex...

متن کامل

Approximate String Matching: Theory and Applications (La Recherche Approchée de Motifs : Théorie et Applications)

The approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows : Let D = {x1, x2, . . . xd} be a set of d words defined on an alphabet Σ, let q be a query defined also on Σ, and let k be a positive integer. We want to build a data structure on D capable of answering the following query : find all words i...

متن کامل

n-Gram/2L-approximation: a two-level n-gram inverted index structure for approximate string matching

Approximate string matching is to find all the occurrences of a query string in a text database allowing a specified number of errors. Approximate string matching based on the n-gram inverted index (simply, n-gram Matching) has been widely used. A major reason is that it is scalable for large databases since it is not a main memory algorithm. Nevertheless, n-gram Matching also has drawbacks: th...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

Theor. Comput. Sci.

دوره 412 شماره

صفحات -

تاریخ انتشار 2008

On-Line Approximate String Matching with Bounded Errors

نویسندگان

چکیده

منابع مشابه

Adaptive Approximate Record Matching

Self-Bounded Prediction Suffix Tree via Approximate String Matching

Fast Approximate String Matching in a Dictionary

Approximate String Matching: Theory and Applications (La Recherche Approchée de Motifs : Théorie et Applications)

n-Gram/2L-approximation: a two-level n-gram inverted index structure for approximate string matching

عنوان ژورنال:

اشتراک گذاری