Approximate String Matching over Ziv - LempelCompressed

نویسندگان

  • Gonzalo Navarro
  • Esko Ukkonen
چکیده

We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, speciically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) time. The existence problem needs O(mkn) time. We also show that the algorithm can be adapted to run in O(k 2 n + min(mkn; m 2 (mm) k) + R) average time, where is the alphabet size. The experimental results show a speedup over the basic approach for moderate m and small k.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text

We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search a pattern in a text without uncompressing. This is a highly relevant issue, since it is essential to have compressed text databases where eecient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts th...

متن کامل

Approximate String Matching over Ziv

We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, speciically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) ti...

متن کامل

A Lossy Data Compression Based on String Matching: Preliminary Analysis and Suboptimal Algorithms

A practical suboptimal algorithm (source coding) for lossy (non-faithful) data compression is discussed. This scheme is based on an approximate string matching, and it naturally extends lossless (faithful) Lempel-Ziv data compression scheme. The construction of the algorithm is based on a careful probabilistic analysis of an approximate string matching problem that is of its own interest. This ...

متن کامل

Approximate String Matching with Lempel-Ziv Compressed Indexes

A compressed full-text self-index for a text T is a data structure requiring reduced space and able of searching for patterns P in T . Furthermore, the structure can reproduce any substring of T , thus it actually replaces T . Despite the explosion of interest on self-indexes in recent years, there has not been much progress on search functionalities beyond the basic exact search. In this paper...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008