A Separation Between Run-Length SLPs and LZ77

Authors

  • Philip Bille
  • Travis Gagie
  • Inge Li Gørtz
  • Nicola Prezza
Abstract

In this paper we give an infinite family of strings for which the length of the Lempel-Ziv'77 (LZ77) parse is a factor Ω(log n / log log n) smaller than the size of the smallest run-length grammar.
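
To make the quantity "length of the LZ77 parse" concrete, the following is a minimal sketch of a greedy LZ77 factorization. It is not taken from the paper: the function name `lz77_parse` is hypothetical, the quadratic-time search is chosen only for readability, and it assumes the common variant in which a phrase may overlap its own source. The paper may use a slightly different definition.

```python
def lz77_parse(s):
    """Greedy LZ77 factorization (illustrative, quadratic time or worse).

    Assumed variant: each phrase is either a single character that has not
    occurred before, or the longest prefix of the remaining text that also
    starts at some earlier position (the copy may overlap the phrase itself).
    Returns a list of literals (str) and (source_position, length) pairs.
    """
    phrases = []
    n = len(s)
    i = 0
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position j < i as a copy source.
        for j in range(i):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(s[i])          # new character: literal phrase
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


if __name__ == "__main__":
    for text in ["aaaaaaaa", "abababab", "abcabcabc"]:
        parse = lz77_parse(text)
        print(text, len(parse), parse)
```

The number of phrases returned, `len(lz77_parse(s))`, is the parse length that the abstract compares against the size of the smallest run-length grammar.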

Similar resources

Distance Measures for Sequences

Given a set of sequences, the distance between pairs of them helps us to find their similarity and derive structural relationship amongst them. For genomic sequences such measures make it possible to construct the evolution tree of organisms. In this paper we compare several distance measures and examine a method that involves circular shifting one sequence against the other for finding good al...

Fingerprints in Compressed Strings

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i, j]...

From LZ77 to the Run-Length Encoded Burrows-Wheeler Transform, and Back

The Lempel-Ziv factorization (LZ77) and the Run-Length encoded Burrows-Wheeler Transform (RLBWT) are two important tools in text compression and indexing; their sizes z and r are closely related to the amount of self-repetitiveness in the text. In this paper we consider the problem of converting the two representations into each other within a working space proportional to the input and the output. L...

Composite Repetition-Aware Data Structures

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time ...

Restructuring Compressed Texts without Explicit Decompression

We consider the problem of restructuring compressed texts without explicit decompression. We present algorithms that allow conversions from compressed representations of a string T produced by any grammar-based compression algorithm to representations produced by several specific compression algorithms, including LZ77, LZ78, run-length encoding, and some grammar-based compression algorithms. T...

Journal title:
  • CoRR

Volume abs/1711.07270

Publication date 2017