Lempel-Ziv Compression in a Sliding Window
Abstract
We present new algorithms for the sliding window Lempel-Ziv (LZ77) problem and the approximate rightmost LZ77 parsing problem. Our main result is a new and surprisingly simple algorithm that computes the sliding window LZ77 parse in O(w) space and either O(n) expected time or O(n log log w + z log log σ) deterministic time. Here, w is the window size, n is the length of the input string, z is the number of phrases in the parse, and σ is the size of the alphabet. This matches the space and time bounds of previous results while removing the restriction to constant-size alphabets. To achieve our result, we combine a simple modification and augmentation of the suffix tree with periodicity properties of sliding windows. We also apply this new technique to obtain an algorithm for the approximate rightmost LZ77 problem that uses O(n(log z + log log n)) time and O(n) space and produces a (1 + ε)-approximation of the rightmost parsing (for any constant ε > 0). While this does not improve the best known time-space trade-offs for exact rightmost parsing, our algorithm is significantly simpler and exposes a direct connection between sliding window parsing and the approximate rightmost matching problem.
1998 ACM Subject Classification: E.4 Coding and Information Theory, E.1 Data Structures, F.2.2 Nonnumerical Algorithms and Problems
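To make the problem statement concrete, the following is a minimal brute-force sketch of a sliding window LZ77 parse. It is not the suffix-tree algorithm described in the abstract and it runs in roughly O(n · w · match length) time rather than the stated bounds. The phrase format is an assumption made for illustration: a phrase is either a single literal character or a pair (distance, length) whose copy source starts within the last w positions and may overlap the phrase itself. The names `sliding_window_lz77` and `decode` are hypothetical.

```python
def sliding_window_lz77(s, w):
    """Greedy parse of s; every copy source starts at most w positions back.

    Assumed phrase format (illustrative only): a literal character, or a
    (distance, length) pair. Brute force, not the paper's algorithm.
    """
    phrases = []
    n = len(s)
    i = 0
    while i < n:
        best_len, best_dist = 0, 0
        # Scan candidate sources from nearest to farthest. Because the
        # comparison below is strict, ties keep the nearest source.
        for j in range(i - 1, max(i - w, 0) - 1, -1):
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len == 0:
            phrases.append(s[i])  # no source in the window: emit a literal
            i += 1
        else:
            phrases.append((best_dist, best_len))
            i += best_len
    return phrases


def decode(phrases):
    """Rebuild the string from the phrases produced above."""
    out = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            dist, length = p
            for _ in range(length):
                out.append(out[-dist])  # copying one by one handles overlap
    return "".join(out)


if __name__ == "__main__":
    text = "abcXXXXabc"
    for window in (3, 10):
        parse = sliding_window_lz77(text, window)
        print(window, parse, decode(parse) == text)
```

With the small window the final "abc" is out of reach and is emitted as three literals, while the larger window lets it be copied as one phrase, which is the effect of w on the number of phrases z. Among equally long matches this sketch keeps the nearest source, which is the choice a rightmost parsing asks for.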
Similar resources
LZW Data Compression on Large Scale and Extreme Distributed Systems
Results on the parallel complexity of Lempel-Ziv data compression suggest that the sliding window method is more suitable than the LZW technique on shared memory parallel machines. When instead we address the practical goal of designing distributed algorithms with low communication cost, sliding window compression does not seem to guarantee robustness if we scale up the system. The possibility ...
On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm
The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence times and match lengths has been studied from various perspectives in information theory literature. In this paper, we undertake a finer study of these quantities under two different scenarios, i) zero entropy sources that are characterized by strong long-term memory, and ii) the processes with weak memory as described ...
A universal scheme for Wyner-Ziv coding of discrete sources
We consider the Wyner–Ziv (WZ) problem of lossy compression where the decompressor observes a noisy version of the source, whose statistics are unknown. A new family of WZ coding algorithms is proposed and their universal optimality is proven. Compression consists of sliding-window processing followed by Lempel–Ziv (LZ) compression, while the decompressor is based on a modification of the discr...
Most Recent Match Queries in On-Line Suffix Trees
A suffix tree is able to efficiently locate a pattern in an indexed string, but not in general the most recent copy of the pattern in an online stream, which is desirable in some applications. We study the most general version of the problem of locating a most recent match: supporting queries for arbitrary patterns, at each step of processing an online stream. We present augmentations to Ukkone...
Image Compression via Textual Substitution
Textual substitution methods, often called dictionary methods or Lempel-Ziv methods, after the important work of Lempel and Ziv, are one-dimensional compression methods that maintain a constantly changing dictionary of strings to adaptively compress a stream of characters by replacing common substrings with indices (pointers) into a dictionary. Lempel and Ziv proved that the proposed schemes we...