Average Profile of the Lempel-Ziv Parsing Scheme for Markovian Source
Authors
Jing Tang, Microsoft Corporation, One Microsoft Way, 1/2061, Redmond, WA 98052, U.S.A. [email protected]
Abstract
For a Markovian source, we analyze the Lempel-Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models. In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., as a prefix of the previous (i − 1) sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, for which we coin the name Gilbert-Kadota model, a fixed number of phrases is generated according to the Lempel-Ziv algorithm, thus producing a sequence of variable (random) length. In the last model, also known as the Lempel-Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees, which are also of interest to other algorithms such as sorting, searching and pattern matching. In this paper, we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that the appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel-Ziv code. For the Markov Independent model, this finding is established by analytic methods (i.e., generating functions, the Mellin transform and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.
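The parsing rule and the profile described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`lz_parse`, `markov_independent_phrases`, `profile`) and the dict-based digital search tree are assumptions made for illustration, and the input strings are arbitrary examples rather than output of a Markovian source.

```python
from collections import Counter


def lz_parse(s):
    """Lempel-Ziv model: split one string into phrases, where each new
    phrase is the shortest prefix of the remaining text that has not yet
    occurred as a phrase."""
    root = {}                 # digital search tree, stored as nested dicts
    phrases = []
    node, start = root, 0
    for i, ch in enumerate(s):
        if ch in node:
            node = node[ch]   # still inside a known phrase, keep extending
        else:
            node[ch] = {}     # first unseen extension: a new phrase ends here
            phrases.append(s[start:i + 1])
            node, start = root, i + 1
    if start < len(s):
        phrases.append(s[start:])   # last phrase may repeat an earlier one
    return phrases


def markov_independent_phrases(sequences):
    """Markov Independent model: the ith phrase is the shortest prefix of
    the ith sequence that is not already a phrase.
    Assumption: every sequence is long enough to yield a new phrase."""
    root, phrases = {}, []
    for seq in sequences:
        node = root
        for depth, ch in enumerate(seq):
            if ch not in node:
                node[ch] = {}
                phrases.append(seq[:depth + 1])
                break
            node = node[ch]
        else:
            raise ValueError("sequence too short to yield a new phrase")
    return phrases


def profile(phrases):
    """Profile of one parsing: number of phrases of each length."""
    return Counter(len(p) for p in phrases)


phrases = lz_parse("1011010100010")
print(phrases)                 # ['1', '0', '11', '01', '010', '00', '10']
print(dict(profile(phrases)))  # {1: 2, 2: 4, 3: 1}
```

Averaging `profile` over many strings drawn from a Markovian source would give an empirical estimate of the average profile studied in the paper; the sampling step is left out here to keep the sketch deterministic.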
Similar resources
Average Profile of the Lempel-Ziv Parsing Scheme for a Markovian
For a Markovian source, we analyze the Lempel-Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models: In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase ...
Universal coding of nonstationary sources
In this correspondence we investigate the performance of the Lempel–Ziv incremental parsing scheme on nonstationary sources. We show that it achieves the best rate achievable by a finite-state block coder for the nonstationary source. We also show a similar result for a lossy coding scheme given by Yang and Kieffer which uses a Lempel–Ziv scheme to perform lossy coding.
On Generalized Digital Search Trees with Applications to a Generalized Lempel-Ziv
The goal of this research is twofold: (i) to analyze generalized digital search trees, and (ii) to derive the average profile (i.e., phrase length) of a generalization of the well known parsing algorithm due to Lempel and Ziv. In the generalized Lempel-Ziv parsing scheme, one partitions a sequence of symbols from a finite alphabet into phrases such that the new phrase is the longest substring seen...
Bit-Optimal Lempel-Ziv compression
One of the most famous and most investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 40 years ago [23]. This compression scheme is known as "dictionary-based compression" and consists of squeezing an input string by replacing some of its substrings with (shorter) codewords which are actually pointers to a dictionary of phrases built as the string is processed. ...
Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm
Wojciech Szpankowski, Department of Computer Science, Purdue University, W. Lafayette, IN 47907, U.S.A. Consider the parsing algorithm due to Lempel and Ziv that partitions a sequence of length n into variable phrases (blocks) such that a new block is the shortest substring not seen in the past as a phrase. In practice the following parameters are of interest: number of phrases, the size of a phra...