A Simpler Analysis of Burrows-Wheeler Based Compression
Abstract
In this paper we present a new technique for worst-case analysis of compression algorithms based on the Burrows-Wheeler Transform. We deal mainly with the algorithm proposed by Burrows and Wheeler in their first paper on the subject [6], called bw0. This algorithm consists of three essential steps: 1) obtain the Burrows-Wheeler Transform of the text; 2) convert the transform into a sequence of integers using the move-to-front algorithm; 3) encode the integers using arithmetic coding or any other order-0 encoding (possibly with run-length encoding). We achieve a strong upper bound on the worst-case compression ratio of this algorithm. This bound is significantly better than the bounds known to date and is obtained via simple analytical techniques. Specifically, we show that for any input string $s$ and any $\mu > 1$, the length of the compressed string is bounded by $\mu \cdot |s| H_k(s) + \log(\zeta(\mu)) \cdot |s| + \mu g_k + O(\log n)$, where $H_k$ is the $k$-th order empirical entropy, $g_k$ is a constant depending only on $k$ and on the size of the alphabet, and $\zeta(\mu) = \frac{1}{1^\mu} + \frac{1}{2^\mu} + \cdots$ is the standard zeta function. As part of the analysis we prove a result on the compressibility of integer sequences, which is of independent interest. Finally, we apply our techniques to prove a worst-case bound on the compression ratio of a compression algorithm based on the Burrows-Wheeler Transform followed by distance coding, for which worst-case guarantees have never been given. We prove that the length of the compressed string is bounded by $1.7286 \cdot |s| H_k(s) + g_k + O(\log n)$. This bound is better than the bound we give for bw0.

Preprint submitted to Elsevier Science, 27 October 2006
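The three bw0 steps and the quantities in the bound can be sketched in Python. This is a minimal illustration under stated assumptions and is not the authors' implementation. The naive rotation-sort BWT, the sentinel `"\0"`, all function names, and the use of `len(ranks) * H_0(ranks)` as a stand-in for the output of an order-0 arithmetic coder are choices made here. The `hk` function uses one common definition of $H_k$, based on the symbols that follow each length-$k$ context. The logarithm in the bound is assumed to be base 2. The $\mu g_k + O(\log n)$ terms are left out because $g_k$ is not specified in the abstract.

```python
import math
from collections import Counter, defaultdict


def bwt(s, sentinel="\0"):
    """Naive Burrows-Wheeler Transform: sort every rotation of s + sentinel
    and keep the last column. O(n^2 log n), for illustration only."""
    assert sentinel not in s
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def move_to_front(seq):
    """Replace each symbol by its index in a recency list, then move it to
    the front, so recently seen symbols map to small integers."""
    recency = sorted(set(seq))  # assumption: initial list = sorted alphabet
    ranks = []
    for ch in seq:
        idx = recency.index(ch)
        ranks.append(idx)
        recency.insert(0, recency.pop(idx))
    return ranks


def h0(seq):
    """Zeroth-order empirical entropy of seq, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


def hk(s, k):
    """k-th order empirical entropy: collect the symbol following each
    length-k context w into w_s, then average H_0(w_s) weighted by |w_s|."""
    if k == 0 or not s:
        return h0(s)
    followers = defaultdict(list)
    for i in range(len(s) - k):
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / len(s)


def bw0_ideal_bits(s):
    """Idealised bw0 output size: |MTF(BWT(s))| * H_0 of the MTF ranks,
    roughly what a static order-0 arithmetic coder would emit, ignoring
    the cost of transmitting the model (a simplifying assumption)."""
    ranks = move_to_front(bwt(s))
    return len(ranks) * h0(ranks)


def log2_zeta(mu, terms=10_000):
    """log2 of the Riemann zeta function for mu > 1 (base 2 is assumed):
    partial sum plus the integral tail estimate N^(1 - mu) / (mu - 1)."""
    assert mu > 1
    partial = sum(i ** -mu for i in range(1, terms + 1))
    return math.log2(partial + terms ** (1 - mu) / (mu - 1))


def leading_bound_bits(s, k, mu):
    """mu*|s|*H_k(s) + log2(zeta(mu))*|s|; the mu*g_k and O(log n) terms
    of the stated bound are omitted because g_k is not given here."""
    return mu * len(s) * hk(s, k) + log2_zeta(mu) * len(s)


if __name__ == "__main__":
    text = "abracadabra" * 20
    print("BWT prefix:", repr(bwt(text)[:40]))
    print("bw0 ideal bits:", round(bw0_ideal_bits(text), 1))
    for mu in (1.5, 2.0, 3.0):
        print("mu =", mu, "leading terms:", round(leading_bound_bits(text, 2, mu), 1))
```

Running the script prints the start of the BWT, the idealised bw0 size, and the leading terms of the bound for a few values of $\mu$. The comparison only indicates scale, since the omitted terms are not modelled. A smaller $\mu$ shrinks the entropy coefficient but increases $\log \zeta(\mu)$, which is the trade-off the bound expresses.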
Similar resources
Universal Data Compression Based on the Burrows-Wheeler Transformation: Theory and Practice
A very interesting recent development in data compression is the Burrows-Wheeler Transformation [1]. The idea is to permute the input sequence in such a way that characters with a similar context are grouped together. We provide a thorough analysis of the Burrows-Wheeler Transformation from an information theoretic point of view. Based on this analysis, the main part of the paper systematicall...
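The BWT is only a permutation, so it can be undone. The Python sketch below shows the forward transform and its inverse. The inverse uses the standard correspondence between the first and last columns of the sorted rotation matrix. The sentinel character and the function names are assumptions for illustration, and the quadratic forward transform is only suitable for short inputs.

```python
def bwt(s, sentinel="\0"):
    """Naive forward BWT: last column of the sorted rotations of s + sentinel."""
    assert sentinel not in s
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="\0"):
    """Invert the BWT: the k-th occurrence of a symbol in the first column
    is the same text position as its k-th occurrence in the last column."""
    first = sorted(last)
    # Stable sort of row indices by last-column symbol: link[j] is the row
    # whose last symbol is the same occurrence as first[j]. That row is the
    # rotation starting one position later in the text.
    link = sorted(range(len(last)), key=lambda i: last[i])
    row = first.index(sentinel)  # the rotation that starts with the sentinel
    out = []
    for _ in range(len(last) - 1):
        row = link[row]
        out.append(first[row])
    return "".join(out)
```

For example, `bwt("banana")` yields `"annb\x00aa"`. The two `n`s, both followed by `a` in the text, end up adjacent, and `inverse_bwt` recovers `"banana"` from that string.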
Burrows Wheeler Based Data Compression and Secure Transmission
Nowadays, computer technology focuses mostly on storage space and speed. With the rapid growth of important data and the increasing number of applications, devising new approaches to efficient compression and encryption plays a vital role in performance. In this work, the Burrows-Wheeler transformation is introduced for preprocessing of the input data, and made several performance analysis...
Burrows-Wheeler based JPEG
Recently, the use of the Burrows-Wheeler method for data compression has been expanded. A method of enhancing the compression efficiency of the common JPEG standard is presented in this paper, exploiting the Burrows-Wheeler compression technique. The paper suggests a replacement of the traditional Huffman compression used by JPEG by the Burrows-Wheeler compression. When using high quality image...
Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages
The lossless Burrows-Wheeler Compression Algorithm has received considerable attention over recent years for both its simplicity and effectiveness. It is based on a permutation of the input sequence, the Burrows-Wheeler Transform, which groups symbols with a similar context close together. In the original version, this permutation was followed by a Move-To-Front transformation and a final ent...
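Move-to-front is not the only stage that can follow the BWT. The first abstract above also analyses BWT followed by distance coding. The Python sketch below shows a simplified, invertible distance-based encoding for illustration only. It is an assumption of this example and not the exact distance-coding scheme analysed in that paper, and the function names are hypothetical.

```python
def distance_encode(seq):
    """Simplified distance coding (illustrative variant only): each symbol
    becomes the distance back to its previous occurrence. A first
    occurrence becomes 0 and its symbol is appended to a list of new symbols."""
    last_pos = {}
    dists, new_symbols = [], []
    for i, ch in enumerate(seq):
        if ch in last_pos:
            dists.append(i - last_pos[ch])
        else:
            dists.append(0)
            new_symbols.append(ch)
        last_pos[ch] = i
    return dists, "".join(new_symbols)


def distance_decode(dists, new_symbols):
    """Inverse of distance_encode: copy the symbol d positions back, or take
    the next new symbol when d == 0."""
    out = []
    fresh = iter(new_symbols)
    for i, d in enumerate(dists):
        out.append(next(fresh) if d == 0 else out[i - d])
    return "".join(out)
```

In this variant, a run of identical symbols in the BWT output becomes a run of 1s, much as move-to-front turns it into a run of 0s.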
RadixZip: Linear-Time Compression of Token Streams
RadixZip is a block compression technique for token streams. It introduces RadixZip Transform, a linear time algorithm that rearranges bytes using a technique inspired by radix sorting. For appropriate data, RadixZip Transform is analogous to the Burrows-Wheeler Transform used in bzip2, but is both simpler in operation and more effective in compression. In addition, RadixZip Transform can take ...