Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages
Abstract
The lossless Burrows-Wheeler Compression Algorithm has received considerable attention in recent years for both its simplicity and its effectiveness. It is based on a permutation of the input sequence, the Burrows-Wheeler Transform (BWT), which groups symbols that occur in similar contexts close together. In the original version, this permutation was followed by a Move-To-Front (MTF) transformation and a final entropy coding stage. Because the stages after the BWT have a significant influence on the compression rate, later versions used different algorithms for these post-BWT stages. This article describes improved algorithms for the run length encoding, inversion frequencies and weighted frequency count stages that follow the BWT. Compression rates are presented for different variants of the algorithm, together with compression and decompression times. Finally, an implementation is introduced that achieves a compression rate of 2.238 bps (bits per symbol) on the Calgary Corpus, which is the best result published in this field to date.
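The original pipeline described above (BWT, then Move-To-Front, then entropy coding) can be illustrated with a minimal sketch. It assumes a naive rotation sort (quadratic memory and O(n² log n) time, fine only for short strings), a string alphabet, and a BWT that outputs the last column plus the row index of the original string; the entropy coding stage is omitted. The function names `bwt`, `inverse_bwt` and `move_to_front` are hypothetical and are not taken from the article.

```python
from collections import Counter


def bwt(s: str) -> tuple[str, int]:
    """Naive Burrows-Wheeler Transform: sort all rotations of s and return
    the last column plus the row index of the original string."""
    n = len(s)
    if n == 0:
        return "", 0
    # Sort rotation start positions by the rotated string.
    rows = sorted(range(n), key=lambda i: s[i:] + s[:i])
    # The last character of the rotation starting at i is s[i - 1].
    last = "".join(s[(i - 1) % n] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last: str, primary: int) -> str:
    """Invert the BWT with the LF mapping (last column -> first column)."""
    n = len(last)
    counts = Counter(last)
    # start[c]: first row of the sorted first column that begins with c.
    start, total = {}, 0
    for c in sorted(counts):
        start[c] = total
        total += counts[c]
    # lf[i]: row of the rotation that starts one position before row i's rotation.
    seen = Counter()
    lf = [0] * n
    for i, c in enumerate(last):
        lf[i] = start[c] + seen[c]
        seen[c] += 1
    # Walk backwards through the text, starting from the original row.
    out, row = [], primary
    for _ in range(n):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


def move_to_front(data: str, alphabet: list[str]) -> list[int]:
    """Replace each symbol by its current position in a list, then move it
    to the front, so runs of a repeated symbol become runs of 0."""
    table = list(alphabet)
    out = []
    for ch in data:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out


if __name__ == "__main__":
    text = "banana"
    last, primary = bwt(text)
    print(last, primary)                           # nnbaaa 3
    print(move_to_front(last, sorted(set(text))))  # [2, 0, 2, 2, 0, 0]
    assert inverse_bwt(last, primary) == text
```

On "banana" the BWT gathers the three occurrences of "a" into one run, and MTF turns the repeats within each run into zeros; it is this skew toward small values that the later stages (run length encoding, entropy coding) exploit.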
Similar resources
Incremental frequency count - a post BWT-stage for the Burrows-Wheeler compression algorithm
The stage after the Burrows-Wheeler Transform (BWT) has a key function inside the Burrows-Wheeler compression algorithm as it transforms the BWT output from a local context into a global context. This paper presents the Incremental Frequency Count stage, a post-BWT stage. The new stage is paired with a run length encoding stage between the BWT and entropy coding stage of the algorithm. It offer...
One attempt of a compression algorithm using the BWT
In 1994 Burrows and Wheeler [5] described a universal data compression algorithm (BW-algorithm, for short) which achieved compression rates that were close to the best known compression rates. Due to its simplicity, the algorithm can be implemented with relatively low complexity. Fenwick [8] described ideas to improve the efficiency (i.e., the compression rate) and complexity of the BW-algorith...
Experimental Evaluation of List Update Algorithms for Data Compression
List update algorithms have been widely used as subroutines in compression schemes, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general-purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-bas...
High-performance BWT-based Encoders
In 1994, Burrows and Wheeler [5] developed a data compression algorithm which performs significantly better than Lempel-Ziv based algorithms. Since then, a lot of work has been done in order to improve their algorithm, which is based on a reversible transformation of the input string, called BWT (the Burrows-Wheeler transformation). In this paper, we propose a compression scheme based on BWT, M...
An Application of Self-organizing Data Structures to Compression
List update algorithms have been widely used as subroutines in compression schemes, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general-purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-bas...
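The abstracts above all concern the stage between the BWT and the entropy coder, where list update algorithms such as Move-To-Front are a common choice. Inversion frequencies (IF), one of the post-BWT stages named in the main abstract, takes a different route: for each symbol, in alphabet order, it records how many symbols later in the alphabet occur between consecutive occurrences of that symbol. The sketch below shows only this basic IF idea, not the improved IF variant described in the article. It assumes that the per-symbol counts are transmitted alongside the distances and that `alphabet` lists every symbol of the input; the function names are hypothetical.

```python
def inversion_frequencies(seq: str, alphabet: str) -> tuple[dict[str, int], list[int]]:
    """Basic inversion frequencies: for each symbol, in alphabet order, emit
    the number of later-alphabet symbols seen since its previous occurrence."""
    counts: dict[str, int] = {}
    dists: list[int] = []
    remaining = list(seq)  # symbols not yet processed (all later in the alphabet)
    for a in alphabet:
        gap, rest = 0, []
        for s in remaining:
            if s == a:
                dists.append(gap)
                gap = 0
            else:
                gap += 1
                rest.append(s)
        counts[a] = len(remaining) - len(rest)  # occurrences of a
        remaining = rest
    # The distances of the last symbol are always 0; a real coder could drop them.
    return counts, dists


def inverse_inversion_frequencies(counts: dict[str, int], dists: list[int], alphabet: str) -> str:
    """Rebuild the sequence by re-inserting symbols from last to first."""
    # Split the flat distance list into one run per symbol, in alphabet order.
    per_symbol, pos = {}, 0
    for a in alphabet:
        per_symbol[a] = dists[pos:pos + counts[a]]
        pos += counts[a]
    seq: list[str] = []
    for a in reversed(alphabet):
        merged, idx = [], 0
        for gap in per_symbol[a]:
            merged.extend(seq[idx:idx + gap])  # skip `gap` later-alphabet symbols
            idx += gap
            merged.append(a)
        merged.extend(seq[idx:])  # symbols after the last occurrence of a
        seq = merged
    return "".join(seq)


if __name__ == "__main__":
    counts, dists = inversion_frequencies("nnbaaa", "abn")
    print(counts, dists)  # {'a': 3, 'b': 1, 'n': 2} [3, 0, 0, 2, 0, 0]
    assert inverse_inversion_frequencies(counts, dists, "abn") == "nnbaaa"
```

The example input "nnbaaa" is the BWT of "banana". Unlike MTF, IF works symbol by symbol over the whole block, so the decoder must know each symbol's count before it can split the distance list.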