Improvements to the Burrows-Wheeler Compression Algorithm: After BWT Stages

Author

  • JUERGEN ABEL
Abstract

The lossless Burrows-Wheeler Compression Algorithm has received considerable attention in recent years for both its simplicity and its effectiveness. It is based on a permutation of the input sequence, the Burrows-Wheeler Transform, which groups symbols with a similar context close together. In the original version, this permutation was followed by a Move-To-Front transformation and a final entropy coding stage. Later versions replaced these with different algorithms, since the stages after the Burrows-Wheeler Transform have a significant influence on the compression rate. This article describes improved algorithms for the run length encoding, inversion frequencies and weighted frequency count stages that follow the Burrows-Wheeler Transform. Compression rates are presented for different variations of the algorithm, together with compression and decompression times. Finally, an implementation with a compression rate of 2.238 bps on the Calgary Corpus is introduced, which is the best result published in this field to date.
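To make the original pipeline described above more concrete, the following is a minimal Python sketch of a Burrows-Wheeler Transform followed by a Move-To-Front stage. It is an illustration only, not the implementation from the article: the function names, the naive rotation-sorting approach, and the `"\0"` end-of-string sentinel are all assumptions chosen for brevity, and the improved run length encoding, inversion frequencies and weighted frequency count stages are not shown.

```python
def bwt(s: str, eos: str = "\0") -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column.

    The sentinel `eos` is an assumption of this sketch; it must not occur in `s`.
    """
    assert eos not in s
    t = s + eos
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last: str, eos: str = "\0") -> str:
    """Invert the transform by rebuilding the sorted rotation table column by column."""
    table = [""] * len(last)
    for _ in range(len(last)):
        # Prepend the last column to every row, then re-sort.
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(eos))
    return row[:-1]


def move_to_front(data: str) -> list[int]:
    """Move-To-Front: emit each symbol's current list position, then move it to the front."""
    alphabet = sorted(set(data))  # hypothetical initial ordering for this sketch
    out = []
    for ch in data:
        idx = alphabet.index(ch)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return out


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(move_to_front(transformed))
    print(inverse_bwt(transformed))
```

Because the transform places symbols with similar contexts next to each other, runs of equal symbols become runs of zeros after Move-To-Front, which is what the final entropy coding stage exploits. A practical implementation would use suffix sorting rather than materializing every rotation.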


Similar resources

Incremental frequency count - a post BWT-stage for the Burrows-Wheeler compression algorithm

The stage after the Burrows-Wheeler Transform (BWT) has a key function inside the Burrows-Wheeler compression algorithm as it transforms the BWT output from a local context into a global context. This paper presents the Incremental Frequency Count stage, a post-BWT stage. The new stage is paired with a run length encoding stage between the BWT and entropy coding stage of the algorithm. It offer...


One attempt of a compression algorithm using the BWT

In 1994 Burrows and Wheeler [5] described a universal data compression algorithm (BW-algorithm, for short) which achieved compression rates that were close to the best known compression rates. Due to its simplicity, the algorithm can be implemented with relatively low complexity. Fenwick [8] described ideas to improve the efficiency (i.e. the compression rate) and complexity of the BW-algorith...


Experimental Evaluation of List Update Algorithms for Data Compression

List update algorithms have been widely used as subroutines in compression schemas, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-bas...


High-performance BWT-based Encoders

In 1994, Burrows and Wheeler [5] developed a data compression algorithm which performs significantly better than Lempel-Ziv based algorithms. Since then, a lot of work has been done in order to improve their algorithm, which is based on a reversible transformation of the input string, called BWT (the Burrows-Wheeler transformation). In this paper, we propose a compression scheme based on BWT, M...


An Application of Self-organizing Data Structures to Compression

List update algorithms have been widely used as subroutines in compression schemas, most notably as part of Burrows-Wheeler compression. The Burrows-Wheeler transform (BWT), which is the basis of many state-of-the-art general purpose compressors, applies a compression algorithm to a permuted version of the original text. List update algorithms are a common choice for this second stage of BWT-bas...



Publication date: 2003