Symbol Ranking Text Compression

Author

  • Peter Fenwick
Abstract

In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon’s method adds the concept of “symbol ranking”, as in ‘the next symbol is the one 3rd most likely in the present context’. This report describes an implementation of his method and shows that it forms the basis of a good text compressor. The recent “acb” compressor of Buynovsky is shown to belong to the general class of symbol ranking compressors.
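To make the "symbol ranking" idea concrete, here is a minimal sketch, not the implementation described in the report. It assumes a fixed-order context (`ORDER = 2`) and ranks candidates by recency (a per-context move-to-front list); both are illustrative choices, and the names `rank_encode` and `rank_decode` are hypothetical. Each symbol is replaced by its rank among the symbols previously seen in the same context, with an escape carrying the literal when the symbol is new to that context.

```python
from collections import defaultdict

ORDER = 2  # assumed context length; not taken from the report


def _promote(candidates, sym):
    """Move sym to the front so that recent symbols get low ranks."""
    if sym in candidates:
        candidates.remove(sym)
    candidates.insert(0, sym)


def rank_encode(text, order=ORDER):
    """Recode text as ranks; unseen symbols become (escape_rank, literal)."""
    tables = defaultdict(list)  # context string -> ranked candidate list
    out = []
    for i, sym in enumerate(text):
        ctx = text[max(0, i - order):i]
        candidates = tables[ctx]
        if sym in candidates:
            out.append(candidates.index(sym))    # rank 0 = most likely
        else:
            out.append((len(candidates), sym))   # escape plus literal
        _promote(candidates, sym)
    return out


def rank_decode(tokens, order=ORDER):
    """Invert rank_encode by rebuilding the same tables symbol by symbol."""
    tables = defaultdict(list)
    chars = []
    for tok in tokens:
        start = max(0, len(chars) - order)
        candidates = tables["".join(chars[start:])]
        sym = tok[1] if isinstance(tok, tuple) else candidates[tok]
        _promote(candidates, sym)
        chars.append(sym)
    return "".join(chars)


if __name__ == "__main__":
    sample = "the theory of the thing"
    tokens = rank_encode(sample)
    print(tokens)
    print(rank_decode(tokens) == sample)
```

On predictable text most tokens are small ranks, mainly 0, and a skewed stream of that kind is what a following entropy coder can compress well. A practical compressor would use a stronger context model than this fixed-order table.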

Similar resources

Symbol Ranking Text Compression with Shannon Recodings

In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon’s method adds the concept of “symbol ranking”, as in ‘the next symbol is the one third most likely in the present cont...

Data Compression Using a Sort-Based Context Similarity Measure

Every symbol in the data can be predicted by taking the immediately preceding symbols, or context, into account. This paper proposes a new adaptive data-compression method based on a context similarity measure. We measure the similarity of contexts using a context sorting mechanism. The aim of context sorting is to store a set of contexts in a specific order so that contexts more similar to the ...

Prediction by Compression

It is well known that text compression can be achieved by predicting the next symbol in the stream of text data based on the history seen up to the current symbol. The better the prediction, the more skewed the conditional probability distribution of the next symbol and the shorter the codeword that needs to be assigned to represent this next symbol. What about the opposite direction? suppose w...

Symbol-driven compression of Burrows Wheeler transformed text

Despite the enormous growth in storage capacity in recent years, the search for fast and efficient text compression algorithms continues. As processor speed is increasing at a higher rate than disk access time is decreasing, there is now even more reason to store information in a compressed form than there was previously. Prediction by Partial Matching (PPM), first published in 1984, was a sign...

Can We Do without Ranks in Burrows Wheeler Transform Compression?

Compressors based on the Burrows Wheeler transform (BWT) convert the transformed text into a string of (move-to-front) ranks. These ranks are then encoded with an Order-0 model, or a hierarchy of such models. Although these rank-based methods perform very well, we believe the transformation to MTF numbers blurs the distinction between individual symbols and is a possible cause of inefficiency. Ins...

Publication date: 1996