Unbounded Length Contexts for PPM

Authors

  • John G. Cleary
  • William John Teahan
  • Ian H. Witten
Abstract

The PPM data compression scheme has set the performance standard in lossless compression of text throughout the past decade. PPM is a finite-context statistical modelling technique that can be viewed as blending together several fixed-order context models to predict the next character in the input sequence. This paper gives a brief introduction to PPM and describes a variant of the algorithm, called PPM*, which exploits contexts of unbounded length. Although PPM* requires considerably greater computational resources (in both time and space), it reliably achieves compression superior to the benchmark PPMC version. Its major contribution is to show that the full information available from all substrings of the input string can be used effectively to generate high-quality predictions. Hence, it provides a useful tool for exploring the bounds of compression.
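The blending of context models described in the abstract can be made concrete with a small sketch. It is a naive illustration under stated assumptions, not the authors' implementation: it assumes PPMC's escape estimate (method C: escape probability d/(n+d), where n is the number of symbol occurrences in a context and d the number of distinct symbols), full exclusions, a 256-symbol alphabet, and, for the unbounded case, the PPM* rule of starting from the shortest deterministic context (one that has only ever predicted a single symbol) or, failing that, the longest context seen before. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
import math

ALPHABET_SIZE = 256  # assumption: byte-sized symbols


def follow_counts(history: str, k: int) -> Counter:
    """Count the symbols that followed earlier occurrences of the
    length-k suffix of `history` (naive scan; k = 0 is the empty context)."""
    n = len(history)
    ctx = history[n - k:]
    counts = Counter()
    for i in range(k, n):
        if history[i - k:i] == ctx:
            counts[history[i]] += 1
    return counts


def ppm_star_start_order(history: str) -> int:
    """PPM*-style choice of starting context (assumed policy): the shortest
    deterministic context if one exists, otherwise the longest context seen
    before. Returns -1 when no context has been seen (empty history)."""
    longest = -1
    for k in range(len(history)):
        counts = follow_counts(history, k)
        if not counts:
            break  # if this suffix never occurred, no longer suffix did either
        if len(counts) == 1:
            return k  # shortest deterministic context
        longest = k
    return longest


def ppm_probability(history: str, symbol: str, max_order=None,
                    alphabet_size: int = ALPHABET_SIZE) -> Fraction:
    """Probability of `symbol` after `history`, blending contexts from the
    starting order down to order -1 with PPMC escapes and exclusions.
    max_order=None selects the unbounded (PPM*-style) starting context."""
    if max_order is None:
        start = ppm_star_start_order(history)
    else:
        start = min(max_order, len(history) - 1)

    prob = Fraction(1)
    excluded = set()
    for k in range(start, -1, -1):
        counts = {s: c for s, c in follow_counts(history, k).items()
                  if s not in excluded}
        if not counts:
            continue  # context unseen (or fully excluded): drop down at no cost
        n = sum(counts.values())  # symbol occurrences in this context
        d = len(counts)           # distinct symbols in this context
        if symbol in counts:
            return prob * Fraction(counts[symbol], n + d)
        prob *= Fraction(d, n + d)  # PPMC (method C) escape probability
        excluded.update(counts)     # exclude symbols already offered
    # Order -1: uniform over every symbol not yet excluded.
    return prob / (alphabet_size - len(excluded))


def code_length_bits(text: str, max_order=None) -> float:
    """Ideal code length of `text` in bits under this sketch model."""
    return sum(-math.log2(float(ppm_probability(text[:i], ch, max_order)))
               for i, ch in enumerate(text))
```

For example, after the history "abracadabra" the unbounded variant starts from the deterministic context "ra", so `ppm_probability("abracadabra", "c")` returns `Fraction(1, 2)`, whereas `max_order=1` starts from the context "a" and gives `Fraction(1, 7)`. The repeated rescanning of the history is chosen only for readability; a practical implementation would index contexts with a trie or similar structure, which is where the extra time and space the abstract mentions come in.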

Similar resources

Unbounded Length Contexts for PPM

"Compression of individual sequences via variable rate coding", IEEE Transactions on Information Theory, 24(5), 530–536. [Figure: PPM*C model after processing the string "abracadabra"] ...

Ensemble Prediction by Partial Matching

Prediction by Partial Matching (PPM) is a lossless compression algorithm which consistently performs well on text compression benchmarks. This paper introduces a new PPM implementation called PPM-Ens which uses unbounded context lengths and ensemble voting to combine multiple contexts. The algorithm is evaluated on the Calgary corpus. The results indicate that combining multiple contexts leads ...

Experiments on the zero frequency problem

The best algorithms for lossless compression of text are those which adapt to the text being compressed [1]. Two classes of such adaptive techniques are commonly used. One class matches the text against a dictionary of strings seen and transforms the text into a list of indices into the dictionary. These techniques are usually formulated as a variant on Ziv-Lempel (LZ) compression. While LZ com...

PPM compression without escapes

A significant cost in PPM data compression (and often the major cost) is the provision and efficient coding of escapes while building contexts. This paper presents some recent work on eliminating escapes in PPM compression, using bit-wise compression with binary contexts. It shows that PPM without escapes can achieve averages of 2.5 bits per character on the Calgary Corpus and 2.2 bpc on the Ca...

Journal:
  • Comput. J.

Volume: 40

Publication date: 1995