Achieving Better Compression Applying Index-based Byte-Pair Transformation before Arithmetic Coding
Abstract
Arithmetic coding is used in many compression techniques during the entropy encoding stage. Further compression is not possible without changing the data model and increasing redundancy in the data set. To increase the redundancy, we apply index-based byte-pair transformation (BPT-I) as a pre-processing step before arithmetic coding. BPT-I transforms the most frequent byte-pairs (2-byte integers). The most frequent byte-pairs are sorted in order of frequency and divided into groups of 256 byte-pairs each. Each byte-pair in a group is then encoded using two tokens: the group number and the location within the group. The group number is denoted by a variable-length prefix codeword, whereas the location within the group is denoted by an 8-bit index. BPT-I is designed to be applied to any type of source, not necessarily text. The more groups considered during transformation, the better the compression. Experimental results show around 4.30% additional reduction in compressed file size when arithmetic coding is applied after the BPT-I byte-pair transformation.
General Terms: Data Compression, Algorithms
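The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how pairs are counted, how bytes outside the table are represented, or which prefix code denotes the group number. The sketch therefore assumes that all adjacent (overlapping) pairs are counted, that unmatched bytes are emitted as hypothetical `literal` tokens, and that the group number stays a plain integer rather than a variable-length codeword. The names `build_groups`, `transform`, and `inverse` are illustrative only.

```python
from collections import Counter

GROUP_SIZE = 256  # from the abstract: each group holds 256 byte-pairs


def build_groups(data: bytes, num_groups: int) -> dict:
    """Map the most frequent byte-pairs to (group number, 8-bit index)."""
    # Assumption: every adjacent (overlapping) pair is counted.
    counts = Counter(zip(data, data[1:]))
    ranked = [pair for pair, _ in counts.most_common(num_groups * GROUP_SIZE)]
    # Rank r falls in group r // 256 at location r % 256.
    return {pair: divmod(rank, GROUP_SIZE) for rank, pair in enumerate(ranked)}


def transform(data: bytes, table: dict) -> list:
    """Greedy left-to-right scan emitting pair tokens or literal tokens."""
    tokens = []
    i = 0
    while i < len(data):
        pair = (data[i], data[i + 1]) if i + 1 < len(data) else None
        if pair in table:
            group, index = table[pair]
            tokens.append(("pair", group, index))
            i += 2
        else:
            # Assumption: bytes not covered by the table pass through as literals.
            tokens.append(("literal", data[i]))
            i += 1
    return tokens


def inverse(tokens: list, table: dict) -> bytes:
    """Rebuild the original bytes from the token stream."""
    lookup = {location: pair for pair, location in table.items()}
    out = bytearray()
    for token in tokens:
        if token[0] == "pair":
            out.extend(lookup[(token[1], token[2])])
        else:
            out.append(token[1])
    return bytes(out)
```

For example, `inverse(transform(data, table), table)` returns `data` for a table built with `build_groups(data, num_groups)`. In a full pipeline the token stream would still need to be serialized (prefix codeword plus 8-bit index) and passed to an arithmetic coder, which this sketch leaves out.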
Similar resources
Byte Pair Transformation using Zero-Frequency Bytes with Varying Number of Passes
The byte pair encoding (BPE) algorithm, suggested by P. Gage, achieves data compression by encoding all instances of the most frequent byte-pair using a zero-frequency byte in the source data. This process is repeated for at most m passes until no further compression is possible, either because there are no more frequently occurring byte pairs or there are no more unused zero-fr...
Quad-Byte Transformation using Zero-frequency Bytes
The byte pair encoding (BPE) algorithm, suggested by P. Gage, achieves data compression by encoding all instances of the most frequent byte-pair using a zero-frequency byte in the source data. This process is repeated for at most m passes until no further compression is possible, either because there are no more frequently occurring byte pairs or there are no more unused zero-fre...
Context-Based Arithmetic Coding for the DCT: Achieving high compression rates with block transforms and simple context modeling
Recent image compression schemes have focused primarily on wavelet transforms, culminating in the JPEG-2000 standard. Block-based DCT compression, on which the older JPEG standard is based, has been largely neglected because wavelet-based coding methods appear to offer better image quality. This paper presents a simple compression algorithm that uses arithmetic coding on the bit-planes of the D...
Data Compression Modelling: Huffman and Arithmetic
The paper deals with the formal description of data transformation (the compression and decompression process). We start by briefly reviewing basic concepts of data compression and introducing the model-based approach that underlies most modern techniques. We then present arithmetic coding and Huffman coding for data compression, and finally examine the performance of arithmetic coding. And conclude th...
Efficient modification of LZSS compression algorithm
This paper presents a new method of lossless data compression called LZPP, being an advanced modification of the well-known algorithm LZSS [1]. It introduces improvements of the LZ family algorithms [2, 3], such as the use of a special coding of two and three byte matches, use of an auxiliary entropy coder and new criteria of symbol exclusions. Minimization of the data compression ratio (bpc) h...
Publication date: 2014