A Tree Based Binary Encoding of Text Using LZW Algorithm

Authors

  • Tinku Acharya
  • Amar Mukherjee
Abstract

The most popular adaptive dictionary coding scheme used for text compression is the LZW algorithm [2]. In the LZW algorithm, a changing dictionary contains common strings that have been encountered so far in the text. The dictionary can be represented by a dynamic trie. The input text is examined character by character, and the longest substring (called a prefix string) of the text that already exists in the trie is replaced by a pointer to the node in the trie that represents the prefix string. The motivation of our research is to explore a variation of the LZW algorithm for variable-length binary encoding of text (we call it the LZWA algorithm) and to develop a memory-based VLSI architecture for text compression. We propose a new methodology that represents the trie in the form of a binary tree (we call it a binary trie) to maintain the dictionary used in the LZW scheme. This binary tree maintains all the properties of the trie and can easily be mapped into memory. As a result, the common substrings can be encoded using variable-length prefix binary codes. The prefix codes enable us to uniquely decode the text into its original form. A formal definition of the binary trie and the detailed schemes for memory mapping of the binary tries during both encoding and decoding operations have been presented in [1]. The proposed algorithms have been implemented and tested with different kinds of texts, such as C source files and electronic messages. The algorithm outperforms the usual LZW scheme when the size of the text is small (usually less than 5K). Depending upon the characteristics of the text, the improvement of the compression ratio has been achieved around 10.30% compared to the LZW scheme. However, its performance degrades for larger texts. The main reason for this degradation is that most of the branches of the binary trie become highly skewed for less frequently appearing words.
As a result, the performance is degraded when the words are encoded using the binary sequences of these highly skewed branches. Generating a height-balanced binary trie will probably be a research challenge. One possible remedy is to adaptively cut the highly skewed paths and drop some of the branches of the binary trie that correspond to the less frequently appearing words. There are many applications in wireless media that deal with small texts. The results of different variations, including the VLSI implementation of the chip, will be the subject matter of our future papers.
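The baseline that the abstract starts from, standard LZW with the dictionary held in a dynamic trie, can be sketched as follows. This is a minimal illustration only: it emits plain integer codes, not the variable-length prefix binary codes of the LZWA scheme, and it does not build the authors' binary trie. The names `TrieNode`, `lzw_encode`, and `lzw_decode`, and the choice of a 256-entry initial dictionary over UTF-8 bytes, are assumptions made for this sketch.

```python
class TrieNode:
    """One dictionary string; `code` is the pointer emitted for it."""
    __slots__ = ("code", "children")

    def __init__(self, code):
        self.code = code
        self.children = {}  # next byte -> TrieNode


def lzw_encode(text: str) -> list[int]:
    """Standard LZW: replace each longest known prefix string by its code."""
    root = TrieNode(None)
    for b in range(256):  # assumed initial dictionary: all single bytes
        root.children[b] = TrieNode(b)
    next_code = 256
    codes = []
    node = root
    for b in text.encode("utf-8"):
        child = node.children.get(b)
        if child is not None:
            node = child  # keep extending the current prefix string
        else:
            codes.append(node.code)  # emit pointer to the longest match
            node.children[b] = TrieNode(next_code)  # grow the trie by one node
            next_code += 1
            node = root.children[b]  # restart from the unmatched byte
    if node is not root:
        codes.append(node.code)  # flush the final prefix string
    return codes


def lzw_decode(codes: list[int]) -> str:
    """Rebuild the same dictionary on the fly and expand each code."""
    if not codes:
        return ""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the entry being defined right now.
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return b"".join(out).decode("utf-8")
```

For example, `lzw_decode(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))` returns the original string. Each emitted integer corresponds to one trie node; the scheme described in the abstract would instead emit the binary sequence of a path through the binary trie.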

Similar resources

Enhancing Lempel-Ziv Codes Using an On-Line Variable Length Binary Encoding

The LZW algorithm is the most popular dictionary-based adaptive text compression scheme [1]. In the LZW algorithm, a changing dictionary contains common strings that have been encountered so far in the text. The motivation of this research is to explore an on-line variable-length binary encoding. We apply this encoding to LZW codes to remedy the problem that we discussed in our earlier paper in DCC'95 [...


An improved algorithm to reconstruct a binary tree from its inorder and postorder traversals

It is well known that, given the inorder traversal along with either the preorder or the postorder traversal of a binary tree, the tree can be determined uniquely. Several algorithms have been proposed to reconstruct a binary tree from its inorder and preorder traversals. There is one study on reconstructing a binary tree from its inorder and postorder traversals, and this algorithm takes running time of...


Self-adapting Radar Video Echo Acquisition System based on LZW Algorithm

This paper presents a high-speed, low-complexity Field Programmable Gate Array (FPGA) design and implementation of the lossless Lempel-Ziv-Welch (LZW) algorithm on the Xilinx Virtex-E device family for self-adapting radar video echo data acquisition applications. A multi-channel, self-adaptive, variable-sampling-rate data acquisition system based on FPGA and the I2C bus is introduced. By writing the frame data ...


A Unifying Framework for Compressed Pattern Matching

We introduce a general framework that captures the essence of compressed pattern matching for various dictionary-based compression methods. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), byte-...


Efficient Lossless Compression of Trees and Graphs

In this paper, we study the problem of compressing a data structure (e.g. a tree, or an undirected or directed graph) in an efficient way while keeping a similar structure in the compressed form. To date, there has been no proven optimal algorithm for this problem. We use the idea of building the LZW tree in LZW compression to compress a binary tree generated by a stationary ergodic source in an optimal ma...


Publication date: 2001