A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

نویسندگان

  • Vivek Dimri
  • Ranjit Biswas
چکیده

Data compression is very important feature in terms of saving the memory space. In this proposal, an indexed dictionary based compression is used for text data, where the word’s reference in dictionary is used in compression. This approach is not file based; a common dictionary is used for compression. Which contains the words, the position of the word in dictionary is one of the key parts of encoded frame which is compressed form of the text word. This is lossless compression. This compression approach is also take cares of small words like one or two characters words which usually decrease the efficiency of compression algorithms. This approach is also deals with file having special characters as a word. Special character words, alpha numeric words, normal texted words and small words all deals differently which makes this approach more efficient. Since a centralized dictionary is used for data compression, therefore; this approach is not preferred for transfer compressed file, while it is suitable to store text data in compressed form in hard disk drive and centralized storage or cloud drive for memory utilization. Keywords—Centerlized dictionary based compression; Text file compression;

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Inverted Files to Compress Text

This is the first report on a new approach to text compression. It consists of representing the text file with compressed inverted file index in conjunction with very compact lexicon, where lexicon includes every word in the text. The index is compressed using standard index compression techniques, and lexicon is compressed by original dictionary compression method that gives better compression...

متن کامل

Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application

The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...

متن کامل

Efficient compression method for pronunciation dictionaries

Pronunciation dictionaries are often used with other datadriven methods to model the pronunciations in phonemebased automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dictionaries usually take a great amount of memory, which is a limiting factor in portable handheld devices. Compressing the pronunciation dictionaries results in minimal transmission bandwidth and less stora...

متن کامل

A Novel Face Detection Method Based on Over-complete Incoherent Dictionary Learning

In this paper, face detection problem is considered using the concepts of compressive sensing technique. This technique includes dictionary learning procedure and sparse coding method to represent the structural content of input images. In the proposed method, dictionaries are learned in such a way that the trained models have the least degree of coherence to each other. The novelty of the prop...

متن کامل

A Novel Image Denoising Method Based on Incoherent Dictionary Learning and Domain Adaptation Technique

In this paper, a new method for image denoising based on incoherent dictionary learning and domain transfer technique is proposed. The idea of using sparse representation concept is one of the most interesting areas for researchers. The goal of sparse coding is to approximately model the input data as a weighted linear combination of a small number of basis vectors. Two characteristics should b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1512.07153  شماره 

صفحات  -

تاریخ انتشار 2015