Extraction of Projection Profile, Run-Histogram and Entropy Features Straight from Run-Length Compressed Text-Documents
Abstract
Document image analysis, like any digital image analysis, requires the identification and extraction of suitable features. These are generally extracted from uncompressed images, although in practice images are usually made available in compressed form for efficient transmission and storage. Consequently, a compressed image must be decompressed before features can be extracted, which demands additional computing resources. This limitation motivates research into extracting features directly from the compressed image. In this research, we propose to extract essential features for text document analysis, namely the projection profile, run-histogram and entropy, directly from run-length compressed text documents. The experiments show that the features can be extracted directly from the compressed image without a decompression stage, which reduces computing time. The feature values so extracted are identical to those extracted from uncompressed images.
Keywords: Compressed data, Run-length compressed document, Projection profile, Entropy, Run-histogram.
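To make the idea concrete, here is a minimal sketch of reading two of these features straight off run-length data. It is not the authors' implementation. It assumes a hypothetical representation in which every image row is a list of alternating run lengths that starts with a background (white) run, possibly of length zero. The function names are illustrative. The entropy function is a plain Shannon entropy over the run-length histogram, used here only as a stand-in; the paper's own entropy definition is not given in the abstract and may differ.

```python
from collections import Counter
from math import log2

# Assumed format (not from the paper): each row is [white, black, white, ...],
# so the odd-indexed entries are the foreground (black) runs.

def horizontal_profile(rows):
    """Foreground pixels per row: the sum of the black runs in that row."""
    return [sum(row[1::2]) for row in rows]

def vertical_profile(rows, width):
    """Foreground pixels per column, built with a difference array so the
    work is proportional to the number of runs, not the number of pixels."""
    diff = [0] * (width + 1)
    for row in rows:
        col = 0
        for i, run in enumerate(row):
            if i % 2 == 1 and run > 0:   # black run covers [col, col + run)
                diff[col] += 1
                diff[col + run] -= 1
            col += run
    profile, running = [], 0
    for d in diff[:width]:
        running += d
        profile.append(running)
    return profile

def run_histogram(rows):
    """Count how often each black run length occurs."""
    hist = Counter()
    for row in rows:
        hist.update(r for r in row[1::2] if r > 0)
    return hist

def run_entropy(hist):
    """Shannon entropy (bits) of the run-length distribution.
    Illustrative stand-in only, not the paper's entropy feature."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in hist.values())

# A 3 x 6 toy image: 011000 / 111011 / 000000
rows = [[1, 2, 3], [0, 3, 1, 2], [6]]
print(horizontal_profile(rows))      # [2, 5, 0]
print(vertical_profile(rows, 6))     # [1, 2, 2, 0, 1, 1]
print(dict(run_histogram(rows)))     # {2: 2, 3: 1}
```

None of the functions expands the runs back into pixels, which is the point the abstract makes about avoiding decompression.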
Similar Resources
Direct Processing of Document Images in Compressed Domain
With the rapid increase in the volume of big data in this digital era, fax documents, invoices, receipts, etc. are traditionally subjected to compression for efficient data storage and transfer. However, in order to process these documents, they need to undergo a decompression stage, which demands additional computing resources. This limitation induces the motivation to research on t...
Automatic Detection of Font Size Straight from Run Length Compressed Text Documents
Automatic detection of font size finds many applications in the area of intelligent OCRing and document image analysis, which has been traditionally practised over uncompressed documents, although in real life the documents exist in compressed form for efficient storage and transmission. It would be novel and intelligent if the task of font size detection could be carried out directly from the ...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Isolated Persian/Arabic handwriting characters: Derivative projection profile features, implemented on GPUs
For many years, researchers have studied high-accuracy methods for recognizing handwriting and have achieved many significant improvements. However, an issue that has rarely been studied is the speed of these methods. Considering computer hardware limitations, it is necessary for these methods to run at high speed. One of the methods to increase the processing speed is to use the computer pa...
Bangla Text Recognition from Video Sequence: A New Focus
Extraction and recognition of Bangla text from video frame images is challenging due to complex color backgrounds, low resolution, etc. In this paper, we propose an algorithm for extraction and recognition of Bangla text from such video frames with complex backgrounds. Here, a two-step approach has been proposed. First, the text line is segmented into words using information based on line contours...