Segmentation of scanned documents for efficient compression
Abstract
A scanned, complex document image may be composed of text, graphics, halftones, and pictures whose layout is unknown. In this paper, we propose a novel segmentation scheme for scanned document images that facilitates their efficient compression. Our scheme segments an input image into binarizable and non-binarizable components. A binarizable component is a region that can be represented by no more than two gray levels (or colors) with acceptable perceptual quality. A non-binarizable component is a region that needs more than two gray levels (or colors) to reach acceptable perceptual quality. Once the components are identified, the binarizable components can be thresholded and compressed as a binary image using an efficient binary encoding scheme, together with the gray values represented by the black and white pixels of the binary image. The non-binarizable components can be compressed using another suitable encoding scheme.
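The binarizable test described in the abstract can be illustrated with a minimal sketch. The abstract does not state how "acceptable perceptual quality" is measured, so the mean-threshold split, the mean squared error criterion, the `max_mse` value, and the function name `binarize_block` are all assumptions made for illustration, not the paper's method.

```python
from statistics import mean


def binarize_block(pixels, max_mse=100.0):
    """Try to represent a block of gray values with two gray levels.

    Returns (is_binarizable, mask, low_level, high_level).
    The MSE criterion and the max_mse default are illustrative
    assumptions; the paper's perceptual criterion is not given.
    """
    threshold = mean(pixels)
    # Binary image: True marks pixels brighter than the threshold.
    mask = [p > threshold for p in pixels]
    dark = [p for p, m in zip(pixels, mask) if not m]
    light = [p for p, m in zip(pixels, mask) if m]
    # The darkest pixel is never above the mean, so `dark` is non-empty.
    low = mean(dark)
    high = mean(light) if light else low
    # Error of the two-level reconstruction against the original block.
    mse = mean((p - (high if m else low)) ** 2 for p, m in zip(pixels, mask))
    return mse <= max_mse, mask, low, high


# A text-like block (two clusters) versus a smooth gradient.
text_like = [10, 12, 11, 200, 198, 201, 10, 199]
gradient = list(range(0, 256, 32))
print(binarize_block(text_like)[0])
print(binarize_block(gradient)[0])
```

Under these assumptions, a block that passes would be stored as the binary mask plus its two gray values, while a block that fails would be handed to a continuous-tone encoder.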
Similar resources
A General Segmentation Scheme for DjVu Document Compression
We describe the “DjVu” (Déjà Vu) technology: an efficient document image compression methodology, a file format, and a delivery platform that together enable instant access to high quality documents from essentially any platform, over any connection. Originally developed for scanned color documents, it was recently expanded to electronic documents, so DjVu has now truly become a universal docu...
Block-based segmentation and adaptive coding for visually lossless compression of scanned documents
This paper presents a novel block-based segmentation and adaptive coding (BSAC) algorithm for visually lossless compression of scanned documents that contain not only photographic images but also text and graphic images. For such a compound image source, we structure the image into nonoverlapping blocks and classify each block into four different classes based on the empirical statistics within th...
Document Image Segmentation and Compression
Cheng, Hui, Ph.D., Purdue University, August, 1999. Document Image Segmentation and Compression. Major Professor: Charles A. Bouman. In the first part of this research, we propose an image segmentation algorithm called the trainable sequential MAP (TSMAP) algorithm. The TSMAP algorithm is based on a multiscale Bayesian approach. It has a novel multiscale context model which can capture complex ...
An Efficient Character Segmentation Algorithm for Printed Chinese Documents
Character segmentation technology for printed documents is applied in many fields. This paper proposes an efficient character segmentation algorithm for Chinese printed documents, which is suitable for a paper watermarking system. This algorithm is composed of three main steps: connected regions recognition, connected regions merging, and fine-grained segmentation, through what the algorithm s...
JPEG2000-matched MRC compression of compound documents
The Mixed Raster Content (MRC) ITU document compression standard (T.44) specifies a multilayer decomposition model for compound documents into two contone image layers and a binary mask layer for independent compression. While T.44 does not recommend any procedure for decomposition, it does specify a set of allowable layer codecs to be used after decomposition. While T.44 only allows older stan...