Document Image Segmentation Using Dynamic Thresholds and Identification of Each Region Type
Authors
Abstract
The volume of paper that business professionals must handle is overwhelming. Digital documents, being less expensive and more efficient, offer a path to a more organized office. Many business applications require document image analysis before OCR. Segmenting document images into text and non-text regions is essential because an OCR engine produces garbage text when it is given non-text components as input. This paper presents a segmentation technique that decomposes a document image into its constituent parts, such as text blocks, pictures, and tables. The technique combines two approaches: the run-length smearing algorithm and a recursive top-down approach that works on horizontal and vertical projection profiles. The technique is threshold-based, but the threshold values are calculated automatically from the geometric layout of the document. Binarization and noise removal are performed as preprocessing. The approach works for documents in any script that have a Manhattan layout.
Keywords: document image segmentation, dynamic thresholds, horizontal and vertical projection profiles, OCR, recursive top down segmentation, run
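The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the thresholds are derived from the layout or how the two approaches are combined, so `threshold` and `min_gap` are plain parameters here, and the function names (`rlsa_horizontal`, `split_on_gaps`, `xy_cut`) are hypothetical. The image is assumed to be already binarized, with 1 for ink and 0 for background.

```python
from typing import List, Tuple

Image = List[List[int]]            # assumed binarized: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive


def rlsa_horizontal(img: Image, threshold: int) -> Image:
    """Run-length smearing along rows: fill background runs of length
    <= threshold that lie between two ink pixels."""
    out = [row[:] for row in img]
    for row in out:
        last_ink = None
        for x, v in enumerate(row):
            if v == 1:
                if last_ink is not None and 0 < x - last_ink - 1 <= threshold:
                    for k in range(last_ink + 1, x):
                        row[k] = 1
                last_ink = x
    return out


def split_on_gaps(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans of non-zero profile values, merging spans
    separated by fewer than min_gap empty positions."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def xy_cut(img: Image, box: Box, min_gap: int,
           cut_rows: bool = True, stalled: bool = False) -> List[Box]:
    """Recursive top-down split, alternating between the horizontal
    projection profile (row sums) and the vertical one (column sums)."""
    top, left, bottom, right = box
    if cut_rows:
        profile = [sum(img[y][left:right]) for y in range(top, bottom)]
        parts = [(top + s, left, top + e, right)
                 for s, e in split_on_gaps(profile, min_gap)]
    else:
        profile = [sum(img[y][x] for y in range(top, bottom))
                   for x in range(left, right)]
        parts = [(top, left + s, bottom, left + e)
                 for s, e in split_on_gaps(profile, min_gap)]
    if not parts:
        return []                   # empty region, nothing to report
    if len(parts) == 1:
        unchanged = parts[0] == box
        if unchanged and stalled:
            return [box]            # neither direction splits: leaf block
        return xy_cut(img, parts[0], min_gap, not cut_rows, stalled=unchanged)
    blocks: List[Box] = []
    for part in parts:
        blocks.extend(xy_cut(img, part, min_gap, not cut_rows))
    return blocks
```

Calling `xy_cut(img, (0, 0, height, width), min_gap)` returns the leaf blocks as bounding boxes. In a threshold-free variant such as the one the abstract describes, `threshold` and `min_gap` would be computed from the page itself rather than passed in.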
Similar resources
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution analysis, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis and segments the document image into a set of regions. In the high-resolution page segmentation, connected components analysis is used to segment each region into homogeneous regions and identifyi...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing stage by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...
Plant Classification in Images of Natural Scenes Using Segmentations Fusion
This paper presents a novel approach to automatically classifying and identifying tree leaves using image segmentation fusion. With the development of mobile devices and remote access, automatic plant identification in images taken in natural scenes has received much attention. Image segmentation plays a key role in most plant identification methods, especially in complex background images. Wher...
Adaptive Region Growing Color Segmentation for Text Using Irregular Pyramid
This paper presents the result of an adaptive region growing segmentation technique for color document images using an irregular pyramid structure. The emphasis is on the segmentation of textual components for subsequent extraction in document analysis. The segmentation is done in the RGB color space. A simple color distance measurement and a category of color thresholds are derived. The propo...
Modified image segmentation method based on region growing and region merging
Image segmentation is one of the basic concepts widely used in every field of image processing. The entire process of the proposed work for image segmentation comprises three phases: a threshold generation with Dynamic Modified Region Growing (DMRG) phase, a texture feature generation phase, and a region merging phase. By dynamically changing two thresholds, the given input image can be perfo...