Newspaper Document Analysis Featuring Connected Line Segmentation
Abstract
This paper presents an algorithm designed to segment and classify newspaper documents. A notable feature of this algorithm is the ability to detect lines in the document – including lines that are connected to other components. A bottom-up approach is used to segment the image into patterns, and then each pattern is classified into one of seven types. Complete regions are then formed from the classified patterns.
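The bottom-up idea in the abstract can be illustrated with a minimal sketch, which is not the paper's algorithm. It assumes a binarized page given as rows of 0/1 values, groups ink pixels into 4-connected components, and applies a toy aspect-ratio rule to flag long thin components as lines. The function names, the `line_ratio` threshold, and the three labels are hypothetical. The abstract does not name its seven pattern types, and this sketch does not attempt the detection of lines connected to other components.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink blobs.

    `image` is a list of rows of 0/1 values, where 1 marks an ink pixel.
    Box coordinates are inclusive. Boxes are returned in raster-scan order
    of each component's first pixel.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classify(box, line_ratio=5.0):
    """Toy rule (hypothetical): long thin boxes are lines, everything else is 'other'."""
    top, left, bottom, right = box
    box_height = bottom - top + 1
    box_width = right - left + 1
    if box_width >= line_ratio * box_height:
        return "horizontal_line"
    if box_height >= line_ratio * box_width:
        return "vertical_line"
    return "other"


# A tiny page: a horizontal rule, a small blob, and a vertical rule.
page = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
labels = [classify(box) for box in connected_components(page)]
```

A line that touches text would merge into a single component here, which is exactly the case the paper's connected-line detection is meant to handle.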
Similar resources
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method for Persian page segmentation that combines low-resolution and high-resolution analysis. In the low-resolution stage, a pyramidal image structure is constructed for multiscale analysis, and the document image is segmented into a set of regions. In the high-resolution stage, connected component analysis segments each region into homogeneous regions and identifyi...
Arabic Newspaper Page Segmentation
The aim of layout analysis is to extract the geometric structure from a document image. It consists of labeling homogeneous regions of a document image. This paper describes the performance of segmentation algorithms and their adaptation to handle complex structured Arabic documents such as newspapers. Experimental tests have been carried out on four different phases of newspaper image a...
Page segmentation and text extraction from gray-scale images in microfilm format
The paper describes a system designed to separate textual regions from graphics regions and to locate textual data against a textured background. We present a method based on edge detection to automatically locate text in noisy grayscale newspaper images in microfilm format. The algorithm first finds the appropriate edges of textual region using Canny edge...
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...