Persian Printed Document Analysis and Page Segmentation
Authors
Abstract:
This paper presents a hybrid method for Persian page segmentation that combines low-resolution and high-resolution analysis. In the low-resolution stage, a pyramidal image structure is constructed for multiscale analysis, and the document image is segmented into a set of regions. In the high-resolution stage, connected component analysis segments each region into homogeneous regions and identifies them as text, images, or tables/drawings. The proposed method was tested on Persian documents. The results of these tests show that the proposed method provides more accurate and faster results.
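The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the pyramid is built or how components are labeled, so the OR-reduction over 2x2 blocks, the 4-connectivity, and all function names (`downsample`, `build_pyramid`, `connected_components`, `coarse_regions`) are assumptions made for illustration. The input is assumed to be a binarized page given as a list of rows of 0/1 values, and the classification of regions as text, images, or tables/drawings is not shown.

```python
from collections import deque


def downsample(img, factor=2):
    """One pyramid level (assumed scheme): a coarse pixel is 1 if any
    pixel in its factor x factor block is 1."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, factor):
        row = []
        for c in range(0, w, factor):
            block = [img[y][x]
                     for y in range(r, min(r + factor, h))
                     for x in range(c, min(c + factor, w))]
            row.append(1 if any(block) else 0)
        out.append(row)
    return out


def build_pyramid(img, levels):
    """Return [full resolution, level 1, ..., level `levels`]."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def connected_components(img):
    """4-connected labeling by breadth-first search.
    Returns inclusive bounding boxes (top, left, bottom, right)
    in raster-scan order of each component's first pixel."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def coarse_regions(img, levels=2):
    """Low-resolution stage (sketch): find blobs at the coarsest pyramid
    level and map their bounding boxes back to full-resolution coordinates."""
    coarse = build_pyramid(img, levels)[-1]
    scale = 2 ** levels
    h, w = len(img), len(img[0])
    return [(t * scale, l * scale,
             min((b + 1) * scale, h) - 1, min((r + 1) * scale, w) - 1)
            for t, l, b, r in connected_components(coarse)]
```

In this sketch, nearby marks that are separate at full resolution merge into a single blob at a coarse level, which is what lets the low-resolution pass propose page regions cheaply. A high-resolution pass would then call `connected_components` on the pixels inside each proposed region to recover the individual components for finer analysis.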
Similar resources
Skew Detection, Page Segmentation, and Script Classification of Printed Document Images
Automatic processing of international documents presents a number of challenging problems because Optical Character Recognition (OCR) techniques are not available for all languages and all script classes. Document images must be categorized according to their script type first, in our case Roman, Ideographic, or Arabic. We present a set of statistical methods that first detect and correct the skew ...
Ground-truthing and benchmarking document page segmentation
We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: the segmentation output, described as a set of regions together with their types, output order etc., is matched against the pre-stored set of ground-truth regions. Misclassifications, splitting, and merging of regions are among the errors that are detect...
Line and Ligature Segmentation in Printed Urdu Document Images
This paper presents a technique for segmentation of printed Urdu text images into lines and ligatures, a key pre-processing step in Urdu Optical Character Recognition (OCR) systems. Unlike classical projection profile based line segmentation methods, the proposed scheme successfully segments overlapping and touching lines. Once the lines are segmented, ligatures are extracted from each text lin...
Persian/Arabic Document Segmentation Based On Pyramidal Image Structure
Automatic transformation of paper documents into electronic documents requires document segmentation at the first stage. However, parameter restrictions such as variations in character font sizes, different text line spacing, and non-uniform document layout structures have together made it difficult to design a general-purpose document layout analysis algorithm for many years. Thus...
Journal title
volume 1 issue 1
pages -
publication date 2010-02-25