Persian Printed Document Analysis and Page Segmentation

Authors

  • Jamshid Shanbehzadeh, Tarbiat Moalem
Abstract:

This paper presents a hybrid method for Persian page segmentation that combines a low-resolution stage and a high-resolution stage. In the low-resolution stage, a pyramidal image structure is constructed for multiscale analysis, and the document image is segmented into a set of regions. In the high-resolution stage, connected component analysis is used to split each region into homogeneous regions and to identify them as text, images, or tables/drawings. The proposed method was tested on Persian documents, and the results of these tests show that it produces more accurate and faster results.
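The two stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an already binarized page (1 = ink, 0 = background), builds each pyramid level by OR-reduction of 2x2 blocks, and uses 8-connected component labeling at both resolutions. The function names `reduce_level`, `build_pyramid`, `connected_components`, and `segment_page` are hypothetical. The final step of labeling components as text, images, or tables/drawings is omitted, because the abstract does not state the classification rules.

```python
from collections import deque


def reduce_level(img):
    """Halve the resolution of a binary image.

    Assumption: a 2x2 block becomes ink (1) if any of its pixels is ink
    (OR-reduction); odd-sized edges are handled with partial blocks.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            block = [img[yy][xx]
                     for yy in range(y, min(y + 2, h))
                     for xx in range(x, min(x + 2, w))]
            row.append(1 if any(block) else 0)
        out.append(row)
    return out


def build_pyramid(img, levels):
    """Return [full resolution, 1/2, 1/4, ...] after `levels` reductions."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


def connected_components(img):
    """Find 8-connected ink components with a breadth-first search.

    Returns inclusive bounding boxes (top, left, bottom, right) in scan order.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def segment_page(img, levels=2):
    """Coarse-to-fine sketch: regions at the coarsest level, components inside each."""
    pyramid = build_pyramid(img, levels)
    scale = 2 ** levels
    h, w = len(img), len(img[0])
    result = []
    # Low-resolution stage: each coarse component is a candidate region.
    for top, left, bottom, right in connected_components(pyramid[-1]):
        # Map the coarse box back to full-resolution pixels, clamped to the page.
        y0, x0 = top * scale, left * scale
        y1 = min((bottom + 1) * scale, h)
        x1 = min((right + 1) * scale, w)
        # High-resolution stage: label components inside the region's bounding box.
        crop = [row[x0:x1] for row in img[y0:y1]]
        comps = [(ct + y0, cl + x0, cb + y0, cr + x0)
                 for ct, cl, cb, cr in connected_components(crop)]
        result.append(((y0, x0, y1 - 1, x1 - 1), comps))
    return result
```

On a toy 8x8 page with ink at (0,0), (0,2), (6,6) and (7,7), `segment_page(img, levels=1)` groups the first two pixels into one coarse region, and the full-resolution pass then splits that region into two separate components. This is the coarse-to-fine behaviour the abstract describes: the pyramid groups nearby marks cheaply, and connected component analysis recovers the individual pieces only where needed.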


similar resources


Skew Detection, Page Segmentation, and Script Classification of Printed Document Images

Automatic processing of international documents presents a number of challenging problems because Optical Character Recognition (OCR) techniques are not available for all languages and all script classes. Document images must be categorized according to their script type first, in our case Roman, Ideographic, or Arabic. We present a set of statistical methods that first detect and correct the skew ...


Ground-truthing and benchmarking document page segmentation

We describe a new approach for evaluating page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: the segmentation output, described as a set of regions together with their types, output order etc., is matched against the pre-stored set of ground-truth regions. Misclassifications, splitting, and merging of regions are among the errors that are detect...


Line and Ligature Segmentation in Printed Urdu Document Images

This paper presents a technique for segmentation of printed Urdu text images into lines and ligatures, a key pre-processing step in Urdu Optical Character Recognition (OCR) systems. Unlike classical projection profile based line segmentation methods, the proposed scheme successfully segments overlapping and touching lines. Once the lines are segmented, ligatures are extracted from each text lin...


Persian/Arabic Document Segmentation Based On Pyramidal Image Structure

Automatic transformation of paper documents into electronic documents requires document segmentation as the first stage. However, variations in character font size, differing text-line spacing, and non-uniform document layout structures have together made it difficult to design a general-purpose document layout analysis algorithm for many years. Thus...



Journal title

volume 1, issue 1

publication date: 2010-02-25
