Skew detection and text line position determination in digitized documents

Authors

  • Basilios Gatos
  • Nikos Papamarkos
  • Christodoulos Chamzas
Abstract

This paper proposes a computationally efficient procedure for skew detection and text line position determination in digitized documents, which is based on the cross-correlation between the pixels of vertical lines in a document. The determination of the skew angle in documents is essential in optical character recognition systems. Due to the text skew, each horizontal text line intersects a predefined set of vertical lines at nonhorizontal positions. Using only the pixels on these vertical lines we construct a correlation matrix and evaluate the skew angle of the document with high accuracy. In addition, using the same matrix, we compute the positions of text lines in the document. The proposed method is tested on a variety of mixed-type documents and it provides good and accurate results while it requires only a short computational time. We illustrate the effectiveness of the algorithm by presenting four characteristic examples. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.

Keywords: Skew detection, Hough transform, Character recognition, Segmentation
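The core idea in the abstract, that a skewed text line crosses neighbouring vertical lines at different heights, can be illustrated with a small sketch. This is not the authors' algorithm or their correlation matrix: it is a simplified stand-in that correlates the pixel columns of adjacent sampled vertical lines over a range of vertical shifts and converts the best shift into an angle. The names `estimate_skew`, `synthetic_page`, `spacing`, and `max_shift` are hypothetical, and the sketch assumes a binary image (1 = ink) in which every text line registers on every sampled column, which real scans would likely need preprocessing to guarantee.

```python
import math
import numpy as np


def estimate_skew(image, spacing=64, max_shift=20):
    """Estimate the skew angle in degrees of a binary page image (1 = ink).

    Image rows grow downward, so a positive angle means text lines
    drop toward the right of the page.
    """
    img = np.asarray(image, dtype=float)
    height, width = img.shape
    cols = np.arange(0, width, spacing)  # x positions of the sampled vertical lines
    if len(cols) < 2 or max_shift >= height:
        raise ValueError("image too small for the chosen spacing or max_shift")

    shifts = np.arange(-max_shift, max_shift + 1)
    scores = np.zeros(len(shifts))
    # Accumulate, over all neighbouring line pairs, the correlation
    # sum_y a[y] * b[y + s] for every candidate vertical shift s.
    for left, right in zip(cols[:-1], cols[1:]):
        a, b = img[:, left], img[:, right]
        for k, s in enumerate(shifts):
            if s >= 0:
                scores[k] += np.dot(a[:height - s], b[s:])
            else:
                scores[k] += np.dot(a[-s:], b[:height + s])

    best_shift = int(shifts[np.argmax(scores)])
    return math.degrees(math.atan2(best_shift, spacing))


def synthetic_page(height=200, width=300, pixels_per_drop=16):
    """Hypothetical test page: five solid 4-pixel-thick 'text lines' that
    drop by one row every `pixels_per_drop` columns."""
    page = np.zeros((height, width), dtype=np.uint8)
    for base in range(30, 151, 30):
        for x in range(width):
            y = base + x // pixels_per_drop
            page[y:y + 4, x] = 1
    return page


if __name__ == "__main__":
    print(estimate_skew(synthetic_page()))
```

Because only a handful of columns are read, the cost grows with the number of sampled lines and candidate shifts rather than with the full image area, which is consistent with the short computation time the abstract reports. The angular resolution of this sketch is limited to one pixel of shift per `spacing` columns.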

Similar resources

Modified Self-organizing Maps for Line Extraction in Digitized Text Documents

Different authors have developed modifications of the Kohonen Self-Organizing Maps to solve known combinatorial optimization problems. In this paper a modification of the Kohonen Map is proposed to solve the detection of white inter-text spaces in digitized plain-text documents. The idea relies on the fact that the line extraction problem has several features which match easily with Kohonen net...

Document Decomposition of Bangla Printed Text

Today all kinds of information are being digitized, and along with this digitization the huge archive of various kinds of documents is being digitized too. Optical character recognition is the method through which newspapers and other paper documents are converted into digital resources. However, this method works on text only. As a res...

Skew Detection Technique for Various Scripts

This paper describes a technique for detecting the skew that is introduced during the scanning of documents. It also discusses the tool that was used to implement the technique. The algorithm has been implemented on various scripts. The method provides a very efficient way to calculate the skew. Correction in the skewed scanned document image is very impor...

Resolution Independent Skew and Orientation Detection for document images

In large scale scanning applications, orientation detection of the digitized page is necessary for the following procedures to work correctly. Several existing methods for orientation detection use the fact that in Roman script text, ascenders are more likely to occur than descenders. In this paper, we propose a different approach for page orientation detection that uses this information. The m...

Local Skew Correction in Documents

In this paper we propose a technique for detecting and correcting the skew of text areas in a document. The documents we work with may contain several areas of text with different skew angles. First, a text localization procedure is applied based on connected components analysis. Specifically, the connected components of the document are extracted and filtered according to their size and geomet...

Journal:
  • Pattern Recognition

Volume 30

Publication date: 1997