Restoration of Arbitrarily Warped Document Images Based on Text Line and Word Detection

Authors

  • B. Gatos
  • K. Ntirogiannis
Abstract

This paper presents a novel technique for efficient restoration of arbitrarily warped document images. Our aim is to recover document images, mainly of bound volumes captured by a digital camera, that suffer from non-linear warping. The proposed technique is applied to grayscale document images and proceeds in several distinct steps: adaptive document image binarization; text line and word detection; a first-draft dewarping of the binary image based on word rotation and shifting; and, finally, a complete restoration of the original warped grayscale image guided by the binary dewarping result. We describe the proposed technique in detail and report implementation results for each step of the methodology. Experimental results on several arbitrarily warped documents indicate the effectiveness of the proposed technique.
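
The word rotation and shifting step named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the step is implemented, so everything below is an assumption made for illustration: the function name `dewarp_word`, the representation of a word as a list of foreground pixel coordinates, the use of two baseline endpoints to estimate the word's slope, and the `target_y` parameter for the straightened text line. It is not the authors' algorithm.

```python
import math


def dewarp_word(pixels, left_base, right_base, target_y):
    """Rotate a word so its baseline is horizontal, then shift it vertically.

    Hypothetical helper, not taken from the paper.
    pixels      -- list of (x, y) foreground coordinates of one word
    left_base   -- (x, y) of the left end of the word's estimated baseline
    right_base  -- (x, y) of the right end of the word's estimated baseline
    target_y    -- y coordinate the baseline should have after dewarping
    """
    (x0, y0), (x1, y1) = left_base, right_base
    # Slope angle of the word's baseline in image coordinates.
    angle = math.atan2(y1 - y0, x1 - x0)
    # Rotating by the negative angle makes the baseline horizontal.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    # Vertical shift that moves the rotated baseline onto target_y.
    shift = target_y - y0

    restored = []
    for x, y in pixels:
        dx, dy = x - x0, y - y0
        # Rotate about the left baseline point, then apply the shift.
        rx = x0 + dx * cos_a - dy * sin_a
        ry = y0 + dx * sin_a + dy * cos_a
        restored.append((rx, ry + shift))
    return restored
```

Under these assumptions, each detected word would be passed through `dewarp_word` with the `target_y` of its text line, and the resulting coordinate mapping could then guide resampling of the grayscale image, in the spirit of the final restoration step.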

Related resources

Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines using a dynamic local connectivity map and then applying a l...

Removing Geometric Distortion from Text Documents Using Geometric Information of Text Lines

Document images produced by scanners or digital cameras usually have photometric and geometric distortions. If either of these effects distorts a document, recognition of words from the document image using OCR is prone to errors. In this paper, we propose a novel approach that substantially reduces geometric distortion in document images. In this method, we first extract document lines from do...

Image dewarping and text extraction from mobile captured distinct documents

Camera Based Document Analysis (CBDA) is an emerging field in computer vision and pattern recognition. Recently, cameras have been integrated with many kinds of additional equipment. They therefore play a vital role in replacing scanners with hand-held imaging devices (HIDs) such as digital cameras, mobile phones and gaming devices. Warping is a common occurrence in camera-captured document i...

Straightening warped text lines using polynomial regression

Perspective distortion always occurs while scanning thick, bound documents, resulting in two problems in the scanned grayscale image: (i) shade along the ‘spine’ of the book, and (ii) warping of words in the shade area. We proposed a restoration system to solve these two problems in our previous paper [1]. However, the shape of the warped words was not fully restored, since we simply shifted an...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large volume of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Publication date: 2006