Ph.D. Dissertation Proposal: Handwriting Recognition for Document Images Captured by Portable Cameras
Abstract
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity in document image acquisition by supplementing traditional scanning with flatbed scanners and mounted cameras. However, the portability of the camera presents new challenges in document image analysis and recognition in general and handwriting recognition in particular. Problems are posed by low resolution, focus blur, uneven lighting, and warping distortion. Perspective distortion is one major factor that can cause the recognition accuracy of algorithms designed under the assumption of traditional scanning to drop significantly. Features considered robust under the traditional scanning scenario often lose distinguishing power, while some features that were not considered before become attractive. In this dissertation, the problems of perspective distortion caused by camera-based document imaging will be systematically studied, and new algorithms will be developed for feature evaluation and classifier construction. We will study, both theoretically and through implementation, the underpinnings of the design of a high-performance, lexicon-driven offline handwriting recognizer that can adjust automatically for perspective distortion. Specifically, (i) a new feature evaluation measure will be introduced to quantify the distinguishing power of features, (ii) a training methodology for automatically learning the distortion parameters will be proposed, and (iii) a dynamic feature selection strategy based on perplexity and correlation will be proposed to select only a subset of features (from a large set) that exhibit high discriminative power given the automatically computed parameters of perspective distortion. A prototype “perspective distortion independent” recognizer will be built and tested on a dataset of historical document images. Historical documents are chosen for two reasons. First, they are usually fragile and must be subjected to minimal handling during the digitization process; this constraint makes portable digital cameras the digitization device of choice. Second, they are predominantly handwritten, thus offering a rich source of data to test the algorithms. It is expected that the proposed research will make unique contributions to the document image processing, image enhancement, and dynamic feature selection components of handwriting recognition.
Similar resources
Recognition of Sequence of Print and Ink Strokes: Investigation the Effect of Handwriting Pressure, Hue of Ink, Printer and Paper Type
With the introduction of digital techniques, forensic document examiners have been encouraged to work with better accuracy in non-destructive ways. The aim of this study was to present a non-destructive, accessible, economical (affordable), user-friendly, portable, useful and easy technique for specifying the order of crossing lines of ink strokes and printed text. The intersections of LaserJet and In...
A New Method for Shading Removal and Binarization of Documents Acquired with Portable Digital Cameras
Photo documents, documents digitized with portable digital cameras, often are affected by non-uniform shading. This paper proposes a new method to remove the shade of document images captured with digital cameras followed by a new binarization algorithm. This method is able to automatically work with images of different resolutions and lighting patterns without any parameter adjustment. The pro...
Skew Detection from Natural Scene Images: A Review
Natural scene images are generally captured with portable devices such as mobile phone cameras. Scene images contain text information as part of the captured scene. Scene image text is more difficult to process than document text due to the complexity of the scene and open environment conditions. Scene images usually suffer from skew deformation due to inherent nature of portable capturing de...
PhotoDoc: A Toolbox for Processing Document Images Acquired Using Portable Digital Cameras
This paper introduces PhotoDoc, a software toolbox designed to process document images acquired with portable digital cameras. PhotoDoc was developed as an ImageJ plug-in. It performs border removal, perspective and skew correction, and image binarization. PhotoDoc interfaces with Tesseract, an open source Optical Character Recognizer originally developed by HP and distributed by Google.
Progress in Camera-Based Document Image Analysis
The increasing availability of high performance, low priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or as standalone still or video devices are highly mobile and easy to use; they can capture images of any kind of document including very thick ...