Sequence to Sequence Learning for Optical Character Recognition

Authors

  • Devendra K. Sahu
  • Mohak Sukhwani
Abstract

We propose an end-to-end recurrent encoder-decoder based sequence learning approach for printed text Optical Character Recognition (OCR). In contrast to the current state-of-the-art OCR solution [Graves et al. (2006)], which uses a Connectionist Temporal Classification (CTC) output layer, our approach makes minimal assumptions about the structure and length of the sequence. We use a two-step encoder-decoder approach: (a) a recurrent encoder reads a variable-length printed text word image and encodes it into a fixed-dimensional embedding; (b) a decoder then converts this fixed-dimensional embedding into a variable-length text output. Our architecture gives competitive performance relative to the CTC [Graves et al. (2006)] output layer while operating in a more natural setting. The learnt deep word image embedding from the encoder can be used for printed text based retrieval systems. The expressive fixed-dimensional embedding for any variable-length input makes retrieval faster and more efficient, which is not possible with other recurrent neural network architectures. We empirically investigate the expressiveness and learnability of long short-term memory networks (LSTMs) in the sequence-to-sequence learning regime by training our network for prediction tasks in segmentation-free printed text OCR. The utility of the proposed architecture for printed text is demonstrated by quantitative and qualitative evaluation on two tasks: word prediction and retrieval.
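The two-step data flow described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: a plain tanh recurrent cell stands in for the LSTM, the weights are random and untrained, and every name and size here (`HEIGHT`, `HIDDEN`, `ALPHABET`, `build_model`, `encode`, `decode`, `cosine`) is a hypothetical choice made for the example. It only shows that the encoder output has the same dimension for any image width, that the decoder emits a variable-length string, and that fixed-size embeddings can be compared directly for retrieval.

```python
import math
import random

# All constants below are assumptions for illustration, not values from the paper.
HEIGHT = 4    # pixels per image column
HIDDEN = 8    # size of the fixed-dimensional embedding
ALPHABET = ["<eos>"] + list("abcdefghijklmnopqrstuvwxyz")  # index 0 ends decoding


def _matrix(rows, cols, rng):
    """Random weight matrix (untrained)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def _step(w_in, w_rec, x, h):
    """One step of a plain tanh recurrent cell (a stand-in for an LSTM cell)."""
    a = _matvec(w_in, x)
    b = _matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def build_model(seed=0):
    rng = random.Random(seed)
    n = len(ALPHABET)
    return {
        "enc_in": _matrix(HIDDEN, HEIGHT, rng),
        "enc_rec": _matrix(HIDDEN, HIDDEN, rng),
        "dec_in": _matrix(HIDDEN, n, rng),
        "dec_rec": _matrix(HIDDEN, HIDDEN, rng),
        "dec_out": _matrix(n, HIDDEN, rng),
    }


def encode(model, columns):
    """Read the image column by column; the final state is the embedding."""
    h = [0.0] * HIDDEN
    for column in columns:
        h = _step(model["enc_in"], model["enc_rec"], column, h)
    return h


def decode(model, embedding, max_len=10):
    """Greedy decoding: start from the embedding, stop at <eos> or max_len."""
    h = list(embedding)
    prev = [0.0] * len(ALPHABET)  # all-zero vector acts as the start symbol
    chars = []
    for _ in range(max_len):
        h = _step(model["dec_in"], model["dec_rec"], prev, h)
        scores = _matvec(model["dec_out"], h)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == 0:
            break
        chars.append(ALPHABET[best])
        prev = [1.0 if i == best else 0.0 for i in range(len(ALPHABET))]
    return "".join(chars)


def cosine(u, v):
    """Cosine similarity between two embeddings, usable as a retrieval score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    model = build_model(seed=0)
    narrow = [[0.1] * HEIGHT for _ in range(3)]    # 3-column word image
    wide = [[0.2] * HEIGHT for _ in range(11)]     # 11-column word image
    e_narrow, e_wide = encode(model, narrow), encode(model, wide)
    print(len(e_narrow), len(e_wide))              # both equal HIDDEN
    print(repr(decode(model, e_wide, max_len=6)))  # arbitrary text: weights are untrained
    print(cosine(e_narrow, e_wide))
```

Because the weights are untrained, the decoded string is meaningless; in a trained system the encoder and decoder would be optimized jointly on word image and transcription pairs, and retrieval would rank stored embeddings by a similarity such as `cosine`.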

Similar resources

Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task

In this paper, we try to predict earthquake events in a cluster of seismic data from the Pacific Ring of Fire using multivariate adaptive regression splines (MARS). The model is employed either as a predictor for a sequence prediction task or as a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...

Learning with Supportive Vectors An Introduction to Support Vector Machines and their Applications

Support Vector Machines have acquired a central position in the field of Machine Learning and Pattern Recognition in the past decade and have been known to deliver state-of-the-art performance in applications such as text categorization, handwritten character recognition, bio-sequence analysis, etc. In this article we provide a gentle introduction to the workings of Support Vector Machines (a...

Human Reading and the Curse of Dimensionality

Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified im...

Optical Character Recognition Using Artificial Neural Networks Approach

With the recent advances in computer technology, many recognition tasks have been automated. Optical Character Recognition (OCR) is a scheme for converting images of typewritten or printed text into a format that a machine can understand. The goal of OCR is to classify the given character data, represented by some characteristics, into a predefined finite number of character classes. For the recognit...

Optical Character Recognition as Sequence Mapping

Digitization can provide a means of preserving the content of materials by creating an accessible facsimile of the object, putting less strain on already fragile originals such as out-of-print books. The document analysis community formed to address this by digitizing the content, thus making it easily shareable over the Internet, making it searchable, and enabling language translation o...

Journal:
  • CoRR

Volume abs/1511.04176

Publication date: 2015