A generic method for structure recognition of handwritten mail documents

Authors

  • Aurélie Lemaitre
  • Jean Camillerapp
  • Bertrand Coüasnon
Abstract

This paper presents a system that extracts the logical structure of handwritten mail documents. The problem consists of two joint tasks: segmenting a document into blocks and labeling those blocks. The main label classes considered are addressee details, sender details, date, subject, text body, and signature. The work must cope with the difficulties of unconstrained handwritten documents, namely variable structure and variable writing. We propose a method based on a geometric analysis of the arrangement of elements in the document. We describe the document using a two-dimensional grammatical formalism, which makes it easy to introduce knowledge about mail into a generic parser. Our grammatical parser is LL(k), so several combinations are tried before the correct one is selected. The main benefit of this approach is that it can deal with weakly structured documents. Moreover, because the segmentation into blocks often depends on the associated classes, our method can retry a different segmentation until labeling succeeds. We validated this method in the context of the French national project RIMES, which organized a contest on a large database of documents. We obtain a recognition rate of 91.7% on 1150 images.
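The coupling between segmentation and labeling described in the abstract can be illustrated with a small sketch. It is not the authors' grammatical formalism or their LL(k) parser: the `Line` type, the gap thresholds, and the layout rules in `label` are all hypothetical choices made for illustration. It only shows the general idea of trying another segmentation when the labeling of the current one fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """A text line reduced to its position (hypothetical representation)."""
    x: float  # left edge, as a fraction of page width
    y: float  # top edge, as a fraction of page height


def segment(lines, gap):
    """Group lines into blocks; a vertical gap larger than `gap` starts a new block."""
    blocks = []
    for line in sorted(lines, key=lambda l: l.y):
        if blocks and line.y - blocks[-1][-1].y <= gap:
            blocks[-1].append(line)
        else:
            blocks.append([line])
    return blocks


def label(blocks):
    """Toy geometric rules (an assumption, not the paper's grammar).

    Returns a list of labels, or None when the blocks do not fit the layout.
    """
    if len(blocks) != 4:
        return None
    sender, addressee, body, signature = blocks
    if sender[0].x >= 0.5:      # sender expected in the left half
        return None
    if addressee[0].x < 0.5:    # addressee expected in the right half
        return None
    if signature[0].x < 0.5:    # signature expected in the right half
        return None
    if len(body) < 2:           # body expected to span several lines
        return None
    return ["sender details", "addressee details", "text body", "signature"]


def recognize(lines, gaps=(0.02, 0.05, 0.10)):
    """Try each candidate segmentation until one can be labeled."""
    for gap in gaps:
        blocks = segment(lines, gap)
        labels = label(blocks)
        if labels is not None:
            return list(zip(labels, blocks))
    return None  # no candidate segmentation could be labeled
```

With a page whose lines are spaced about 0.03 to 0.04 apart inside blocks, the first threshold (0.02) splits every line into its own block and labeling fails; the second threshold (0.05) yields four blocks that satisfy the toy rules, so that segmentation is kept.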

Similar resources

Mixture of Experts for Persian handwritten word recognition

This paper presents the results of Persian handwritten word recognition based on Mixture of Experts technique. In the basic form of ME the problem space is automatically divided into several subspaces for the experts, and the outputs of experts are combined by a gating network. In our proposed model, we used Mixture of Experts Multi Layered Perceptrons with Momentum term, in the classification ...

A genre Analysis of the Scholarly Electronic Mail: Implications for Pedagogy

Scholarly mails apparently display stable conventional principles as an emerging genre. Thus, contributors should structure their electronic mails appropriately when writing for purposes of discussing professional topics. However, this requirement plunges many a scholar into a dilemma as to how to go about this vital undertaking without written structural norms in electronic mail communication. Thi...

Persian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network

Handwritten digit recognition can be categorized as a classification problem. The Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers, and it works on the basis of Bayes' rule. In this paper, in order to recognize Persian (Farsi) handwritten digits, a combination of an intelligent clustering method and PNN has been utilized. Hoda database, which includes 80000 P...

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

Recognition-based vs syntax-directed models for numerical field extraction in handwritten documents

In this article, two different strategies are proposed for numerical field extraction in weakly constrained handwritten documents. The first extends classical handwriting recognition methods, while the second is inspired by approaches usually chosen in the field of information extraction from electronic documents. The models and the implementation of these two opposed strategies are described...

Publication date: 2008