Analytic Word Recognition without Segmentation

Author

  • A. Belaïd
Abstract

The current trend in handwriting recognition is towards analytical approaches, for which the quality of letter-model learning is paramount. Human intervention in this learning phase must therefore be reduced in order to limit the initial bias. Analytical approaches fall into two categories, depending on whether or not they use a segmentation process, and available machine power now makes the segmentation-free category tractable. The advantage of working without segmentation rather than with it is that it lets the model determine the letter position information optimally. The difficulty of this approach was outlined by Sayre: to learn letters one must localise them, and to localise letters one must have learned them. Moreover, the psychology of reading teaches that reading operates at the word level: humans do not explicitly delimit letters. To model writing properly, learning should therefore be performed at the word level. Based on this principle, we have studied a learning approach that extracts letter information without segmentation while optimising learning at the word level. In a first stage, we used a hybrid model combining HMMs and random fields (the NSHP-HMM). This approach was tested on the legal amounts of bank cheques, reaching 84% on an industrial database. In a second stage, we reinforced the previous approach by using the NSHP-HMM for word normalisation and neural networks for recognition, improving the score to 87%.

Introduction

Research on handwriting recognition has recently shown the superiority of 2D models over 1D ones. As asserted in several works [agazzi93a, bippus97a, gilloux95a, park96a, simon94a], they better take into account the planar nature of writing. The literature presents three types of 2D models: neural networks (NN), planar HMMs (PHMM) and hidden Markov mesh random fields (HMMRF). NNs can be applied either to letters [simon94a] or to graphemes [gilloux95a]; in [gader97a] they model inter-character confidence. Their major drawback is their lack of elasticity: having a fixed input size, they cannot adapt to length variability and they are very sensitive to large distortions. To deal with length variability, specific networks such as TDNNs and recurrent NNs have been proposed [senior98a]; the drawback of this approach lies in the difficulty of automatically labelling the network observations according to the letter currently observed, information that is necessary to train the network correctly. The PHMM has been applied successfully in many works [agazzi93a, bippus97a]. Composed of secondary HMMs and a principal HMM handling their correlation, this model has interesting 2D elasticity properties, but it requires an independence hypothesis between the secondary models that is not realistic in practice. The HMMRF has been applied to handwritten Hangul character recognition with good performance [park96a], but it needs unrealistic hypotheses to remain tractable and is very costly in computation time. Other works address two-dimensional warping under specific constraints, with interesting results [uchida99a, ronee01a]. Saon proposed in [saon97a] a 2D model combining Markov fields and HMMs: the NSHP-HMM (Non-Symmetric Half-Plane Hidden Markov Model). Applied to binary images, it better captures the 2D nature of writing by using 2D neighbourhoods.

The HMM part gives the NSHP-HMM a horizontal elasticity that lets it adapt to the length of the analysed samples, and the 2D neighbourhood used for pixel observation overcomes the column-independence hypothesis of the PHMMs. Used as a global approach, however, it shows some limits. In particular, the NSHP-HMM needs a large number of parameters, and its efficiency is proved only for a restricted and distinct vocabulary: similar words lead to misclassification, small differences being absorbed by the models. To overcome these limits, we propose an analytic approach based on a concatenation of letter models, which makes it possible to handle a large vocabulary (words) with a restricted set of components (letters). Each letter is modelled by an NSHP-HMM. This also reduces the global complexity of the approach, which is limited to letter modelling. Classically, analytic word recognition approaches lean on grapheme segmentation [chen94a, gilloux95a, shridhar97a], which cannot be 100% reliable because it is usually based on topological criteria [chen94a, negi95a]. For this reason, it seemed better to us to let the system decide by itself which part of the image belongs to which letter. The Baum algorithm [baum68a, rabiner89a] allows the system to find the best distribution of parameters among the letter models while knowing only the labels of the learned words [choisy00b]. The re-estimation of the letter models and of the transitions between letters is made by cross-learning, a technique derived directly from the Baum re-estimation formulas. It was used in [elyacoubi99a] to learn grapheme labels automatically in a segmentation-based approach, in which the transitions between letters are estimated from the word labels of the database; that approach needs to know the exact label of each word. In our case no segmentation is necessary, and all the parameters of the letter and word models are estimated at the same time. No exact knowledge of the word image labels is needed: the only requirement is to model all the possible spellings of each word in the corresponding class model.

NSHP-HMM

The NSHP-HMM is a stochastic model combining the properties of Markov fields and HMMs. The observation probability in each HMM state (NSHP state) is estimated by a Markov random field (MRF): it is computed as the product of elementary probabilities, one per pixel of the observed column, each given by the MRF according to a 2D neighbourhood fixed in the previously analysed half-plane. NSHP-HMM learning is based on the Baum-Welch algorithm, which guarantees convergence to a local optimum on the training set.
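As a concrete illustration of the column observation, here is a minimal Python sketch (our own example, not the authors' code). The four-pixel neighbourhood shape and the pixel_probs table are assumptions made for the illustration.

```python
import numpy as np

# Assumed neighbourhood: (dx, dy) offsets into the half-plane already analysed
# (previous columns, plus the pixel just above in the current column).
NEIGHBOURHOOD = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]

def neighbourhood_context(image, x, y):
    """Encode the binary neighbourhood pixels of (x, y) as an integer index."""
    context = 0
    for dx, dy in NEIGHBOURHOOD:
        nx, ny = x + dx, y + dy
        inside = 0 <= nx < image.shape[1] and 0 <= ny < image.shape[0]
        context = (context << 1) | (int(image[ny, nx]) if inside else 0)
    return context

def column_log_likelihood(image, x, pixel_probs):
    """log P(column x | state): sum over pixels of log P(pixel | context).

    pixel_probs[c] is the probability of a black pixel given neighbourhood
    context c (one such table per NSHP state in the real model).
    """
    log_p = 0.0
    for y in range(image.shape[0]):
        p_black = pixel_probs[neighbourhood_context(image, x, y)]
        p = p_black if image[y, x] else 1.0 - p_black
        log_p += np.log(max(p, 1e-12))  # guard against log(0)
    return log_p
```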
For word modelling we use meta-HMMs in which each meta-state represents a letter. Starting from a meta-model, a global NSHP-HMM is built by connecting the NSHP-HMMs associated with the letter states of the meta-model. The principle of cross re-estimation is to synthesise this information over all the models associated with the same letter in the various meta-models: the Baum-Welch formulae are applied directly by summing the statistics over all the occurrences of a letter model in all the word models.

Meta-model re-estimation

Cross-learning trains the letter models but cannot re-estimate the letter appearance probabilities. This information is re-estimated independently for each word meta-model, which makes it possible to model the different spelling variations of each word.
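The pooling at the heart of cross-learning can be sketched as follows (a hypothetical implementation; the data layouts are assumptions): the expected counts produced by a Baum-Welch pass are summed over every occurrence of a letter model in every word model before normalisation, so that all occurrences share the same re-estimated letter parameters.

```python
from collections import defaultdict

def cross_reestimate(word_models, expected_counts):
    """Pool Baum-Welch statistics over every occurrence of each letter model.

    word_models: dict mapping a word to its letter sequence (one letter per
        meta-state), an assumed layout for this sketch.
    expected_counts: dict mapping (word, position) to a pair of numpy arrays
        (expected transition counts, expected emission counts) gathered by a
        Baum-Welch pass over that occurrence of the letter model.
    Returns one re-estimated (A, B) pair per letter, shared by all words.
    """
    pooled = defaultdict(lambda: [0.0, 0.0])  # letter -> [trans sum, emit sum]
    for word, letters in word_models.items():
        for pos, letter in enumerate(letters):
            trans, emit = expected_counts[(word, pos)]
            pooled[letter][0] = pooled[letter][0] + trans
            pooled[letter][1] = pooled[letter][1] + emit

    new_params = {}
    for letter, (trans, emit) in pooled.items():
        A = trans / trans.sum(axis=1, keepdims=True)  # rows: from-state
        B = emit / emit.sum(axis=-1, keepdims=True)   # per-state observation law
        new_params[letter] = (A, B)
    return new_params
```

Because a letter such as "e" occurs in many amount words ("deux", "cent", "seize"), every one of those occurrences contributes to the single "e" model, which is what makes the learning possible without any segmentation.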
The first experiments were carried out on French bank cheque words. The lexicon contains 28 word classes, and the database counts 25260 word images from a real industrial application. The meta-models synthesise the frequent misspellings found in the words. One advantage of cross-learning is that all models and meta-models can be initialised equiprobably: no specific initialisation is needed (see Table 1).

             Top1    Top2    Top3    Top4    Top5    Top10
Baum-Welch   85.49%  91.98%  94.53%  95.70%  96.56%  98.58%
Viterbi      84.72%  91.57%  94.19%  95.76%  96.64%  98.66%
Combination  85.28%  92.02%  94.51%  95.87%  96.76%  98.58%
Hybrid       86.34%  92.32%  94.73%  96.08%  97.00%  98.72%

Table 1: First experiments.

Global and Local Vision model

The results of the previous section prove the modelling capacity of the NSHP-HMM. This efficiency comes from its ability to absorb deformations, based on local observations and dynamic programming. For this reason we call such models Local Vision Models (LVM). However, models of this kind fail to estimate the global coherence of the analysed patterns because of their independence hypothesis on local observations. Conversely, Global Vision Models (GVM) such as NNs or SVMs can compute global correlations over the whole pattern, but they are sensitive to distortions: their fixed input size prevents them from coping with size variation. We therefore combined the strengths of the two approaches: an LVM for normalisation and a GVM for recognition. The LVM locates the important features; according to the localisation of these features, the pattern is normalised to a standard size, and the normalised pattern is then analysed by a GVM that estimates the global correlations between these features. We used the NSHP-HMM as LVM and an MLP as GVM. For the normalisation procedure, we use the Viterbi algorithm to find the best distribution of the image columns over the states of the word NSHP-HMM. The normalised image has one column per NSHP state, whose content is the average of the columns observed by that state in the original image (see Figure 1).

Figure 1: Normalisation of the word "et" by the NSHP-HMM.
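A minimal sketch of this normalisation step, assuming the Viterbi column-to-state alignment has already been computed on the word NSHP-HMM:

```python
import numpy as np

def normalise_by_alignment(image, state_of_column, n_states):
    """Build a fixed-width image with one output column per NSHP state.

    image: binary word image of shape (height, width).
    state_of_column: for each original column, the index of the NSHP state
        that Viterbi decoding aligned it with (an assumed interface).
    """
    height, width = image.shape
    normalised = np.zeros((height, n_states))
    for s in range(n_states):
        cols = [x for x in range(width) if state_of_column[x] == s]
        if cols:  # average every column the state absorbed
            normalised[:, s] = image[:, cols].mean(axis=1)
    return normalised
```

The output width equals the number of states, so every image of a word class is mapped to the same size whatever its original length, which is exactly what a fixed-input GVM needs.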
To give the MLP a notion of planar vision, we used a particular layer topology in which hidden-layer neurons observe overlapping rectangles of the previous layer. Experiments on the NIST database show that this topology improves the recognition score by about 0.7% while reducing the model complexity, and it allows the neurons and link weights to be initialised equiprobably.
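The overlapping-rectangle topology can be illustrated by a connectivity mask that restricts each hidden neuron to one window of the input plane; the dimensions below are illustrative assumptions, not those of the original network.

```python
import numpy as np

def receptive_field_mask(in_h, in_w, win_h, win_w, stride):
    """Binary mask wiring each hidden neuron to one rectangle of the input.

    Hidden neurons sit on a grid; neuron (i, j) is connected only to the
    window of size (win_h, win_w) at offset (i*stride, j*stride). Windows
    overlap whenever stride is smaller than the window size.
    """
    rows = (in_h - win_h) // stride + 1
    cols = (in_w - win_w) // stride + 1
    mask = np.zeros((rows * cols, in_h * in_w))
    for i in range(rows):
        for j in range(cols):
            window = np.zeros((in_h, in_w))
            window[i*stride:i*stride + win_h, j*stride:j*stride + win_w] = 1.0
            mask[i * cols + j] = window.ravel()
    return mask  # multiply element-wise with the hidden weight matrix

# Example: a 20x20 input with 8x8 windows and stride 4 gives 4x4 hidden
# neurons whose receptive fields overlap by half a window.
mask = receptive_field_mask(20, 20, 8, 8, 4)
```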
Each word class is modelled by one NSHP-HMM and has a corresponding MLP to estimate the global coherence. At recognition time, an image is first normalised by every word NSHP-HMM; each normalised image is then analysed by the corresponding MLP, and a final MLP synthesises these results into a final probability for each word class. This concept was applied to the same database and compared with the plain NSHP-HMM results and with a linear normalisation (see Table 2).

                             Top1     Top2    Top3    Top5
MLP + NSHP-HMM normalisation 87.412%  93.05%  95.41%  96.86%
MLP + linear normalisation   84.47%   90.92%  93.43%  95.50%
NSHP-HMM                     84.72%   91.57%  94.19%  96.64%

Table 2: Results after normalisation.

The MLP with NSHP-HMM normalisation improves on the NSHP-HMM results by more than 2.5%, showing the interest of this concept. The linear normalisation obtains lower results than the NSHP-HMM itself, confirming the weakness of a GVM faced with distortions. Two points are worth noticing. First, all the models are initialised equiprobably, limiting the bias of an empirical initialisation. Second, the NSHP-HMMs are trained analytically, as described above; this allows the work to be extended towards an analytic approach for the GVM as well. Such an extension is impossible with a classical linear normalisation, which gives no information on the nature of each part of the image.

Conclusion and perspectives

This work has presented several aspects of an efficient handwriting recognition system. First, we proposed a cross-learning method enabling letters to be trained from words without segmentation. Second, the model was extended into an image normalisation tool applied before recognition by an MLP. The results confirm the interest of the NSHP-HMM for handwriting modelling. They also show that modelling and recognition are different processes: the MLP results are better than those of the NSHP-HMM once the distortions have been absorbed by the normalisation step. Finally, we showed that low-level information is not necessary to train the model correctly and automatically: all our models are initialised equiprobably, and the only information used is the vocabulary. A perspective of this work is to open the LVM/GVM concept to the analytic approach. The LVMs are already trained analytically; it only remains to extend the GVM to the analytic setting. For a reduced vocabulary, the adaptation consists in focusing the GVM on the letters given by the LVMs. The objective is then to open the vocabulary, which raises further difficulties: the n best letter sequences describing the input image must be found, the image normalised according to each sequence and analysed by the letter MLPs, and, for each sequence, the MLP results synthesised to determine the analytic composition of the image.

References

[agazzi93a] O. E. Agazzi and S. Kuo. Hidden Markov Model Based Optical Character Recognition in the Presence of Deterministic Transformation. Pattern Recognition, 26(12):1813--1826, 1993.
[baum68a] L. E. Baum. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Annals of Mathematical Statistics, 37:1554--1563, 1968.
[bippus97a] R. Bippus. 1-Dimensional and Pseudo 2-Dimensional HMMs for the Recognition of German Literal Amounts. In Fourth International Conference on Document Analysis and Recognition (ICDAR'97), volume 2, pages 487--490, Ulm, Germany, Aug. 1997.
[chen94a] M.-Y. Chen, A. Kundu, and J. Zhou. Off-Line Handwritten Word Recognition Using a Hidden Markov Model Type Stochastic Network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):481--497, 1994.
[choisy00b] C. Choisy and A. Belaïd. Analytic Word Recognition Without Segmentation Based on Markov Random Fields. In Seventh International Workshop on Frontiers in Handwriting Recognition (IWFHR-VII), The Netherlands, Sept. 2000.
[elyacoubi99a] A. El-Yacoubi, M. Gilloux, R. Sabourin, and C. Y. Suen. An HMM-Based Approach for Off-Line Unconstrained Handwritten Word Modeling and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8):752--760, Aug. 1999.
[gader97a] M. Mohamed, P. D. Gader, and J.-H. Chiang. Handwritten Word Recognition with Character and Inter-Character Neural Networks. IEEE Transactions on Systems, Man, and Cybernetics, 27(2):158--164, 1997.
[gilloux95a] M. Gilloux, B. Lemarié, and M. Leroux. A Hybrid Radial Basis Function Network/Hidden Markov Model Handwritten Word Recognition System. In Third International Conference on Document Analysis and Recognition (ICDAR'95), pages 394--397, Montreal, 1995.
[negi95a] A. Negi, A. Agarwal, and K. S. Swaroop. A Correspondence Based Approach to Segmentation of Cursive Words. In Third International Conference on Document Analysis and Recognition (ICDAR'95), Montreal, 1995.
[park96a] H. S. Park and S. W. Lee. An HMMRF-Based Statistical Approach for Off-line Handwritten Character Recognition. In Proceedings of ICPR'96, volume 2, pages 320--324, 1996.
[rabiner89a] L. R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257--286, Feb. 1989.
[ronee01a] S. Uchida, M. A. Ronee, and H. Sakoe. Handwritten Character Recognition Using Piecewise Linear Two-Dimensional Warping. In Sixth International Conference on Document Analysis and Recognition (ICDAR'2001), pages 39--43, Seattle, Washington, USA, Sept. 2001.
[saon97a] G. Saon and A. Belaïd. Off-line Handwritten Word Recognition Using a Mixed HMM-MRF Approach. In Fourth International Conference on Document Analysis and Recognition (ICDAR'97), volume 1, pages 118--122, Ulm, Germany, Aug. 1997.
[senior98a] A. W. Senior and A. J. Robinson. An Off-line Cursive Handwriting Recognition System. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):309--321, March 1998.
[simon94a] J. C. Simon, O. Baret, and N. Gorski. A System for the Recognition of Handwritten Literal Amounts of Checks. In International Association for Pattern Recognition Workshop on Document Analysis Systems (DAS'94), pages 135--155, Kaiserslautern, Germany, Sept. 1994.
[shridhar97a] F. Kimura, M. Shridhar, and G. Houle. Handwritten Word Recognition Using Lexicon Free and Lexicon Directed Word Recognition Algorithms. In Third International Conference on Document Analysis and Recognition (ICDAR'95), pages 82--85, Montreal, 1995.
[uchida99a] S. Uchida and H. Sakoe. An Efficient Two-Dimensional Warping Algorithm. IEICE Transactions on Information and Systems, E82-D(3):693--700, March 1999.

Publication date: 2000