Identification of Arabic/French Handwritten/Printed Words using GMM-Based System
Abstract
Discriminating between languages is one of the first steps in automatic document text recognition. In many documents, such as bank checks and application forms, printed and handwritten text are mixed. This paper presents a system, based on Gaussian Mixture Models (GMMs), that automatically identifies Arabic and French words in both handwritten and printed script. Features are extracted with a fixed-length sliding window. Experiments on parts of the freely available AHTID/MW, APTI and RIMES databases show a remarkable performance of the proposed approach.
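The abstract names only two ingredients: a fixed-length sliding window for feature extraction and one GMM-based model per class. The sketch below is a minimal illustration of that general scheme, not the authors' implementation. Everything specific in it is an assumption: the three per-window features (ink density, vertical centre of gravity, ink/background transitions), the window width and step, the diagonal covariances, the number of mixture components, and all function names.

```python
import numpy as np

def sliding_window_features(img, width=4, step=2):
    """Slide a fixed-width window over a binary word image (ink = 1).

    Returns an (n_windows, 3) array. The three features are illustrative
    assumptions, not the feature set used in the paper. Assumes height > 1.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    rows = np.arange(h)
    feats = []
    for x in range(0, w - width + 1, step):
        win = img[:, x:x + width]
        ink = win.sum()
        density = ink / win.size
        # Vertical centre of gravity of the ink, scaled to [0, 1]; 0.5 if empty.
        cog = (win.sum(axis=1) @ rows) / (ink * (h - 1)) if ink > 0 else 0.5
        # Mean number of ink/background transitions per pixel column.
        trans = np.abs(np.diff(win, axis=0)).sum() / width
        feats.append([density, cog, trans])
    return np.array(feats)

def _logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def _log_gauss_diag(X, means, variances):
    """Log density of each row of X under each diagonal Gaussian: shape (n, k)."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.log(variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def fit_gmm(X, k=2, n_iter=50, seed=0, floor=1e-3):
    """Fit a diagonal-covariance GMM with plain EM (k and floor are assumptions)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    variances = np.tile(X.var(axis=0) + floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0) + floor
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Mean per-frame log-likelihood of feature frames X under a fitted GMM."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + _log_gauss_diag(X, means, variances)
    return _logsumexp(log_p, axis=1).mean()

def classify(img, models):
    """Return the label whose GMM gives the word image the highest likelihood."""
    X = sliding_window_features(img)
    return max(models, key=lambda label: avg_log_likelihood(X, models[label]))
```

In use, `models` would be a dictionary with one fitted GMM per class (for example Arabic handwritten, Arabic printed, French handwritten, French printed), each trained on the stacked window features of that class's training words. Averaging the log-likelihood over windows keeps scores comparable across words of different widths.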
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate data entry and digitization, automatic recognition of printed and handwritten text is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Automatic Script and Type Identification in Bi-lingual Forms
In this paper we develop a system that automatically discriminates between machine-printed and handwritten words in structured bilingual (Arabic and French) form layouts. Our system has been applied at the Tunisian National Health Insurance Fund to forms for the refund of medical care costs, with encouraging results. In these forms, handwritten data usually touch or cross the pr...
Recognition of Off-Line Handwritten Arabic Words Using Hidden Markov Model Approach
Hidden Markov Models (HMM) have been used with some success in recognizing printed Arabic words. In this paper, a complete scheme for totally unconstrained Arabic handwritten word recognition based on a Model discriminant HMM is presented. A complete system able to classify Arabic-Handwritten words of one hundred different writers is proposed and discussed. The system first attempts to remove s...
Spotting Words in Latin, Devanagari and Arabic Scripts
A system for spotting words in scanned document images in three scripts, Devanagari, Arabic and Latin is described. Three main components of the system are a word segmenter, a shape based matcher for words and a search interface. The user gives a query which can be either a word image or text. The candidate words that are searched in the documents are retrieved and ranked, where the ranking cri...
Segmentation of Overlapped Handwritten Arabic Sub-Words
Arabic script is cursive in both handwritten and printed form. Segmentation of Arabic script, especially handwritten, is a very challenging task. Many difficulties arise due to the inherent characteristics of Arabic writing such as the overlapping of Arabic sub-words wherein the sub-words share the same vertical space, and vertical ligatures wherein characters are stacked upon each other in a word....