Arabic Character Recognition
نویسندگان
چکیده
Although optical character recognition of printed texts has been a focus of research for the last few decades, Arabic printed text, being cursive, still poses a challenge. The challenge is twofold: segmenting words into letters and identifying individual letters. We describe a method that combines the two tasks, using multiple grids of SIFT descriptors as features. To construct a classifier, we do not use a large training set of images with corresponding ground truth, a process usually done to construct a classifier, but, rather, an image containing all possible symbols is created and a classifier is constructed by extracting the features of each symbol. To recognize the text inside an image, the image is split into “pieces of Arabic words”, and each piece is scanned with increasing window sizes. Segmentation points are set where the classifier achieves maximal confidence. Using the fact that Arabic has four forms of letters (isolated, initial, medial and final), we narrow the search space based on the location inside the piece. The performance of the proposed method, when applied to printed texts and computer fonts of different sizes, was evaluated on two independent benchmarks, PATS and APTI. Our algorithm outperformed that of the creator of PATS on five out of eight fonts, achieving character correctness of 98.87%–100%. On the APTI dataset, ours was competitive or better that the competition.
منابع مشابه
Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR
Arabic script character recognition is challenging task due to complexity of the script and huge number of ligatures. We present a method for the development of multilingual Arabic script OCR (Optical Character Recognition) and lexicon reduction for Arabic Script and its derivative languages. The objective of the proposed method is to overcome the large dataset Urdu and similar scripts by using...
متن کاملArabic character recognition in a multi-processing environment
Many Arabic character recognition systems have been proposed since the early eighties. Most systems reported high recognition rates, however, they overlooked a very i m portant factor in the process; the speed factor. I n this paper, a parallel Arabic character recognition sys tem is introduced. T h e goal of the sys tem is t o achieve both full accuracy and high speed. Th i s has been accompli...
متن کاملArabic Character Recognition using Approximate Stroke Sequence
Arabic character recognition of handwriting is addressed. A novel approach for the Arabic Character Recognition is presented based on statistical analysis of a typical Arabic text is presented. Results showed that the sub-word in Arabic language is the basic pictorial block rather than the word. The method of approximate stroke sequence is applied for the recognition of some Arabic characters i...
متن کاملWord-level recognition of multifont Arabic text using a feature vector matching approach
Many text recognition systems recognize text imagery at the character level and assemble words from the recognized characters. An alternative approach is to recognize text imagery at the word level, without analyzing individual characters. This approach avoids the problem of individual character segmentation, and can overcome local errors in character recognition. A word-level recognition syste...
متن کاملAn Arabic optical character recognition system using recognition-based segmentation
Optical character recognition (OCR) systems improve human}machine interaction and are widely used in many areas. The recognition of cursive scripts is a di$cult task as their segmentation su!ers from serious problems. This paper proposes an Arabic OCR system, which uses a recognition-based segmentation technique to overcome the classical segmentation problems. A newly developed Arabic word segm...
متن کاملEvaluation Approach of Arabic Character Recognition
This paper proposes and contributes towards designing a complete system for off-line Arabic character recognition. The proposed system is specifically meant for Arabic handwriting recognition, but it equally works for the typed character recognition. It has various phases including preprocessing and segmentation. It also includes thinning phase and finds vertical and horizontal projection profi...
متن کامل