An improved offline handwritten character segmentation algorithm for Bangla script
Authors
Abstract
Effective segmentation of offline handwritten word images of unconstrained Bangla script is a challenging problem in Optical Character Recognition (OCR) applications. A continuous horizontal line called ‘Matra’ is an important feature of this script. However, in unconstrained cursive handwriting, the Matra can be wavy or discontinuous, which makes segmentation difficult. The current work designs a novel technique for identifying potential segmentation points on the Matra in order to isolate the constituent characters of a Bangla word image. In the first stage, an 8-neighbour Connected Component Labelling (CCL) algorithm is applied to identify connected sub-parts of the word images. These connected components are then classified into one of two classes, namely ‘Segment Further’ (SF) and ‘Do Not Segment’ (DNS). In the second stage, the trivial SF and DNS components are separated first, and the remaining components are classified into SF and DNS using a Multi-Layer Perceptron (MLP) based classifier. Finally, fuzzy segmentation features are applied to the SF components to identify potential segmentation points on the detected fuzzy Matra region, so that constituent characters or character sub-parts can be extracted from the word images. The technique has been successfully applied to 500 handwritten Bangla word images and is found to perform better than our earlier character segmentation techniques [1-2].
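The first stage named in the abstract, 8-neighbour connected component labelling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_components`, the list-of-rows image representation (1 = ink, 0 = background) and the breadth-first traversal are all assumptions made for illustration.

```python
from collections import deque


def label_components(image):
    """Label 8-connected foreground components of a binary image.

    `image` is a list of rows holding 0 (background) or 1 (ink).
    Returns (labels, count): `labels` has the same shape as `image`
    and holds 0 for background or a component id starting at 1.
    This is a hypothetical helper, not code from the paper.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    # The eight neighbour offsets: horizontal, vertical and diagonal.
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already labelled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 8-neighbourhood.
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] == 1 and labels[nr][nc] == 0):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
    return labels, count


# Toy "word image": the two top-left pixels touch only diagonally.
word = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
]
labels, count = label_components(word)
```

With 8-connectivity the diagonal pair in the top-left corner forms a single component, whereas 4-connectivity would split it in two. In the pipeline described above, each labelled component would then be passed to the SF/DNS classification step.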
Similar resources
Handwritten Segmentation in Bangla Script: A Review of Offline Techniques
Offline handwritten segmentation in Bangla is an interesting area of research, as segmentation has long been one of the most critical areas of the optical character recognition process. Through this operation, an image of a sequence of characters, which may be connected in some cases, is decomposed into sub-images of individual alphabetic symbols. In this paper, segmentation of cursive handwritten s...
Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images
In this paper, a novel approach for word extraction and character segmentation from the handwritten Bangla document images is reported. At first, a modified Run Length Smoothing Algorithm (RLSA), called Spiral Run Length Smearing Algorithm (SRLSA), is applied for the extraction of words from the text lines of unconstrained handwritten Bangla document images. This technique has helped to overcom...
A Script Independent Technique for Extraction of Characters from Handwritten Word Images
A script-independent technique for segmenting characters from word images is reported here. Word-to-character segmentation is an important preprocessing step of the optical character recognition process. In handwritten text, however, the presence of touching characters decreases the accuracy of character segmentation. In this paper, segmentation of handw...
A Survey on Script Segmentation for Bangla OCR
Script segmentation is an important primary task for any Optical Character Recognition (OCR) software. It is especially important for off-line OCR of printed characters. Through script segmentation, a large image of a written document is fragmented into a number of small pieces, which are then used for pattern matching to determine the expected sequence of characters. In the impl...
Segmentation of Offline Handwritten Bengali Script
Character segmentation has long been one of the most critical areas of the optical character recognition process. Through this operation, an image of a sequence of characters, which may be connected in some cases, is decomposed into sub-images of individual alphabetic symbols. In this paper, segmentation of the cursive handwritten script of the world’s fourth most popular language, Bengali, is considered. Unlik...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011