OCR for Printed Kannada Text to Machine-Editable Format Using a Database Approach
Abstract
This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed system recognizes printed Kannada text and is designed to handle all types of Kannada characters. It first acquires an image of the Kannada script, segments the image into lines, and then segments the words into sub-character-level pieces. Character recognition uses a database approach. The reported recognition accuracy is 100%.
Keywords: Optical Character Recognition, Segmentation, Kannada Scripts
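The abstract does not say how lines and pieces are segmented or what the database stores. The Python sketch below shows one common way such a pipeline could work, under stated assumptions. It uses projection-profile segmentation, and it uses a database of fixed-size binary templates matched by Hamming distance. The function names, the 8x8 grid, the threshold of 128, and the placeholder labels `bar` and `box` are all hypothetical and are not taken from the paper. For brevity the sketch also skips explicit word segmentation and splits each line directly into column runs of ink.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]    # 1 = ink, 0 = background
Feature = Tuple[int, ...]   # flattened fixed-size bitmap


def binarize(gray: List[List[int]], threshold: int = 128) -> Bitmap:
    """Mark pixels darker than the (assumed) threshold as ink."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) spans where the projection profile is non-zero."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Bitmap) -> List[Bitmap]:
    """Split the page at rows with no ink (horizontal projection profile)."""
    profile = [sum(row) for row in img]
    return [img[a:b] for a, b in runs(profile)]


def segment_pieces(line: Bitmap) -> List[Bitmap]:
    """Split a text line at columns with no ink (vertical projection profile)."""
    profile = [sum(col) for col in zip(*line)]
    return [[row[a:b] for row in line] for a, b in runs(profile)]


def trim(piece: Bitmap) -> Bitmap:
    """Drop blank rows above and below the ink of a piece."""
    ink_rows = [i for i, row in enumerate(piece) if any(row)]
    return piece[ink_rows[0]:ink_rows[-1] + 1]


def normalize(piece: Bitmap, size: int = 8) -> Feature:
    """Resample a piece to a size x size grid (nearest neighbour) and flatten it."""
    h, w = len(piece), len(piece[0])
    return tuple(piece[r * h // size][c * w // size]
                 for r in range(size) for c in range(size))


def hamming(a: Feature, b: Feature) -> int:
    """Count positions where two features differ."""
    return sum(x != y for x, y in zip(a, b))


def recognize(gray: List[List[int]], database: Dict[str, Feature]) -> List[List[str]]:
    """Label every piece with the database entry at the smallest Hamming distance."""
    result = []
    for line in segment_lines(binarize(gray)):
        labels = []
        for piece in segment_pieces(line):
            feature = normalize(trim(piece))
            labels.append(min(database, key=lambda lbl: hamming(database[lbl], feature)))
        result.append(labels)
    return result


# Hypothetical templates standing in for Kannada sub-character pieces.
BAR = [[1], [1], [1]]
BOX = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
database = {"bar": normalize(BAR), "box": normalize(BOX)}

# A tiny synthetic "page": '#' is dark ink (0), '.' is white paper (255).
page = ["#.###",
        "#.#.#",
        "#.###",
        ".....",
        "###.#",
        "#.#.#",
        "###.#"]
gray = [[0 if ch == "#" else 255 for ch in row] for row in page]

print(recognize(gray, database))  # [['bar', 'box'], ['box', 'bar']]
```

A plain vertical projection is a simplification. Kannada vowel signs and conjunct consonants can overlap neighbouring characters horizontally, so a practical system would need a more careful segmentation step. Its database would also typically hold several samples per piece rather than one template.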
Similar resources
A font and size-independent OCR system for printed Kannada documents using support vector machines
This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system would be the scanned image of a page of text and the output is a machine editable file compatible with most typesetting software. The system first extracts words from the document image and then segments the words into sub-character level pieces. The segmentation algorithm ...
OCR for Handwritten Kannada Language Script
Optical character recognition (OCR) is the process of converting a scanned image of text into a computer-editable format. The proposed OCR system targets complex handwritten Kannada characters. One of the major challenges faced by Kannada OCR systems is the recognition of handwritten text from an image. The input text image is subjected to preprocessing and then converted into a binary image. Segment...
Optical Character Recognition: A Review
Optical character recognition is the electronic conversion of images of typewritten or printed text into machine-encoded text. It is a common method of digitizing printed texts, with advantages such as easy storage, editability, and searching. OCR is a field of research in pattern recognition, artificial intelligence, and computer vision. In previous decades it has gained more importance due to feasib...
Recognition of Text Image Using Multilayer Perceptron
The biggest challenge in the field of image processing is to recognize documents in both printed and handwritten formats. Optical Character Recognition (OCR) is a type of document image analysis in which a scanned digital image containing either machine-printed or handwritten script is input into an OCR software engine and translated into an editable, machine-readable digital text format. A Neura...
OCR-Optical Character Recognition
Optical Character Recognition, or OCR, is the electronic translation of images of handwritten, typewritten, or printed text into machine-encoded text. It is widely used to recognize and search text in electronic documents or to publish the text on a website. OCR is the machine replication of human reading and has been the subject of intensive research for more than three decades. OCR can be described...