An Efficient Method for Urdu Language Text Search in Image Based Urdu Text
Authors
Abstract
This paper describes an efficient method for searching Urdu text in computer-generated and handwritten scanned images. Efficient text search is becoming necessary because the number of documents handled grows every day. The method is unique and simple in that no features are extracted, and it is script-independent. The input image is matched directly against a set of prototype characters representing each possible class. The distance between the input image and each prototype character is computed, and the input is assigned to the class of the prototype giving the best match. Experimental results show 100% accuracy for 4- and 5-character ligatures, 87% for 3-character ligatures and 78% for 2-character ligatures.
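The matching step described in the abstract amounts to nearest-prototype (template) classification without feature extraction. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each ligature image has already been binarized and scaled to the same size as the prototypes. It uses the sum of squared pixel differences as the distance, although the abstract does not say which distance measure or size normalization the paper uses. The names `distance` and `classify`, the labels `ligature_A`/`ligature_B`, and the toy 3×3 glyphs are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: a binarized glyph as rows of 0/1 pixels (1 = ink).
Image = Sequence[Sequence[int]]


def distance(a: Image, b: Image) -> int:
    """Sum of squared pixel differences between two equal-size images.

    For 0/1 images this equals the number of mismatched pixels.
    Assumption: the paper's actual distance measure is not given in the abstract.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(image: Image, prototypes: Dict[str, List[Image]]) -> Tuple[str, int]:
    """Return (label, distance) of the prototype that best matches `image`."""
    best_label: Optional[str] = None
    best_dist: Optional[int] = None
    # Compare the raw input image with every prototype of every class.
    for label, protos in prototypes.items():
        for proto in protos:
            d = distance(image, proto)
            if best_dist is None or d < best_dist:
                best_label, best_dist = label, d
    if best_label is None or best_dist is None:
        raise ValueError("no prototypes given")
    return best_label, best_dist


# Toy example with hypothetical 3x3 "ligature" prototypes.
prototypes = {
    "ligature_A": [[[1, 1, 0], [0, 1, 0], [0, 1, 1]]],
    "ligature_B": [[[0, 0, 1], [0, 1, 0], [1, 0, 0]]],
}
query = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]  # ligature_A with one pixel flipped

print(classify(query, prototypes))  # ('ligature_A', 1)
```

Because no features are extracted, the cost of classifying one input grows with the number of prototypes multiplied by the number of pixels per image. Adding more prototypes per class (for example, several handwriting styles of the same ligature) only extends the lists in the `prototypes` dictionary.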