A word spotting method for Farsi machine-printed document images
Abstract
This paper presents a word spotting approach for Farsi printed document images. The main idea is to recognize the font of the document image and to modify the query word according to that font before searching. This step increases the similarity between the query word image and its instances in the document image, and therefore improves the performance of the word spotting system. After the query word has been modified, the query word image rectangle is searched for in the text lines of the document image using an XNOR similarity measure. To increase the recall rate, a relatively low value is used as the acceptance/rejection threshold (δ); to increase the precision rate, additional features are used, e.g., the number of holes, ascenders, descenders, and dots. Multilevel matching combined with these features solves the problem caused by justification (aligning the text to both the left and right margins), which occurs when Farsi documents are typeset. The approach was applied to a computer-generated dataset of 440 Farsi printed document images and achieved a precision rate of 97.5% at a recall rate of 92.1%. On a dataset of 224 scanned Farsi document images, it achieved a precision rate of 87.6% at a recall rate of 79.3%.
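The XNOR search step can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes that XNOR similarity is the fraction of pixels on which two equally sized binary images agree, and that the query rectangle slides one column at a time along a text line of the same height. The function names `xnor_similarity` and `spot_word` and the toy images are hypothetical; the font-based query modification and the later filtering by holes, ascenders, descenders, and dots are not shown.

```python
import numpy as np

def xnor_similarity(a, b):
    """Fraction of pixel positions where two same-shape binary images agree."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    # XNOR is the negation of XOR: True wherever both pixels are equal.
    return float(np.mean(~(a ^ b)))

def spot_word(line, query, delta):
    """Slide the query rectangle along a text line.

    Returns (column, score) pairs for every window whose XNOR similarity
    is at least the acceptance threshold delta (assumed to lie in [0, 1]).
    """
    line = np.asarray(line, dtype=bool)
    query = np.asarray(query, dtype=bool)
    h, w = query.shape
    if line.shape[0] != h or line.shape[1] < w:
        return []
    hits = []
    for x in range(line.shape[1] - w + 1):
        score = xnor_similarity(line[:, x:x + w], query)
        if score >= delta:
            hits.append((x, score))
    return hits

# Toy example: a 2x3 query pattern embedded at column 4 of a 2x8 text line.
query = np.array([[1, 0, 1],
                  [0, 1, 0]])
line = np.zeros((2, 8), dtype=int)
line[:, 4:7] = query

hits = spot_word(line, query, delta=0.9)        # strict threshold
loose_hits = spot_word(line, query, delta=0.6)  # lower threshold accepts more windows
```

In this toy case the lower threshold admits an additional, partially matching window. This mirrors the trade-off described in the abstract: a low δ raises recall, and the extra candidates must then be filtered by other features to restore precision.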
Similar resources
Script Independent Word Spotting in Multilingual Documents
This paper describes a method for script independent word spotting in multilingual handwritten and machine printed documents. The system accepts a query in the form of text from the user and returns a ranked list of word images from a document image corpus based on similarity with the query word. The system is divided into two main components. The first component, known as the Indexer, performs indexi...
An OCR Free Method for Word Spotting in Printed Documents: the Evaluation of Different Feature Sets
An OCR-free word spotting method is developed and evaluated under a strong experimental protocol. Different feature sets are evaluated under the same experimental conditions. In addition, a tuning process in the document segmentation step is proposed which provides a significant reduction in terms of processing time. For this purpose, a complete OCR-free method for word spotting in printed docum...
Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. In the proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method, is used in this paper to imp...
Keyword Spotting on Hangul Document Images Using Two-Level Image-to-Image Matching
Many printed documents and books have been published and saved in the form of images in digital libraries. Searching for a specified query word in document images is a challenging problem. OCR software helps convert the images to machine-readable documents so that the full content can be searched [1]. Another approach [1, 2] is image-based one, in which both the document images and word inform...
A classification-free word-spotting system
In this paper, a classification-free word-spotting system, appropriate for the retrieval of printed historical document images, is proposed. The system skips many of the procedures of a common approach. It does not include segmentation, feature extraction or classification. Instead it treats the queries as compact shapes and uses image processing techniques in order to localize a query in the do...