Word Spotting in Chinese Document Images without Layout Analysis

نویسندگان

  • Yue Lu
  • Chew Lim Tan
چکیده

An approach to searching user-specified words/phrases in Chinese document images, without the requirements of layout analysis, is proposed in this paper. Bounding boxes of Chinese character images are first determined using connected component analysis. Next, a suitable character from the user-specified word/phrase is chosen as the initial character to search for a matching candidate in the document. Once a matched candidate is found, its adjacent characters in the horizontal and vertical directions are examined for matching with other corresponding characters in the user-specified word/phrase, subject to the constraints of positional relation and size similarity. The character matching is done in two stages. The coarse matching is carried out based on the stroke density features. A weighted Hausdorff distance(WHD) is proposed for the second matching phase. Experimental results show that the proposed method can effectively search the user-specified Chinese word/phrase from horizontal or vertical text lines of document images.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An OCR Free Method for Word Spotting in Printed Documents: the Evaluation of Different Feature Sets

An OCR free word spotting method is developed and evaluated under a strong experimental protocol. Different feature sets are evaluated under the same experimental conditions. In addition, a tuning process in the document segmentation step is proposed which provides a significant reduction in terms of processing time. For this purpose, a complete OCRfree method for word spotting in printed docum...

متن کامل

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...

متن کامل

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting is to make searchable unindexed image documents by locating word/words in a doc-ument image, given a query word. This problem is challenging, mainly due to the large numberof word classes with very small inter-class and substantial intra-class distances. In this paper, asegmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

متن کامل

Local Binary Pattern for Word Spotting in Handwritten Historical Document

Digital libraries store images which can be highly degraded and to index this kind of images we resort to word spotting as our information retrieval system. Information retrieval for handwritten document images is more challenging due to the difficulties in complex layout analysis, large variations of writing styles, and degradation or low quality of historical manuscripts. This paper presents ...

متن کامل

Text Block Recognition in Multi-Oriented Handwritten Documents

Automatic detection of text blocks is an important step before applying OCR or word-spotting techniques to document images. Our approach focusses on handwritten (historical) documents and uses the Gabor Transformation to facilitate this task. Apart from the main text, which often consists of rectangular shaped text blocks, marginalia are of special interest here. These areas are generally uncon...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002