ocr

Linguistic Error Correction Of Japanese Sentences

1980

Tsutomu Kawada Shin'ya Amano Kunio Sakai

This paper describes a newly developed linguistic error correction system, which can correct errors and rejections of Japanese sentences by using linguistic knowledge. Conventional optical character readers (OCR) need human assistance to correct their recognition errors and rejections. An operator must teach the OCR correct answers whenever an illegible character pattern occurs. If this error c...

Enhancing Image-based Arabic Document Translation Using a Noisy Channel Correction Model

2007

Yi Chang Ying Zhang Stephan Vogel Jie Yang

An image-based document translation system consists of several components, among which OCR (Optical Character Recognition) plays an important role. However, existing OCR software is not robust against environmental variations. Furthermore, OCR errors are often propagated into the translation component and cause poor end-to-end performance. In this paper, we propose an image-based docume...

Text Categorization of Low Quality Images

1995

David J Ittner David D Lewis David D Ahn

Categorization of text images into content-oriented classes would be a useful capability in a variety of document handling systems. Many methods can be used to categorize texts once their words are known, but OCR can garble a large proportion of words, particularly when low quality images are used. Despite this, we show for one data set that fax quality images can be categorized with nearly the sa...

Text Area Identification in Web Images

2004

Stavros J. Perantonis Basilios Gatos Vassilios Maragos Vangelis Karkaletsis Georgios Petasis

With the explosive growth of the World Wide Web, millions of documents are published and accessed on-line. Statistics show that a significant part of Web text information is encoded in Web images. Since Web images have special characteristics that sometimes distinguish them from other types of images, commercial OCR products often fail to recognize Web images due to their special characteristic...

English/Arabic Cross Language Information Retrieval (CLIR) for Arabic OCR-Degraded Text

2009

Tarek A. Elghazaly

In this paper, a novel model for Query Translation and Expansion, enabling English/Arabic CLIR for both normal and OCR-Degraded Arabic Text, has been proposed, implemented, and tested. First, an English/Arabic Word Collocations Dictionary has been established, and three English/Arabic Single Words Dictionaries have been reproduced. Second, a modern Arabic Corpus has been built. Third, a model for simu...

The Bible, Truth, and Multilingual OCR

1999

Tapas Kanungo

Multilingual OCR has emerged as an important information technology, thanks to the increasing need for cross-language information access. While many research groups and companies have developed OCR algorithms for various languages, it is difficult to compare the performance of these OCR algorithms across languages. This difficulty arises because most evaluation methodologies rely on the use of a do...

Representing OCRed documents in HTML

1997

Tao Hong Sargur N. Srihari

OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in reading and understanding if they do not refer to the original image representation. As demonstrated in this paper, a hybrid document which combines symbolic representation and image representation may relieve the problem. If we r...

Convergence reduces ocular counterroll (OCR) during static roll-tilt

Journal: Vision Research, 2004

D. Ooi E. D. Cornell I. S. Curthoys A. M. Burgess H. G. MacDougall

When humans are roll-tilted around the naso-occipital axis, both eyes roll or tort in the opposite direction to roll-tilt, a phenomenon known as ocular counterroll (OCR). While the magnitude of OCR is primarily determined by vestibular, somatosensory, and proprioceptive input, direction of gaze also plays a major role. The aim of this study was to measure the interaction between some of these f...

An Enhanced Arabic OCR Degraded Text Retrieval Model

2013

Mostafa Ezzat Tarek Elghazaly Mervat Gheith

This paper provides a new model that enhances the retrieval effectiveness of Arabic OCR-degraded text. The proposed model is based on simulating Arabic OCR recognition mistakes using a word-based approach. The model then expands the user's search query using the expected OCR errors. The resulting expanded search query gives higher precision and recall in searching Arabic OCR-degraded text than the ...
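
The query-expansion idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the confusion table below is hypothetical, uses Latin characters purely for readability (the paper targets Arabic), and the names `CONFUSIONS`, `expand_term`, and `expand_query` are assumptions introduced here. Each query term is expanded into the forms it might take after OCR misrecognition, so that a search can match degraded text.

```python
from itertools import product

# Hypothetical confusion table (an assumption, not taken from the paper):
# maps a correct character to strings an OCR engine might emit instead.
CONFUSIONS = {
    "m": ["rn"],
    "l": ["1"],
    "o": ["0"],
}


def expand_term(term, confusions=CONFUSIONS, max_variants=50):
    """Return the term followed by its simulated OCR-degraded variants."""
    # For every character, the options are the character itself
    # plus any strings it is assumed to be confused with.
    options = [[ch] + confusions.get(ch, []) for ch in term]
    variants = []
    for combo in product(*options):
        variants.append("".join(combo))
        # Cap the output, since the number of combinations grows quickly.
        if len(variants) >= max_variants:
            break
    return variants


def expand_query(query, confusions=CONFUSIONS):
    """Expand each whitespace-separated query term independently."""
    return [expand_term(term, confusions) for term in query.split()]


if __name__ == "__main__":
    for group in expand_query("mol cat"):
        print(group)
```

Each inner list can then be treated as a group of alternatives for one query term (for example, combined with OR in a search engine), which is one plausible way to use the "expected OCR errors" the abstract mentions.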

A Survey on Script Segmentation for Bangla OCR

2007

Arif Billah Al-Mahmud Mumit Khan

Script segmentation is an important primary task for any Optical Character Recognition (OCR) software. It is especially important in off-line OCR for printed characters. Through script segmentation, a large image of a written document is fragmented into a number of small pieces, which are then used for pattern matching to determine the expected sequence of characters. In the impl...
