Reliable OCR solution for digital content re-mastering

Author

  • Xiaofan Lin
Abstract

This paper addresses the system-level aspects of OCR solutions in the context of digital content re-mastering. It analyzes the unique requirements and challenges of implementing a reliable OCR system in a high-volume, unattended environment. A new reliability metric is proposed, and a practical solution based on combining multiple commercial OCR engines is introduced. Experimental results show that the combined system is both much more accurate and more reliable than the individual engines, so it can fully satisfy the needs of digital content re-mastering applications.
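The abstract does not say how the engine outputs are combined or how the reliability metric is defined. As a purely illustrative sketch, one common way to combine several engines is position-wise majority voting over their token streams. Everything below is an assumption: the function name `vote` is hypothetical, the outputs are assumed to be already aligned token by token, and the agreement ratio is only a stand-in for a confidence signal, not the metric proposed in the paper.

```python
from collections import Counter


def vote(outputs):
    """Majority-vote over token lists from several OCR engines.

    Assumes the lists are already aligned (same length, same positions).
    Returns the voted tokens and, per token, the fraction of engines
    that agreed with the winning token.
    """
    if not outputs or len({len(o) for o in outputs}) != 1:
        raise ValueError("expected at least one output, all with the same token count")
    combined, agreement = [], []
    for tokens in zip(*outputs):
        # most_common(1) returns the token seen most often at this position
        token, count = Counter(tokens).most_common(1)[0]
        combined.append(token)
        agreement.append(count / len(tokens))
    return combined, agreement


# Three hypothetical engine outputs for the same line of text
engine_a = "digital content re-mastering".split()
engine_b = "digita1 content re-mastering".split()
engine_c = "digital c0ntent re-mastering".split()

words, agreement = vote([engine_a, engine_b, engine_c])
print(" ".join(words))
print(agreement)
```

In an unattended pipeline, positions with low agreement could be flagged for review rather than accepted silently. Real engine outputs rarely align token for token, so a practical system would need an alignment step before voting.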


Similar resources

Automatic document navigation for digital content remastering

Keywords: digital content re-mastering, document structure analysis, print on demand, content linking, OCR

This paper presents a novel method of automatically adding navigation capabilities to re-mastered electronic books. We first analyze the need for a generic and robust system to automatically construct navigation links into re-mastered books. We then introduce the core algorithm based on text matchin...


DRR Research beyond COTS OCR Software: A Survey

After decades of research, Optical Character Recognition (OCR) has entered a relatively mature stage. Commercial off-the-shelf (COTS) OCR software packages have become powerful tools in Document Recognition and Retrieval (DRR) applications. One question naturally arises: What areas are left for new DRR research beyond COTS OCR software? There are many discussions around it in recent confer...


OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely published at that time; hence, there was a need to convert them into digital format. OCR, short for Optical Character Recognition, was conceived to translate paper-based books into digital e-books. Regret...


Digital Storytelling in a Foreign Language Classroom of Higher Educational Establishments

The conceptual idea of the paper is that the use of digital biographical narratives in a foreign language classroom creates favorable conditions for the harmonious development of students' creative, cognitive, communicative and technological skills. The paper deals with methods of teaching students to create digital biographical narratives about the lives of outstanding personalities whic...


Boosting OCR Accuracy Using Crowdsourcing

Book digitization is important work for preserving ancient heritage. However, digitizing books involves a series of labor-intensive tasks, one of which is verifying optical character recognition (OCR) output. In this paper, we propose a crowdsourceable OCR verification method. Using our method, content holders are able to leverage the power of crowds to complete verification tasks and avo...




Publication date: 2002