Measuring Search Retrieval Accuracy of Uncorrected OCR: Findings from the Harvard-Radcliffe Online Historical Reference Shelf Digitization Project

Abstract

This report presents the findings of an investigation to evaluate the conditions for search retrieval successes and failures when using uncorrected OCR for indexing. The purpose of the study was to assess whether low-cost, high-production techniques for text conversion were adequate to produce digital reproductions of consistent quality and usability. We sought to identify attributes of the original material or the OCR-produced text that could predict when additional, costly processes (OCR correction or keying) would be needed to meet retrieval requirements for text digitization projects.

Similar Resources

Attributing Authorship in the Noisy Digitized Correspondence of Jacob and Wilhelm Grimm

This article presents the results of a multidisciplinary project aimed at better understanding the impact of different digitization strategies in computational text analysis. More specifically, it describes an effort to automatically discern the authorship of Jacob and Wilhelm Grimm in a body of uncorrected correspondence processed by HTR (Handwritten Text Recognition) and OCR (Optical Characte...

Retrieval of Spelling Variants in Nonstandard Texts – Automated Support and Visualization

This article describes ongoing research in the RSNSR (Regelbasierte Suche in Textdatenbanken mit nichtstandardisierter Rechtschreibung, “Rule-based search in text databases with nonstandard orthography”) project. The focus of this project is making historical text documents digitally available; consequently, it examines the challenges for digitization procedures and subsequent retrieval operati...

A Study of the Evolution of the Field of "Reference Services and Sources" Using Reference Publication Year Spectroscopy

Purpose: To identify major events in the development of the Reference Services literature. Methodology: The Reference Publication Year Spectroscopy (RPYS) technique was used. Initial data were obtained from Scopus using a scientometric method. A comprehensive search strategy led to the retrieval of 5007 records. RPYS software was used to revise the data. Excel was used for visualization of findi...

Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features were apportioned to each group of images separately. In this regard, each group image surr...

Estimating Digitization Costs in Digital Libraries Using DiCoMo

Estimating digitization costs is a very difficult task. Exact predictions are hard to make because of the large number of unknown factors. However, digitization projects need a precise idea of the economic costs and the time involved in developing their contents. The common practice when we start digitizing a new collection is to set a schedule, and a firm commitment ...

Publication date: 2014