In Codice Ratio: Scalable Transcription of Historical Handwritten Documents

نویسندگان

  • Serena Ammirati
  • Donatella Firmani
  • Marco Maiorino
  • Paolo Merialdo
  • Elena Nieddu
  • Andrea Rossi
چکیده

Huge amounts of handwritten historical documents are being published by digital libraries world wide. However, for these raw digital images to be really useful, they need to be annotated with informative content. State-of-the-art Handwritten Text Recognition (HTR) approaches require an impressive training effort by expert paleographers. Our contribution is a scalable, end-to-end transcription work-flow – that we call In Codice Ratio – based on fine-grain segmentation of text elements into characters and symbols, with limited training effort. We provide a preliminary evaluation of In Codice Ratio over a corpus of letters by pope Honorii III, stored in the Vatican Secret Archive.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards Knowledge Discovery from the Vatican Secret Archives. In Codice Ratio – Episode 1: Machine Transcription of the Manuscripts

In Codice Ratio is a research project to study tools and techniques for analyzing the contents of historical documents conserved in the Vatican Secret Archives (VSA). In this paper, we present our e‚orts to develop a system to support the transcription of medieval manuscripts. Œe goal is to provide paleographers with a tool to reduce their e‚orts in transcribing large volumes, as those stored i...

متن کامل

Exploiting Collection Level for Improving Assisted Handwritten Words Transcription of Historical Documents

Transcription of handwritten words in historical documents is still a difficult task. When processing huge amount of pages, document centered approaches are limited by the trade-off between automatic recognition errors and the tedious aspect of human user annotation work. In this article, we investigate the use of inter page dependencies to overcome those limitations. For this, we propose a new...

متن کامل

A Multimodal Approach to Dictation of Handwritten Historical Documents

Handwritten Text Recognition is a problem that has gained attention in the last years due to the interest in the transcription of historical documents. Handwritten Text Recognition employs models that are similar to those employed in Automatic Speech Recognition (Hidden Markov Models and n-grams). Dictation of the contents of the document is an alternative to text recognition. In this work, we ...

متن کامل

Handwritten Text Recognition for Historical Documents

The amount of digitized legacy documents has been rising dramatically over the last years due mainly to the increasing number of on-line digital libraries publishing this kind of documents. The vast majority of them remain waiting to be transcribed into a textual electronic format (such as ASCII or PDF) that would provide historians and other researchers new ways of indexing, consulting and que...

متن کامل

Evaluation of Handwriting Recognition Systems for Application to Historical Records

In the last decade, significant, largely-governmental funding has been applied to the automatic transcription of handwritten documents. Uses for this kind of technology are somewhat limited given that the numbers of handwritten documents are on the decline. However, certain types of handwritten historical records can be crucial for genealogical research in that they identify key vital facts. In...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017