In Codice Ratio: Scalable Transcription of Historical Handwritten Documents
Abstract
Huge amounts of handwritten historical documents are being published by digital libraries worldwide. However, for these raw digital images to be really useful, they need to be annotated with informative content. State-of-the-art Handwritten Text Recognition (HTR) approaches require an impressive training effort by expert paleographers. Our contribution is a scalable, end-to-end transcription workflow, which we call In Codice Ratio, based on fine-grained segmentation of text elements into characters and symbols, with limited training effort. We provide a preliminary evaluation of In Codice Ratio on a corpus of letters by Pope Honorius III, stored in the Vatican Secret Archives.
Similar resources
Towards Knowledge Discovery from the Vatican Secret Archives. In Codice Ratio – Episode 1: Machine Transcription of the Manuscripts
In Codice Ratio is a research project to study tools and techniques for analyzing the contents of historical documents conserved in the Vatican Secret Archives (VSA). In this paper, we present our efforts to develop a system to support the transcription of medieval manuscripts. The goal is to provide paleographers with a tool to reduce their efforts in transcribing large volumes, as those stored i...
Exploiting Collection Level for Improving Assisted Handwritten Words Transcription of Historical Documents
Transcription of handwritten words in historical documents is still a difficult task. When processing huge numbers of pages, document-centered approaches are limited by the trade-off between automatic recognition errors and the tedium of human annotation work. In this article, we investigate the use of inter-page dependencies to overcome those limitations. For this, we propose a new...
A Multimodal Approach to Dictation of Handwritten Historical Documents
Handwritten Text Recognition is a problem that has gained attention in recent years due to the interest in the transcription of historical documents. Handwritten Text Recognition employs models that are similar to those employed in Automatic Speech Recognition (Hidden Markov Models and n-grams). Dictation of the contents of the document is an alternative to text recognition. In this work, we ...
Handwritten Text Recognition for Historical Documents
The number of digitized legacy documents has risen dramatically in recent years, mainly due to the increasing number of online digital libraries publishing this kind of document. The vast majority of them remain waiting to be transcribed into a textual electronic format (such as ASCII or PDF) that would provide historians and other researchers new ways of indexing, consulting and que...
Evaluation of Handwriting Recognition Systems for Application to Historical Records
In the last decade, significant, largely governmental funding has been applied to the automatic transcription of handwritten documents. Uses for this kind of technology are somewhat limited given that the number of handwritten documents is on the decline. However, certain types of handwritten historical records can be crucial for genealogical research in that they identify key vital facts. In...