Search results for: historical picture

Number of results: 190184

2006
Maria Clara Paixão de Sousa, Thorsten Trippel

This paper describes the restructuring process of a large corpus of historical documents and the system architecture that is used for accessing it. The initial challenge of this process was to get the most out of existing material, normalizing the legacy markup and harvesting the inherent information using widely available standards. This resulted in a conceptual and technical restructuring of ...

Journal: Pattern Recognition Letters 2005
Itay Bar Yosef

In this paper, we describe an input-sensitive thresholding algorithm for ancient Hebrew calligraphy documents. Usually, historical document images are of poor quality since the documents have degraded over time due to storage conditions. However, the distribution of noise in one document is not uniform and the quality of the characters may vary. We develop tools to identify noisy characters and apply ...

2016
Dan Garrette, Hannah Alpert-Abrams

Historical documents frequently exhibit extensive orthographic variation, including archaic spellings and obsolete shorthand. OCR tools typically seek to produce so-called diplomatic transcriptions that preserve these variants, but many end tasks require transcriptions with normalized orthography. In this paper, we present a novel joint transcription model that learns, unsupervised, a probabili...

2012
Eva Pettersson, Beáta Megyesi, Joakim Nivre

Even though NLP tools are widely used for contemporary text today, there is a lack of tools that can handle historical documents. Such tools could greatly facilitate the work of researchers dealing with large volumes of historical texts. In this paper we propose a method for extracting verbs and their complements from historical Swedish text, using NLP tools and dictionaries developed for conte...

2003
Michael Droettboom, Karl MacMillan, Ichiro Fujinaga

This paper describes the Gamera framework for building custom document recognition systems. This open-source system is designed to support the test-and-refine development cycle: an important style for developing recognition systems that work with difficult historical documents, since the solutions are often non-obvious. This paper explains the overall architecture of the system, in addition to d...

2015
Rafael C. Carrasco, Isabel Martínez-Sempere, Enrique Mollá-Gandía, Felipe Sánchez-Martínez, Gustavo Candela Romero, Maria Pilar Escobar Esteban

The BVC section of the impact-es diachronic corpus of historical Spanish compiles 86 books containing approximately 2 million words. About 27% of the words, providing a representative coverage of the most frequent word forms, have been annotated with their lemma, part of speech, and modern equivalent following the Text Encoding Initiative guidelines. We describe how this type of annotation can...

2010
Sai-Ming Li, Mohammad Mahdian, R. Preston McAfee

The standard business model in the sponsored search marketplace is to sell click-throughs to the advertisers. This involves running an auction that allocates advertisement opportunities based on the value the advertiser is willing to pay per click, times the click-through rate of the advertiser. The click-through rate of an advertiser is the probability that if their ad is shown, it would be cl...

2011
Sokratis Vavilis, Ergina Kavallieratou, Roberto Paredes, Kostas Sotiropoulos

In this chapter, a binarization technique specifically designed for historical document images is presented. Existing binarization techniques focus either on finding an appropriate global threshold or adapting a local threshold for each area in order to remove smear, strains, uneven illumination etc. Here, a hybrid approach is presented that first applies a global thresholding technique and, th...

2016
Mathias Coeckelbergs, Seth van Hooland

Providing useful and efficient semantic annotations is a major challenge for knowledge design of any body of text, especially historical documents. In this article, we propose Topic Modeling as an important first step to gather semantic information beyond the lexicon which can be added as annotations in the SHEBANQ. By laying out a case study, we discuss both noise and structure found in compar...

Journal: Pattern Recognition Letters 2014
Raid Saabni, Abedelkadir Asi, Jihad El-Sana

DOI: http://dx.doi.org/10.1016/j.patrec.2013.07.007
