Search results for: historical picture
Number of results: 190184
This paper describes the restructuring process of a large corpus of historical documents and the system architecture that is used for accessing it. The initial challenge of this process was to get the most out of existing material, normalizing the legacy markup and harvesting the inherent information using widely available standards. This resulted in a conceptual and technical restructuring of ...
In this paper, we describe an input-sensitive thresholding algorithm for ancient Hebrew calligraphy documents. Historical document images are usually of poor quality, since the documents have degraded over time due to storage conditions. However, noise is not distributed uniformly within a document, and the quality of individual characters may vary. We develop tools to identify noisy characters and apply ...
Historical documents frequently exhibit extensive orthographic variation, including archaic spellings and obsolete shorthand. OCR tools typically seek to produce so-called diplomatic transcriptions that preserve these variants, but many end tasks require transcriptions with normalized orthography. In this paper, we present a novel joint transcription model that learns, unsupervised, a probabili...
Even though NLP tools are widely used for contemporary text today, there is a lack of tools that can handle historical documents. Such tools could greatly facilitate the work of researchers dealing with large volumes of historical texts. In this paper we propose a method for extracting verbs and their complements from historical Swedish text, using NLP tools and dictionaries developed for conte...
This paper describes the Gamera framework for building custom document recognition systems. This open-source system is designed to support the test-and-refine development cycle: an important style for developing recognition systems that work with difficult historical documents, since the solutions are often non-obvious. This paper explains the overall architecture of the system, in addition to d...
The BVC section of the impact-es diachronic corpus of historical Spanish compiles 86 books containing approximately 2 million words. About 27% of the words, which provide representative coverage of the most frequent word forms, have been annotated with their lemma, part of speech, and modern equivalent following the Text Encoding Initiative guidelines. We describe how this type of annotation can...
The standard business model in the sponsored search marketplace is to sell click-throughs to the advertisers. This involves running an auction that allocates advertisement opportunities based on the value the advertiser is willing to pay per click, times the click-through rate of the advertiser. The click-through rate of an advertiser is the probability that if their ad is shown, it would be cl...
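The allocation rule stated in this abstract (rank advertisers by the value they bid per click times their click-through rate) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Bid` class, the `allocate_slots` function, and the sample numbers are hypothetical, the CTR is taken as already known, and pricing (what each winner actually pays) is omitted because the abstract is truncated before describing it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bid:
    # Hypothetical record for one advertiser in a single auction.
    advertiser: str
    bid_per_click: float  # value the advertiser is willing to pay per click
    ctr: float            # assumed-known probability the ad is clicked if shown

    @property
    def expected_value(self) -> float:
        # Expected value per impression = bid per click * click-through rate.
        return self.bid_per_click * self.ctr


def allocate_slots(bids: list[Bid], num_slots: int) -> list[Bid]:
    """Hypothetical allocator: rank by bid * CTR and fill the top slots."""
    ranked = sorted(bids, key=lambda b: b.expected_value, reverse=True)
    return ranked[:num_slots]


bids = [Bid("A", 2.0, 0.01), Bid("B", 1.0, 0.05), Bid("C", 3.0, 0.005)]
print([b.advertiser for b in allocate_slots(bids, 2)])
```

Ranking by bid alone would put C first; weighting by CTR puts B first, because its expected value per impression (1.0 × 0.05 = 0.05) exceeds A's (0.02) and C's (0.015).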
In this chapter, a binarization technique specifically designed for historical document images is presented. Existing binarization techniques focus either on finding an appropriate global threshold or on adapting a local threshold for each area in order to remove smears, stains, uneven illumination, etc. Here, a hybrid approach is presented that first applies a global thresholding technique and, th...
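The abstract states only that a global thresholding technique is applied first; the rest of the hybrid method is truncated. As a hedged illustration of that first stage alone, the sketch below uses Otsu's method, a common global thresholding technique chosen here as an assumption, not necessarily the one the chapter uses. The function names are hypothetical, and the image is assumed to be a flat list of 8-bit grayscale values with dark ink on a light background.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the gray level t (0-255) maximizing between-class variance,
    where class 0 is pixels <= t and class 1 is pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_low, weight_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_low += hist[t]
        if weight_low == 0:
            continue  # no pixels at or below t yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is at or below t; no split left
        sum_low += t * hist[t]
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize_global(pixels: list[int], threshold: int) -> list[int]:
    # 1 = ink (dark, <= threshold), 0 = background (light).
    return [1 if p <= threshold else 0 for p in pixels]


page = [10] * 10 + [30] * 10 + [220] * 10 + [240] * 10
t = otsu_threshold(page)
print(t, binarize_global([15, 230, 25], t))
```

A single global threshold like this cannot adapt to smears or uneven illumination in one region of the page, which is the limitation the abstract attributes to purely global methods and the reason a hybrid approach follows it with further processing.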
Providing useful and efficient semantic annotations is a major challenge for knowledge design of any body of text, especially historical documents. In this article, we propose Topic Modeling as an important first step to gather semantic information beyond the lexicon which can be added as annotations in the SHEBANQ. By laying out a case study, we discuss both noise and structure found in compar...
0167-8655/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.patrec.2013.07.007. Corresponding author at: Department of Computer Science, Triangle Research & Development Center, Kafr Qarea, Israel. Fax: +972 4 6356168. Authors: R. Saabni, A. Asi, J. El-Sana. 1 These authors contribut...