Search results for: historical documents
Number of results: 175141
Lists are often the most data-rich parts of a document collection, but are usually not set apart explicitly from the rest of the text, especially in a corpus of historical OCRed documents. There are many kinds of lists, differing from each other in both layout and content. Writing individualized code to process all possible types of lists is an expensive challenge. In the present research, we f...
A solution to a feature selection problem is presented in the field of document image processing. The choice of shape features for describing glyphs of historical documents is a non-trivial task, since the variations of glyphs across different documents are innumerable. Hence, the manual selection of shape features would be a cumbersome task. To select a subset of features from a given set a genet...
In the research area of historical documents, it is of high interest to reconstruct the process by which a historical typeset document emerged. To do so, the chronological order of the different versions of a typeset document has to be reconstructed. This is done by manually finding differences between two versions and then deciding on the order of these two versions. In this paper we ...
Sydenham's chorea (SC) became a well defined nosological entity only during the second half of the nineteenth century. Such progress was promoted by the availability of large clinical series provided by newly founded paediatric hospitals. This paper analyses the demographic and clinical features of patients with chorea admitted to the first British paediatric hospital (the Hospital for Sick Chi...
Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspon...
Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in formats that are not accessible to machine processing (e.g., historical or legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how...
In the absence of accurate handwriting recognition for historical documents, computer-assisted transcription (CAT) methods move into the spotlight. We explore some of the weaknesses of current CAT systems and propose a CAT system that relies on subword spotting and overcomes most of these. The system is ideal for crowdsourcing transcription to mobile users.
Handwritten Text Recognition is a problem that has gained attention in recent years due to the interest in the transcription of historical documents. Handwritten Text Recognition employs models that are similar to those employed in Automatic Speech Recognition (Hidden Markov Models and n-grams). Dictation of the contents of the document is an alternative to text recognition. In this work, we ...
This paper proposes a novel method for document enhancement. The method is based on the combination of two state-of-the-art filters through the construction of a mask. The mask is applied to a TV (Total Variation) regularized image where background noise has been reduced. The masked image is then filtered by NLmeans (Non Local Means) which reduces the noise in the text areas located by the mask...
We consider the problem of document binarization as a pre-processing step for optical character recognition (OCR) for the purpose of keyword search of historical printed documents. A number of promising techniques from the literature for binarization, pre-filtering, and post-binarization denoising were implemented along with newly developed methods for binarization: an error diffusion binarizat...