Similar resources
Comparison of named entity recognition tools for raw OCR text
This short paper analyses an experiment comparing the efficacy of several Named Entity Recognition (NER) tools at extracting entities directly from the output of an optical character recognition (OCR) workflow. The authors present how they first created a set of test data, consisting of raw and corrected OCR output manually annotated with people, locations, and organizations. They then ran each...
Morphological filters for OCR: a performance comparison
This article compares the ability of several morphological operators to improve OCR performance when used as preprocessing filters. An experiment on binary and greyscale images, using the Tesseract OCR engine and morphological filters acting in complex, graph and vertex spaces, shows good overall performance for complex and area filters. MSE measures have a...
A Comparison of Some Morphological Filters for Improving OCR Performance
Studying discrete space representations has recently led to the development of novel morphological operators. To date, there has been no study evaluating the performance of those novel operators with respect to a specific application. This article compares the capability of several morphological operators, both old and new, to improve OCR performance when used as preprocessing filters. We des...
High-performance OCR preclassification trees
We present an automatic method for constructing high-performance preclassification decision trees for OCR. Good preclassifiers must prune the set of alternative classes to a small number without erroneously pruning the correct class. We build the decision tree using greedy entropy minimization, using pseudo-randomly generated training samples derived from a model of imaging defe ...
Performance Evaluation of Two Arabic OCR Products
Numerous Optical Character Recognition (OCR) companies claim that their products have near-perfect recognition accuracy (close to 99.9%). In practice, however, these accuracy rates are rarely achieved. Most systems break down when the input document images are highly degraded, such as scanned images of carbon-copy documents, documents printed on low-quality paper, and documents that are n-th ge...
Journal
Journal title: International Journal of UbiComp
Year: 2015
ISSN: 0976-2213,0975-8992
DOI: 10.5121/iju.2015.6303