A survey of historical document image datasets
Abstract
This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document analysis is a crucial prerequisite to facilitate research using different machine learning algorithms. However, because of the very large variety of the actual data (e.g., scripts, tasks, dates, support systems, and amount of deterioration), the different formats for data and label representation, and the different evaluation processes and benchmarks, finding appropriate datasets is a difficult task. This work fills this gap, presenting a meta-study on existing datasets. After a systematic selection process (according to PRISMA guidelines), we select 65 studies that are chosen based on different factors, such as the year of publication, the number of methods implemented in the article, the reliability of the chosen algorithms, the dataset size, and the journal outlet. We summarize each study by assigning it to one of three pre-defined tasks: document classification, layout structure, or content analysis. We present the statistics, document type, language, tasks, input visual aspects, and ground truth information for every dataset. In addition, we provide the benchmark tasks and results from these papers or recent competitions. We further discuss gaps and challenges in this domain. We advocate for providing conversion tools to common formats (e.g., COCO format for computer vision tasks) and always providing a set of evaluation metrics, instead of just one, to make results comparable across studies.
Related resources
Multispectral Image Restoration of Historical Document Images
Culture is preserved through various documents, which are a part of civilization and heritage. Because ancient scripts are disappearing and often only a single copy of a document survives for future generations, archiving these documents digitally is the solution to these problems. In this paper, the aim is to restore the historical document from tears, stains and poor visibility...
Restoration of Degraded Historical Document Image
Restoration plays a very important role in enhancing degraded, noisy images. Numerous algorithms have been designed to enhance degraded images. Since image processing algorithms are subjective, not every algorithm that has been developed will address every type of degradation. To address a specific type of problem, suitable algorithms need to be selected. In this paper a combination of spatial...
A Performance Evaluation Methodology for Historical Document Image Binarization
Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behaviour and verifying its effectiveness by providing qualitative and quantitative indication of its performance. This work concerns a pixel-based binarizat...
A Unified Framework for Degraded Thai Historical Document Image Restoration
Binarization is the key process in restoring degraded historical document images. In this paper, a framework for degraded Thai historical document image restoration is proposed. The proposed framework consists of three stages: an image filtering stage, a local-based thresholding stage, and a cluster analysis stage. The image filtering stage aims to eliminate some noises by using Wiener filte...
Journal
Journal title: International Journal on Document Analysis and Recognition
Year: 2022
ISSN: 1433-2833, 1433-2825
DOI: https://doi.org/10.1007/s10032-022-00405-8