Multi-view HAC for Semi-supervised Document Image Classification
Authors
Abstract
This paper presents a semi-supervised document image classification system designed to be integrated into commercial document reading software. The system is intended as an annotation aid. Given a set of unknown document images supplied by a human operator, the system computes hypotheses for grouping images that share the same physical layout and proposes these groups to the operator. The operator can then correct or validate them, the objective being to obtain homogeneous groups of images. These groups are then used to train the supervised document image classifier. The system contains N feature spaces, each with its own metric function, which makes it possible to compute the similarity between two points of the same space. After projecting each image into these N feature spaces, the system builds N hierarchical agglomerative classification (HAC) trees, one per feature space. The grouping proposals produced by the various HAC trees are then compared and merged. Results, evaluated by the number of corrections made by the operator, are presented on different image sets.
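The pipeline described in the abstract (one feature space and one metric per view, one HAC per view, then a merge of the proposals) can be illustrated with a minimal sketch. The abstract does not say which linkage, stopping rule, or merge strategy the authors use, so everything below is an assumption made for illustration: average linkage, a per-view distance threshold, and a conservative merge that keeps two images together only when every view groups them together. The names `hac_groups`, `merge_views`, and the toy feature values are hypothetical.

```python
import math
from itertools import combinations


def hac_groups(points, metric, threshold):
    """Average-linkage agglomerative clustering on one feature space.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed stopping rule, not the paper's).
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(metric(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def merge_views(groupings, n_items):
    """Assumed merge rule: two items stay together only if every view
    places them in the same cluster (intersection of the partitions)."""
    merged = {}
    for i in range(n_items):
        # Signature = index of the cluster containing item i in each view.
        key = tuple(
            next(k for k, group in enumerate(groups) if i in group)
            for groups in groupings
        )
        merged.setdefault(key, []).append(i)
    return sorted(merged.values())


# Toy data: 4 images described in N = 2 hypothetical feature spaces.
view_a = [0.0, 0.1, 5.0, 5.2]               # 1-D feature
view_b = [(0, 0), (0, 1), (1, 0), (10, 10)]  # 2-D feature

groups_a = hac_groups(view_a, lambda p, q: abs(p - q), threshold=1.0)
groups_b = hac_groups(view_b, math.dist, threshold=2.0)

proposals = merge_views([groups_a, groups_b], n_items=4)
print(proposals)
```

On this toy data the first view groups images {0, 1} and {2, 3}, the second groups {0, 1, 2} and {3}, and the merged proposal is `[[0, 1], [2], [3]]`. In the workflow described above, such proposals would be shown to the operator for correction or validation. An intersection-style merge favors homogeneous groups at the cost of more, smaller groups; other merge rules are equally plausible.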
Similar resources
Semi-supervised multi-label image classification based on nearest neighbor editing
Semi-supervised multi-label classification has been applied to many real-world tasks such as image classification and document classification. In semi-supervised learning, unlabeled samples are added to the training set to enhance classification performance; however, noise is introduced at the same time. In order to reduce this negative effect, the nearest neighbor data edi...
Semi-Supervised Learning with Multi-View Embedding: Theory and Application with Convolutional Neural Networks
This paper presents a theoretical analysis of multi-view embedding – feature embedding that can be learned from unlabeled data through the task of predicting one view from another. We prove its usefulness in supervised learning under certain conditions. The result explains the effectiveness of some existing methods such as word embedding. Based on this theory, we propose a new semi-supervised l...
Active + Semi-supervised Learning = Robust Multi-View Learning
In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated (i.e., every example is identically labeled by the target concepts in eac...
Multi-view Semi-supervised Learning: An Approach to Obtain Different Views from Text Datasets
The supervised machine learning approach usually requires a large number of labelled examples to learn accurately. However, labelling can be a costly and time consuming process, especially when manually performed. In contrast, unlabelled examples are usually inexpensive and easy to obtain. This is the case for text classification tasks involving on-line data sources, such as web pages, email an...
A Novel Multi label Text Classification Model using Semi supervised learning
Automatic text categorization (ATC) is a prominent research area within information retrieval. This paper discusses a classification model for ATC in the multi-label domain. We propose a new multi-label text classification model for assigning a more relevant set of categories to every input text document. Our model is greatly influenced by graph based framework and Semi supervised le...