Generative Manifold Learning for the Exploration of Partially Labeled Data
Authors
Abstract
In many real-world applications, labels for supervised learning are scarce, and incompletely labeled datasets are commonplace in some of the most active current areas of research. The methods developed in the thesis reported in this paper build on a manifold learning model, Generative Topographic Mapping (GTM). A variant of GTM that uses a graph approximation to the geodesic metric is defined first. This model is capable of representing data with convoluted geometries. The standard GTM is modified to prioritize neighbourhood relationships along the generated manifold, which is accomplished by penalizing divergences between the Euclidean distances from the data points to the model prototypes and the corresponding geodesic distances along the manifold. The resulting Geodesic GTM (Geo-GTM) model is shown to improve the continuity and trustworthiness of the generated representation and to behave robustly in the presence of noise. We then define a novel semi-supervised model, SS-Geo-GTM, that extends Geo-GTM to semi-supervised problems. In SS-Geo-GTM, the model prototypes obtained from Geo-GTM are linked by the nearest neighbour to the data manifold. The resulting proximity graph is used as the basis for a class label propagation algorithm. The performance of SS-Geo-GTM is assessed experimentally using accuracy and the Matthews correlation coefficient, and it compares favourably with a Euclidean distance-based counterpart and with the alternative Laplacian Eigenmaps and semi-supervised Gaussian mixture models.
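The graph approximation to the geodesic metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the Geo-GTM implementation: it assumes a symmetric k-nearest-neighbour graph with Euclidean edge weights and uses Dijkstra shortest paths as the geodesic estimate, and the names `knn_graph`, `geodesic_distances`, and the toy semicircle data are hypothetical choices made for illustration.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph weighted by Euclidean distance."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest paths; path lengths approximate geodesic distances."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy data (an assumption for illustration): 50 points on a unit semicircle.
angles = [math.pi * i / 49 for i in range(50)]
points = [(math.cos(t), math.sin(t)) for t in angles]

graph = knn_graph(points, k=2)
dist_from_first = geodesic_distances(graph, source=0)

# Straight-line distance between the two ends of the arc.
euclidean_end_to_end = math.dist(points[0], points[-1])
# Distance measured along the neighbourhood graph.
geodesic_end_to_end = dist_from_first[49]
```

On this toy arc the graph distance between the two endpoints follows the curve, so it comes out close to the arc length, whereas the Euclidean distance cuts straight across. That gap is the kind of divergence between Euclidean and geodesic distances that the abstract says Geo-GTM penalizes.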
Related resources
A Geometry Preserving Kernel over Riemannian Manifolds
The kernel trick and projection to tangent spaces are two choices for linearizing data points lying on Riemannian manifolds. These approaches provide the prerequisites for applying standard machine learning methods on Riemannian manifolds. Classical kernels implicitly project data to a high-dimensional feature space without considering the intrinsic geometry of the data points. ...
Manifold Learning and Applications in Recognition
Large collections of data, such as images and characters under varying intrinsic principal features, are thought to constitute highly nonlinear manifolds in the high-dimensional observation space. Visualization and exploration of high-dimensional vector data are therefore the focus of much current machine learning research. However, most recognition systems using linear methods are bound to igno...
Learning Hybrid Models for Image Annotation with Partially Labeled Data
Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image...
Manifold Embeddings for Model-Based Reinforcement Learning of Neurostimulation Policies
Real-world reinforcement learning problems often exhibit nonlinear, continuous-valued, noisy, partially-observable state-spaces that are prohibitively expensive to explore. The formal reinforcement learning framework, unfortunately, has not been successfully demonstrated in a real-world domain having all of these constraints. We approach this domain with a two-part solution. First, we overcome ...
SEVEN: Deep Semi-supervised Verification Networks
Verification determines whether two samples belong to the same class or not, and has important applications such as face and fingerprint verification, where thousands or millions of categories are present but each category has scarce labeled examples, presenting two major challenges for existing deep learning models. We propose a deep semisupervised model named SEmi-supervised VErification Netw...