Manifold Learning Using Euclidean k-nearest Neighbor Graphs
Authors
Abstract
In the manifold learning problem one seeks to discover a smooth low dimensional surface, i.e., a manifold embedded in a higher dimensional linear vector space, based on a set of measured sample points on the surface. In this paper we consider the closely related problem of estimating the manifold’s intrinsic dimension and the intrinsic entropy of the sample points. Specifically, we view the sample points as realizations of an unknown multivariate density supported on an unknown smooth manifold. In previous work we introduced a geometric probability method called Geodesic Minimal Spanning Tree (GMST) to obtain asymptotically consistent estimates of manifold dimension and entropy. In this paper we present a simpler method based on the k-nearest neighbor (k-NN) graph that does not require estimation of geodesic distances on the manifold. The algorithm is applied to standard synthetic manifolds as well as real data sets consisting of images of faces.
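As an illustration of the idea, the following is a minimal sketch of a k-NN-graph dimension estimate, not the paper's exact algorithm. It assumes the standard growth-rate result that the total power-weighted edge length of the Euclidean k-NN graph on n samples scales roughly as n^((m - gamma)/m), where m is the intrinsic dimension, so m can be read off the slope of a log-log fit. The function names, the choices k = 4 and gamma = 1, the subsample sizes, and the unit-sphere test data are all illustrative assumptions; the entropy estimate is not covered.

```python
import numpy as np

def knn_graph_length(points, k=4, gamma=1.0):
    """Total gamma-weighted edge length of the Euclidean k-NN graph."""
    sq = np.sum(points ** 2, axis=1)
    # Squared pairwise distances via the Gram matrix.
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.maximum(d2, 0.0, out=d2)      # clip tiny negative rounding errors
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor
    # The k smallest squared distances in each row (order within them is irrelevant).
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]
    return float(np.sum(nearest ** (gamma / 2.0)))

def estimate_dimension(points, sizes, k=4, gamma=1.0, trials=5, seed=0):
    """Fit log(length) against log(n) over random subsamples and invert the slope.

    Assumes length ~ c * n**((m - gamma) / m), so m = gamma / (1 - slope).
    """
    rng = np.random.default_rng(seed)
    log_n, log_len = [], []
    for n in sizes:
        for _ in range(trials):
            idx = rng.choice(len(points), size=n, replace=False)
            log_n.append(np.log(n))
            log_len.append(np.log(knn_graph_length(points[idx], k, gamma)))
    slope, _intercept = np.polyfit(log_n, log_len, 1)
    return gamma / (1.0 - slope)

# Illustrative data: points on the unit sphere, a 2-D manifold embedded in 3-D.
rng = np.random.default_rng(1)
sphere = rng.normal(size=(2000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)

m_hat = estimate_dimension(sphere, sizes=[200, 400, 800, 1600])
print(f"estimated intrinsic dimension: {m_hat:.2f}")
```

Only Euclidean distances between samples are used, which matches the abstract's point that no geodesic distances need to be estimated. In practice the raw estimate would be rounded to the nearest integer.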
Similar resources
Random Graphs for Structure Discovery in High-dimensional Data
Originally motivated by computational considerations, we demonstrate how computationally efficient and scalable graph constructions can be used to encode both statistical and spatial information and address the problems of dimension reduction and structure discovery in high-dimensional data, with provable results. We discuss the asymptotic behavior of power weighted functionals of minimal Euclide...
Geometry-Aware Neighborhood Search for Learning Local Models for Image Reconstruction
Local learning of sparse image models has proven to be very effective to solve inverse problems in many computer vision applications. To learn such models, the data samples are often clustered using the K-means algorithm with the Euclidean distance as a dissimilarity metric. However, the Euclidean distance may not always be a good dissimilarity measure for comparing data samples lying on a mani...
Proximity Graphs for Clustering and Manifold Learning
Many machine learning algorithms for clustering or dimensionality reduction take as input a cloud of points in Euclidean space, and construct a graph with the input data points as vertices. This graph is then partitioned (clustering) or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality red...
Using the Mutual k-Nearest Neighbor Graphs for Semi-supervised Classification on Natural Language Data
The first step in graph-based semi-supervised classification is to construct a graph from input data. While the k-nearest neighbor graphs have been the de facto standard method of graph construction, this paper advocates using the less well-known mutual k-nearest neighbor graphs for high-dimensional natural language data. To compare the performance of these two graph construction methods, we ru...
Community detection with manifold learning on speaker i-vector space for Chinese
Speaker recognition with clustering speech signals of the same speaker is an important speech analysis task in various applications. Recent works have shown that there was an underlying manifold on which speaker utterances live in the model-parameter space. However, most speaker clustering methods work on the Euclidean space, and hence often fail to discover the intrinsic geometrical structure ...