Globally Sparse Probabilistic PCA
Authors
Abstract
With the flourishing development of high-dimensional data, sparse versions of principal component analysis (PCA) have emerged as simple yet powerful ways of selecting relevant features in an unsupervised manner. However, when several sparse principal components are computed, the interpretation of the selected variables may be difficult, since each axis has its own sparsity pattern and has to be interpreted separately. To overcome this drawback, we propose a Bayesian procedure that yields several sparse components sharing the same sparsity pattern. To this end, using Roweis' probabilistic interpretation of PCA and an isotropic Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. In order to avoid the drawbacks of discrete model selection, we propose a simple relaxation of our framework which makes it possible to find a path of models using a variational expectation-maximization algorithm. The exact marginal likelihood can eventually be maximized over this path, relying on Occam's razor to select the relevant variables. Since the sparsity pattern is common to all components, we call this approach globally sparse probabilistic PCA (GSPPCA). Its usefulness is illustrated on synthetic data sets and on several real unsupervised feature selection problems.
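To make the selection mechanism concrete, here is a hedged Python sketch that is not the paper's algorithm: GSPPCA obtains its path of global sparsity patterns from a variational EM relaxation and scores it with an exact Bayesian marginal likelihood, whereas the sketch below ranks variables by variance to build a nested path and scores each candidate pattern with a BIC-penalised PPCA likelihood (via scikit-learn's PCA.score) as a crude stand-in for that Occam's-razor criterion. The synthetic data, ranking rule, noise model for unselected variables, and BIC penalty are all illustrative assumptions.

```python
# Hedged sketch of a GSPPCA-style selection loop (NOT the paper's method):
# build a nested path of global sparsity patterns, then score each pattern
# with a penalised full-data likelihood and keep the best one.
import numpy as np
from scipy.stats import norm
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n, p, q = 200, 30, 2

# Synthetic data: only the first 10 variables carry the rank-q signal.
W = np.zeros((p, q))
W[:10] = rng.normal(size=(10, q))
X = rng.normal(size=(n, q)) @ W.T + 0.1 * rng.normal(size=(n, p))

def penalised_loglik(X, subset, q):
    """PPCA likelihood on the selected columns, independent Gaussian noise
    on the rest, minus a BIC penalty (all three choices are assumptions,
    standing in for the paper's exact marginal likelihood)."""
    n, p = X.shape
    mask = np.zeros(p, dtype=bool)
    mask[subset] = True
    # PCA.score returns the average per-sample PPCA log-likelihood.
    ll = PCA(n_components=q).fit(X[:, mask]).score(X[:, mask]) * n
    rest = X[:, ~mask]
    if rest.shape[1]:
        ll += norm.logpdf(rest, rest.mean(0), rest.std(0) + 1e-12).sum()
    k = int(mask.sum())
    # Free parameters: PPCA block + means everywhere + leftover variances.
    n_params = k * q - q * (q - 1) // 2 + 1 + p + (p - k)
    return ll - 0.5 * n_params * np.log(n)

order = np.argsort(X.var(axis=0))[::-1]        # nested candidate path
scores = [penalised_loglik(X, order[:k], q) for k in range(q + 1, p + 1)]
best_k = int(np.argmax(scores)) + q + 1
print("selected variables:", np.sort(order[:best_k]))
```

Without the penalty, the raw maximum likelihood tends to keep increasing as noise variables are added; some Occam's-razor term, which in the paper is supplied exactly by the Bayesian marginal likelihood, is what makes the selection well-posed.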
Similar papers
Sparse Probabilistic Principal Component Analysis
Principal component analysis (PCA) is a popular dimensionality reduction algorithm. However, it is not easy to tell from the principal components which of the original features are important. Recent methods improve interpretability by sparsifying PCA through the addition of an L1 regularizer. In this paper, we introduce a probabilistic formulation for sparse PCA. By presenting sparse PCA as a p...
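As a concrete illustration of the per-component sparsity issue, the sketch below uses scikit-learn's SparsePCA, which implements an L1-penalised variant rather than the probabilistic formulation above; the block-structured synthetic data are an assumption made for the demo.

```python
# Each SparsePCA component gets its own support, so the selected variables
# must be read off axis by axis -- the interpretability issue that a shared
# (global) sparsity pattern avoids.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
n, p, q = 100, 12, 3
B = np.zeros((p, q))
B[0:4, 0] = B[4:8, 1] = B[8:12, 2] = 1.0       # block-structured loadings
X = rng.normal(size=(n, q)) @ B.T + 0.3 * rng.normal(size=(n, p))

spca = SparsePCA(n_components=q, alpha=1.0, random_state=0).fit(X)
for i, comp in enumerate(spca.components_):
    print(f"component {i}: nonzero variables = {np.flatnonzero(comp)}")
```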
Extensions of probabilistic PCA
Principal component analysis (PCA) is a classical data analysis technique. Some algorithms for PCA scale better than others to problems with high dimensionality. They also differ in their ability to handle missing values in the data. In our recent paper [1], a case is studied where the data are high-dimensional and a majority of the values are missing. In the case of very sparse data, overfitting...
Fundamental limits of symmetric low-rank matrix estimation
We consider the high-dimensional inference problem where the signal is a low-rank symmetric matrix which is corrupted by an additive Gaussian noise. Given a probabilistic model for the low-rank matrix, we compute the limit in the large dimension setting for the mutual information between the signal and the observations, as well as the matrix minimum mean square error, while the rank of the sign...
Dense Message Passing for Sparse Principal Component Analysis
We describe a novel inference algorithm for sparse Bayesian PCA with a zero-norm prior on the model parameters. Bayesian inference is very challenging in probabilistic models of this type. MCMC procedures are too slow to be practical in a very high-dimensional setting and standard mean-field variational Bayes algorithms are ineffective. We adopt a dense message passing algorithm similar to algo...
Modelling Handwritten Digit Data using Probabilistic Principal Component Analysis
Principal Component Analysis (PCA) is now a century old. Nevertheless, it still undergoes research and new extensions are found. Probabilistic Principal Component Analysis (PPCA, proposed by Tipping and Bishop) is one of these recent PCA extensions. PPCA defines a probabilistic generative model for PCA. It can easily be extended to mixture models. Among recent mixture density theoretical devel...
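For readers unfamiliar with the model, here is a minimal self-contained sketch of the PPCA generative process; the dimensions and noise level below are arbitrary illustrative choices.

```python
# The PPCA generative model of Tipping & Bishop:
# x = W z + mu + eps with z ~ N(0, I_q) and eps ~ N(0, sigma^2 I_p),
# so that marginally x ~ N(mu, W W^T + sigma^2 I_p).
import numpy as np

rng = np.random.default_rng(0)
p, q, n, sigma = 5, 2, 5000, 0.1

W = rng.normal(size=(p, q))              # loading matrix
mu = rng.normal(size=p)                  # data mean
Z = rng.normal(size=(n, q))              # latent coordinates
X = Z @ W.T + mu + sigma * rng.normal(size=(n, p))

# The marginal covariance implied by the model, checked empirically.
C = W @ W.T + sigma**2 * np.eye(p)
print("max |empirical - model| covariance entry:",
      np.abs(np.cov(X.T) - C).max())
```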