Algebraic-Geometric and Probabilistic Approaches for Clustering and Dimension Reduction of Mixtures of Principal Component Subspaces
ECE842 Course Project Report
Abstract
Generalized Principal Component Analysis (GPCA) and Probabilistic Principal Component Analysis (PPCA) are two extensions of PCA to mixtures of principal subspaces. GPCA is an algebraic-geometric framework in which a collection of linear subspaces is represented by a set of homogeneous polynomials whose degree corresponds to the number of subspaces and whose factors (roots) encode the subspace parameters. PPCA is a probabilistic approach in which principal component analysis is viewed as a maximum-likelihood procedure based on a probability density model of the observed data. Both techniques can estimate a mixture of subspaces from sample data points and are therefore useful for data clustering and dimension reduction in multivariate data mining. The primary goal of this project is to carry out a conceptual study of the principles and features of the algebraic-geometric and probabilistic approaches to mixtures of principal component subspaces, and to gain hands-on experience through computational implementation of these techniques. A polynomial factorization algorithm (PFA) for GPCA and an expectation-maximization (EM) algorithm for PPCA were implemented in MATLAB and tested on synthetic data sets. The PFA algorithm for GPCA can identify the number of subspaces in the mixture and, when it succeeds, estimates the normal vectors of the subspaces with relatively high correlation. However, the implementation is not robust: its success depends on the data. The potential problems of this implementation are discussed in the report. The implemented EM algorithm for PPCA showed that a probabilistic mixture model can identify the clusters and correctly assign each data point to its cluster. Both techniques estimated component subspaces of lower dimensionality, so data dimensions can be reduced and the underlying clusters recovered.
In this project, the implemented algorithms were tested only on synthetic 3-dimensional data, not on higher-dimensional or real data, and they are far from comprehensive enough for practical use. Nevertheless, the computational implementation contributed substantially to understanding the two approaches to mixtures of principal component subspaces.
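The algebraic idea behind GPCA summarized in the abstract can be illustrated with a minimal sketch. This is not the report's MATLAB implementation: it is a noise-free toy case in Python, assuming two lines through the origin in the plane rather than the 3-dimensional data used in the project. The function names `veronese2`, `fit_two_lines`, and `assign` are hypothetical and chosen for illustration. The two lines are the zero set of one homogeneous quadratic; its coefficients are found from the embedded points, and its two linear factors give the normal vectors.

```python
import math

def veronese2(p):
    """Degree-2 Veronese embedding of a 2-D point: (x, y) -> (x^2, xy, y^2)."""
    x, y = p
    return (x * x, x * y, y * y)

def cross(u, v):
    """Cross product in R^3; the result is orthogonal to both u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(v):
    """Scale a 2-D vector to unit length."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def fit_two_lines(p, q, eps=1e-12):
    """Estimate unit normals of two lines through the origin.

    Assumes p and q are noise-free points lying on different lines.
    The quadratic c0*x^2 + c1*x*y + c2*y^2 vanishes on both lines, so its
    coefficient vector is orthogonal to both embedded points.
    """
    c0, c1, c2 = cross(veronese2(p), veronese2(q))
    if abs(c0) < eps:
        # The quadratic factors as y * (c1*x + c2*y).
        return [unit((0.0, 1.0)), unit((c1, c2))]
    # Set y = 1 and solve c0*t^2 + c1*t + c2 = 0 for t = x/y.
    disc = math.sqrt(max(c1 * c1 - 4.0 * c0 * c2, 0.0))
    t1 = (-c1 + disc) / (2.0 * c0)
    t2 = (-c1 - disc) / (2.0 * c0)
    # Each root t gives the factor (x - t*y), i.e. the normal (1, -t).
    return [unit((1.0, -t1)), unit((1.0, -t2))]

def assign(point, normals):
    """Index of the line whose normal gives the smallest residual |n . x|."""
    residuals = [abs(n[0] * point[0] + n[1] * point[1]) for n in normals]
    return residuals.index(min(residuals))

# One sample point on each of the lines y = x and x + 2y = 0.
normals = fit_two_lines((2.0, 2.0), (2.0, -1.0))
labels = [assign(pt, normals) for pt in [(3.0, 3.0), (-4.0, 2.0)]]
```

With noisy data, or with more subspaces, the coefficient vector would instead be estimated from the full embedded data matrix (for example by a least-squares or singular value decomposition step), and the factorization becomes the data-dependent part of the procedure.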