Robust PCA for High-Dimensional Data
Authors
Abstract
We consider the dimensionality-reduction problem for a contaminated data set in a very high-dimensional space, i.e., the problem of finding a subspace approximation of observed data when the number of observations is of the same magnitude as the number of variables in each observation and the data set contains some outlying observations. We propose a High-dimension Robust Principal Component Analysis (HRPCA) algorithm that is tractable, robust to outliers, and easily kernelizable. The resulting subspace has a bounded deviation from the desired one and achieves optimality in the limit where the fraction of outliers goes to zero.
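To make the setting concrete, the following is a minimal sketch of the problem the abstract describes, not of HRPCA itself: it builds a synthetic data set in which the number of observations is comparable to the number of variables, adds a few outlying observations, and measures how far classical PCA drifts from the true direction. The dimensions, signal strength, outlier placement, and the helper `top_direction` are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def top_direction(X):
    """Leading principal direction of the rows of X (classical PCA via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]                                 # unit vector of length p


rng = np.random.default_rng(0)

p = 100        # variables per observation (assumed value)
n_clean = 90   # authentic observations (assumed value)
n_out = 10     # outlying observations (assumed value)

# True one-dimensional subspace: the first coordinate axis.
w = np.zeros(p)
w[0] = 1.0

# Authentic points: strong signal along w plus isotropic unit-variance noise.
signal = 10.0 * rng.standard_normal((n_clean, 1)) * w
clean = signal + rng.standard_normal((n_clean, p))

# Outliers: identical points far out along a different axis (illustrative choice).
outliers = np.zeros((n_out, p))
outliers[:, 1] = 100.0

contaminated = np.vstack([clean, outliers])

# |cosine| between the estimated direction and the true one (1.0 is perfect).
align_clean = abs(top_direction(clean) @ w)
align_contaminated = abs(top_direction(contaminated) @ w)

print(f"alignment without outliers: {align_clean:.3f}")
print(f"alignment with outliers:    {align_contaminated:.3f}")
```

In this sketch a 10% fraction of outliers is enough to pull the leading classical principal direction away from the true one, which is the failure mode a robust procedure such as HRPCA is meant to bound.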
Similar resources
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: With advances in science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...
Robust PCA and classification in biosciences
Motivation: Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach that is based on the mean and the sample covariance matrix of the data is very sensitive to outliers. Also, classification methods based on this covariance matrix do not give good r...
The Robust Clustering with Reduction Dimension
Clustering is the process of identifying homogeneous groups of objects, called clusters. Clustering is an interesting topic in data mining. Members of a group or class share similar characteristics. This paper discusses a robust clustering process for image data with two dimension-reduction approaches, i.e., two-dimensional principal component analysis (2DPCA) and principal component analysis (PCA)....
Principal Component Analysis with Contaminated Data: The High Dimensional Case
We consider the dimensionality-reduction problem (finding a subspace approximation of observed data) for contaminated data in the high dimensional regime, where the number of observations is of the same magnitude as the number of variables of each observation, and the data set contains some (arbitrarily) corrupted observations. We propose a High-dimensional Robust Principal Component Analys...
Principal Component Analysis with Contaminated Data: The High Dimensional Case
We consider the dimensionality-reduction problem (finding a subspace approximation of observed data) for contaminated data in the high dimensional regime, where the number of observations is of the same magnitude as the number of variables of each observation, and the data set contains some (arbitrarily) corrupted observations. We propose a High-dimensional Robust Principal Component Analysis (...