Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization
Abstract
Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized “robust principal component analysis” problem of recovering a low rank matrix A from corrupted observations D = A + E. Here, the error entries E can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns, by solving a simple convex program. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation space and the number of errors E grows in proportion to the total number of entries in the matrix. A by-product of our analysis is the first proportional growth results for the related but somewhat easier problem of completing a low-rank matrix from a small fraction of its entries. We propose an algorithm based on iterative thresholding that, for large matrices, is significantly faster and more scalable than general-purpose solvers. We give simulations and real-data examples corroborating the theoretical results.
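The abstract does not spell out the convex program or the thresholding iteration, so the following is only a minimal sketch under stated assumptions. It assumes the program is the one usually associated with this problem: minimize ||A||_* + λ||E||_1 subject to D = A + E, where ||A||_* is the nuclear norm (sum of singular values) and ||E||_1 is the entrywise l1 norm. The solver below is a standard augmented-Lagrangian (ADMM-style) scheme that alternates singular value thresholding with entrywise soft thresholding; it is not necessarily the algorithm proposed in the paper. The function names, the weight λ = 1/sqrt(max(m, n)), and the penalty parameter μ are illustrative choices, not values taken from the abstract.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * (l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink the singular values of X by tau: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(D, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split D into a low-rank part A and a sparse part E (hypothetical helper).

    Assumed objective: min ||A||_* + lam * ||E||_1  subject to  D = A + E.
    """
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))          # assumed default weight
    if mu is None:
        mu = 0.25 * m * n / np.abs(D).sum()     # assumed heuristic penalty parameter
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                        # Lagrange multiplier for D = A + E
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        A = singular_value_threshold(D - E + Y / mu, 1.0 / mu)
        E = soft_threshold(D - A + Y / mu, lam / mu)
        R = D - A - E                           # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_D:
            break
    return A, E


if __name__ == "__main__":
    # Synthetic check: rank-3 matrix with about 5% of entries grossly corrupted.
    rng = np.random.default_rng(0)
    A0 = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 60))
    E0 = np.zeros((60, 60))
    mask = rng.random((60, 60)) < 0.05
    E0[mask] = rng.uniform(-10, 10, size=mask.sum())
    A_hat, E_hat = rpca(A0 + E0)
    print("relative error in A:", np.linalg.norm(A_hat - A0) / np.linalg.norm(A0))
```

Each iteration costs one SVD of a matrix the size of D, which is the reason thresholding-based schemes of this kind tend to scale better than general-purpose interior-point solvers on large matrices.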
Related papers
Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices by Convex Optimization
Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized “robust principal component anal...
Robust Transfer Principal Component Analysis with Rank Constraints
Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective at removing small Gaussian noise present in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the chall...
Fast Automatic Background Extraction via Robust PCA
Recent years have seen an explosion of interest in applications of sparse signal recovery and low rank matrix completion, due in part to the compelling use of the nuclear norm as a convex proxy for matrix rank. In some cases, minimizing the nuclear norm is equivalent to minimizing the rank of a matrix, and can lead to exact recovery of the underlying rank structure, see [Faz02, RFP10] for backg...
Exact Tensor Completion from Sparsely Corrupted Observations via Convex Optimization
This paper conducts a rigorous analysis for provable estimation of multidimensional arrays, in particular third-order tensors, from a random subset of its corrupted entries. Our study rests heavily on a recently proposed tensor algebraic framework in which we can obtain tensor singular value decomposition (t-SVD) that is similar to the SVD for matrices, and define a new notion of tensor rank re...
FRPCA: Fast Robust Principal Component Analysis
While the performance of Robust Principal Component Analysis (RPCA), in terms of the recovered low-rank matrices, is quite satisfactory to many applications, the time efficiency is not, especially for scalable data. We propose to solve this problem using a novel fast incremental RPCA (FRPCA) approach. The low rank matrices of the incrementally-observed data are estimated using a convex optimiza...