PCA from noisy, linearly reduced data: the diagonal case
Abstract
Suppose we observe data of the form $Y_i = D_i(S_i + \varepsilon_i) \in \mathbb{R}^p$ or $Y_i = D_i S_i + \varepsilon_i \in \mathbb{R}^p$, $i = 1, \dots, n$, where the $D_i \in \mathbb{R}^{p \times p}$ are known diagonal matrices, the $\varepsilon_i$ are noise, and we wish to perform principal component analysis (PCA) on the unobserved signals $S_i \in \mathbb{R}^p$. The first model arises in missing-data problems, where the $D_i$ are binary (their diagonal entries are 0 or 1). The second model captures noisy deconvolution problems, where the $D_i$ are the Fourier transforms of the convolution kernels. It is often reasonable to assume that the $S_i$ lie on an unknown low-dimensional linear space; however, because the $D_i$ can suppress many coordinates, this low-dimensional structure can be obscured. We introduce diagonally reduced spiked covariance models to capture this setting. We characterize the behavior of the singular vectors and singular values of the data matrix under high-dimensional asymptotics where $n, p \to \infty$ such that $p/n \to \gamma > 0$. Our results hold under the most general assumptions to date, even without diagonal reduction. Using them, we develop optimal eigenvalue shrinkage methods for covariance matrix estimation and optimal singular value shrinkage methods for data denoising. Finally, we characterize the error rates of the empirical Best Linear Predictor (EBLP) denoisers. We show that, perhaps surprisingly, their optimal tuning depends on whether we denoise in-sample or out-of-sample, but the optimally tuned mean squared error is the same in both cases.
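The following is a minimal Python sketch of the high-dimensional behavior that the abstract describes. It is not the paper's diagonally reduced theory, and it is not the paper's shrinkers. It simulates the missing-data model $Y_i = D_i(S_i + \varepsilon_i)$ with a rank-one signal, white Gaussian noise, and Bernoulli masks on the diagonals of the $D_i$; all of these are assumptions made for illustration. It then compares the top eigenpair of $Y^\top Y / n$ with the classical spiked-covariance limits for $D_i = I$: the top eigenvalue tends to $(1+\ell)(1+\gamma/\ell)$, and the squared cosine with the true direction tends to $(1-\gamma/\ell^2)/(1+\gamma/\ell)$ when $\ell > \sqrt{\gamma}$. The function names `simulate_missing_data`, `top_eigenpair` and `spiked_limits` are hypothetical.

```python
import numpy as np


def simulate_missing_data(n, p, spike, obs_prob, seed=0):
    """Draw Y_i = D_i (S_i + eps_i), i = 1..n, stacked as rows of an n x p array.

    Illustrative assumptions (not from the paper): rank-one signal
    S_i = sqrt(spike) * z_i * u with z_i ~ N(0, 1) and a random unit vector u;
    white noise eps_i ~ N(0, I_p); diagonal of each D_i drawn i.i.d.
    Bernoulli(obs_prob), i.e. entries missing completely at random.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)
    z = rng.standard_normal(n)
    signals = np.sqrt(spike) * np.outer(z, u)   # row i is S_i
    noise = rng.standard_normal((n, p))         # row i is eps_i
    mask = rng.random((n, p)) < obs_prob        # row i is diag(D_i)
    return mask * (signals + noise), u


def top_eigenpair(Y):
    """Largest eigenvalue and unit eigenvector of Y^T Y / n, via the SVD of Y."""
    n = Y.shape[0]
    _, s, vt = np.linalg.svd(Y, full_matrices=False)
    return s[0] ** 2 / n, vt[0]


def spiked_limits(spike, gamma):
    """Classical rank-one spiked-model limits with D_i = I and unit noise variance.

    Returns (limit of the top eigenvalue, limit of the squared cosine between
    the top sample eigenvector and u) as n, p -> infinity with p / n -> gamma.
    """
    if spike > np.sqrt(gamma):
        eig = (1 + spike) * (1 + gamma / spike)
        cos2 = (1 - gamma / spike**2) / (1 + gamma / spike)
    else:
        # Below the detection threshold: the top eigenvalue sticks to the
        # bulk edge and the eigenvector carries no information about u.
        eig = (1 + np.sqrt(gamma)) ** 2
        cos2 = 0.0
    return eig, cos2


if __name__ == "__main__":
    n, p, spike = 2000, 1000, 3.0
    gamma = p / n
    eig_theory, cos2_theory = spiked_limits(spike, gamma)
    for q in (1.0, 0.5):
        Y, u = simulate_missing_data(n, p, spike, obs_prob=q)
        eig, v = top_eigenpair(Y)
        print(f"obs_prob={q}: top eig={eig:.3f}, cos^2={np.dot(u, v) ** 2:.3f}")
    print(f"classical limits (obs_prob=1 only): eig={eig_theory:.3f}, cos^2={cos2_theory:.3f}")
```

With `obs_prob=1.0` the empirical values should land close to the classical limits. With `obs_prob=0.5` the squared cosine is typically noticeably smaller. This gives a rough picture of how suppressing coordinates obscures the low-dimensional structure. The exact numbers depend on the seed and on the finite sizes chosen.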
Similar articles
Fault Strike Detection Using Satellite Gravity Data Decomposition by Discrete Wavelets: A Case Study from Iran
Estimating the boundaries of the bodies that cause gravity anomalies can facilitate the interpretation of the gravity field. In this paper, the 2D discrete wavelet transform (DWT) is employed to delineate the boundaries of gravity anomaly sources. To this end, GRACE satellite gravity data are decomposed using the DWT. The DWT decomposes a single array of approximation coefficients into four distinct components: the appr...
On computing the maximum corank in the Frisch scheme
The paper concerns the Frisch scheme for the identification of linear relations from static noisy data. The central problem is, for a given non-singular $n \times n$ noisy, linearly independent data covariance matrix $\Sigma$, to determine the maximal corank of $\Sigma - \tilde{\Sigma}$, and the associated maximizers $\tilde{\Sigma}$, over the set of diagonal, uncorrelated additive-noise covariance matrices $\tilde{\Sigma}$ such that $\Sigma \geq \tilde{\Sigma}$, i.e. such that $\Sigma - \tilde{\Sigma}$ is ...
ePCA: High Dimensional Exponential Family PCA
Many applications involve large collections of high-dimensional datapoints with noisy entries from exponential family distributions. It is of interest to estimate the covariance and principal components of the noiseless distribution. In photon-limited imaging (e.g. XFEL) we want to estimate the covariance of the pixel intensities of 2-D images, where the pixels are low-intensity Poisson variabl...
GMM Based on Local Robust PCA for Speaker Identification
To address outliers and the high dimensionality of training feature vectors in speaker identification, this paper proposes an efficient GMM based on local robust PCA with VQ. The proposed method first partitions the data space into several disjoint regions by VQ, and then performs robust PCA using the iteratively reweighted covariance matrix in each region. Finally, ...
System Identification Based on Frequency Response Noisy Data
In this paper, a new algorithm for system identification based on frequency response is presented. In this method, given a set of magnitudes and phases of the system transfer function in a set of discrete frequencies, a system of linear equations is derived which has a unique and exact solution for the coefficients of the transfer function provided that the data is noise-free and the degrees of...