PCA from noisy, linearly reduced data: the diagonal case

Authors

  • Edgar Dobriban
  • William Leeb
  • Amit Singer
Abstract

Suppose we observe data of the form $Y_i = D_i(S_i + \varepsilon_i) \in \mathbb{R}^p$ or $Y_i = D_i S_i + \varepsilon_i \in \mathbb{R}^p$, $i = 1, \dots, n$, where $D_i \in \mathbb{R}^{p \times p}$ are known diagonal matrices, $\varepsilon_i$ are noise, and we wish to perform principal component analysis (PCA) on the unobserved signals $S_i \in \mathbb{R}^p$. The first model arises in missing-data problems, where the $D_i$ are binary. The second model captures noisy deconvolution problems, where the $D_i$ are the Fourier transforms of the convolution kernels. It is often reasonable to assume that the $S_i$ lie on an unknown low-dimensional linear space; however, because the $D_i$ can suppress many coordinates, this low-dimensional structure can be obscured. We introduce diagonally reduced spiked covariance models to capture this setting. We characterize the behavior of the singular vectors and singular values of the data matrix under high-dimensional asymptotics where $n, p \to \infty$ with $p/n \to \gamma > 0$. Even without diagonal reduction, our results hold under the most general assumptions to date. Using them, we develop optimal eigenvalue shrinkage methods for covariance matrix estimation and optimal singular value shrinkage methods for data denoising. Finally, we characterize the error rates of the empirical Best Linear Predictor (EBLP) denoisers. We show that, perhaps surprisingly, their optimal tuning depends on whether we denoise in-sample or out-of-sample, but the optimally tuned mean squared error is the same in both cases.
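The following is a minimal NumPy sketch, not the authors' method, of the missing-data model $Y_i = D_i(S_i + \varepsilon_i)$ and of the standard spiked model, which is its special case $D_i = I$. It assumes a rank-one Gaussian signal $S_i = \sqrt{\ell}\, z_i u$ with $z_i \sim N(0, 1)$ and unit vector $u$, white noise $\varepsilon_i \sim N(0, I_p)$, and masks $D_i$ whose diagonal entries are independent Bernoulli($\delta$). With $\delta = 1$ and $\ell > \sqrt{\gamma}$, the top sample eigenvalue and the squared cosine between its eigenvector and $u$ should approach the classical limits $(1+\ell)(1+\gamma/\ell)$ and $(1-\gamma/\ell^2)/(1+\gamma/\ell)$. The helper names `simulate`, `spike_limits`, and `debias_spike` are hypothetical.

```python
import numpy as np

def simulate(n, p, ell, delta=1.0, seed=0):
    """Draw Y_i = D_i (S_i + eps_i) for a rank-one spiked signal (illustrative assumptions).

    S_i = sqrt(ell) * z_i * u with z_i ~ N(0, 1); eps_i ~ N(0, I_p);
    D_i = diagonal mask with independent Bernoulli(delta) entries.
    delta = 1 gives D_i = I, i.e. the unreduced spiked model.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)                      # unit spike direction
    z = rng.standard_normal(n)
    signal = np.sqrt(ell) * np.outer(z, u)      # row i is S_i
    noise = rng.standard_normal((n, p))         # row i is eps_i
    mask = rng.random((n, p)) < delta           # row i is diag(D_i)
    return mask * (signal + noise), u

def spike_limits(ell, gamma):
    """Classical limits for D_i = I above the detection threshold ell > sqrt(gamma):
    top sample eigenvalue and squared cosine with the true spike direction."""
    if ell <= np.sqrt(gamma):
        raise ValueError("spike is below the detection threshold sqrt(gamma)")
    lam = (1 + ell) * (1 + gamma / ell)
    cos2 = (1 - gamma / ell**2) / (1 + gamma / ell)
    return lam, cos2

def debias_spike(lam_hat, gamma):
    """Solve lam = (1 + ell)(1 + gamma/ell) for ell (larger root).

    Returns 0 at or below the bulk edge (1 + sqrt(gamma))^2,
    where the spike cannot be separated from the noise."""
    b = lam_hat - 1 - gamma
    disc = b**2 - 4 * gamma
    if lam_hat <= (1 + np.sqrt(gamma))**2 or disc < 0:
        return 0.0
    return (b + np.sqrt(disc)) / 2

n, p, ell = 1000, 500, 4.0
gamma = p / n
Y, u = simulate(n, p, ell)                      # delta = 1: no reduction
cov = Y.T @ Y / n                               # sample covariance (data are mean zero)
evals, evecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
lam_hat, v_hat = evals[-1], evecs[:, -1]
lam_lim, cos2_lim = spike_limits(ell, gamma)
print(f"top eigenvalue {lam_hat:.3f} vs limit {lam_lim:.3f}")
print(f"cos^2 {float(v_hat @ u)**2:.3f} vs limit {cos2_lim:.3f}")
print(f"debiased spike {debias_spike(lam_hat, gamma):.3f} vs true {ell}")

# Same signal strength, but half of the coordinates are missing at random.
Y_miss, _ = simulate(n, p, ell, delta=0.5, seed=1)
print("top eigenvalue with delta=0.5:", np.linalg.eigvalsh(Y_miss.T @ Y_miss / n)[-1])
```

Inverting the eigenvalue limit, as `debias_spike` does, is a common first step of eigenvalue shrinkage in the unreduced model. The last lines show that with $\delta = 0.5$ the naive top eigenvalue falls well below its unreduced counterpart. This is the kind of distortion that the diagonally reduced model is meant to account for.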

Similar resources

Fault Strike Detection Using Satellite Gravity Data Decomposition by Discrete Wavelets: A Case Study from Iran

Estimating the boundaries of the causative bodies of gravity anomalies can facilitate interpretation of the gravity field. In this paper, the 2D discrete wavelet transform (DWT) is employed to delineate the boundaries of gravity anomaly sources. To this end, GRACE satellite gravity data are decomposed using the DWT. The DWT decomposes a single set of approximation coefficients into four distinct components: the appr...

On computing the maximum corank in the Frisch scheme

The paper concerns the Frisch scheme for identification of linear relations from static noisy data. The central problem is, for a given non-singular n × n noisy, linearly independent data covariance matrix n , determine the maximal corank of n ? n , and associated maximizers ? n , over the set of diagonal uncorrelated additive noise covariance matrices n such that n n , i.e. such that n ? ? n is ...

ePCA: High Dimensional Exponential Family PCA

Many applications involve large collections of high-dimensional datapoints with noisy entries from exponential family distributions. It is of interest to estimate the covariance and principal components of the noiseless distribution. In photon-limited imaging (e.g. XFEL) we want to estimate the covariance of the pixel intensities of 2-D images, where the pixels are low-intensity Poisson variabl...

GMM Based on Local Robust PCA for Speaker Identification

ABSTRACT: To solve the problems of outliers and high dimensionality of training feature vectors in speaker identification, in this paper, we propose an efficient GMM based on local robust PCA with VQ. The proposed method firstly partitions the data space into several disjoint regions by VQ, and then performs robust PCA using the iteratively reweighted covariance matrix in each region. Finally, ...

System Identification Based on Frequency Response Noisy Data

In this paper, a new algorithm for system identification based on frequency response is presented. In this method, given a set of magnitudes and phases of the system transfer function in a set of discrete frequencies, a system of linear equations is derived which has a unique and exact solution for the coefficients of the transfer function provided that the data is noise-free and the degrees of...

Publication date: 2016