Asymptotic performance of PCA for high-dimensional heteroscedastic data
From MaRDI portal
Abstract: Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.
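The abstract's central claim, that PCA recovery under heteroscedastic noise is worse than under homoscedastic noise with the same average variance, can be checked empirically. The sketch below is an illustrative simulation (not the paper's own code): it draws data from a rank-one subspace, adds per-sample Gaussian noise, and measures subspace recovery as the squared inner product between the true direction and the top left singular vector. All dimensions, amplitudes, and the two-level noise variances (0.1 and 1.9, averaging to 1.0) are arbitrary choices for demonstration.

```python
import numpy as np

def pca_subspace_recovery(noise_vars, n=1000, d=200, theta=1.5, trials=5, seed=0):
    """Mean squared inner product between the true subspace direction and the
    top PCA component, for data with per-sample noise variances `noise_vars`."""
    rng = np.random.default_rng(seed)
    sig = np.sqrt(np.asarray(noise_vars))             # per-sample noise std devs
    recs = []
    for _ in range(trials):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                        # true subspace basis vector
        z = rng.standard_normal(n)                    # subspace coefficients
        E = rng.standard_normal((d, n)) * sig         # noise, scaled per sample
        Y = theta * np.outer(u, z) + E                # d x n data matrix
        uhat = np.linalg.svd(Y, full_matrices=False)[0][:, 0]
        recs.append(float(np.dot(u, uhat) ** 2))      # subspace recovery
    return float(np.mean(recs))

# Equal average noise variance (1.0): homoscedastic vs. heteroscedastic.
n = 1000
homo = pca_subspace_recovery(np.ones(n))
hetero = pca_subspace_recovery(np.where(np.arange(n) < n // 2, 0.1, 1.9))
print(homo, hetero)  # heteroscedastic recovery comes out lower
```

Because both calls use the same seed, the comparison is paired: the noise realizations are identical up to scaling, so the gap reflects heteroscedasticity rather than sampling variation.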
Recommendations
- Heteroskedastic PCA: algorithm, optimality, and applications
- Nonasymptotic upper bounds for the reconstruction error of PCA
- PCA consistency in high dimension, low sample size context
- On consistency and sparsity for principal components analysis in high dimensions
- Principal component analysis in very high-dimensional spaces
Cites work
- Asymptotic conditional singular value decomposition for high-dimensional genomic data
- Asymptotic performance of PCA for high-dimensional heteroscedastic data
- Asymptotics of sample eigenstructure for a large dimensional spiked covariance model
- Covariance regularization by thresholding
- Finite sample approximation results for principal component analysis: A matrix perturbation approach
- High breakdown estimators for principal components: the projection-pursuit approach revisited
- Large sample covariance matrices and high-dimensional data analysis
- Matrix estimation by universal singular value thresholding
- On consistency and sparsity for principal components analysis in high dimensions
- On sample eigenvalues in a generalized spiked population model
- On the distribution of the largest eigenvalue in principal components analysis
- Operator norm consistent estimation of large-dimensional sparse covariance matrices
- OptShrink: An Algorithm for Improved Low-Rank Signal Matrix Denoising by Optimal, Data-Driven Singular Value Shrinkage
- Principal component analysis.
- Probabilistic Principal Component Analysis
- Rank-Sparsity Incoherence for Matrix Decomposition
- Recursive Robust PCA or Recursive Sparse Recovery in Large but Structured Noise
- Robust Estimation of Dispersion Matrices and Principal Components
- Robust PCA via Outlier Pursuit
- Robust Statistics
- Robust computation of linear models by convex relaxation
- Robust principal component analysis?
- Spectral analysis of large dimensional random matrices
- Statistical challenges of high-dimensional data
- Statistical mechanics of unsupervised structure recognition
- Strong convergence of the empirical distribution of eigenvalues of sample covariance matrices with a perturbation matrix
- The polynomial method for random matrices
- The singular values and vectors of low rank perturbations of large rectangular random matrices
Cited in (15)
- Factor Extraction in Dynamic Factor Models: Kalman Filter Versus Principal Components
- Stochastic gradients for large-scale tensor decomposition
- Matrix denoising for weighted loss functions and heterogeneous signals
- Rapid evaluation of the spectral signal detection threshold and Stieltjes transform
- Boundary behavior in high dimension, low sample size asymptotics of PCA
- On the non-asymptotic concentration of heteroskedastic Wishart-type matrix
- ScreeNOT: exact MSE-optimal singular value thresholding in correlated noise
- Biwhitening Reveals the Rank of a Count Matrix
- Optimally Weighted PCA for High-Dimensional Heteroscedastic Data
- Variance variation criterion and consistency in estimating the number of significant signals of high-dimensional PCA
- Asymptotic performance of PCA for high-dimensional heteroscedastic data
- Heteroskedastic PCA: algorithm, optimality, and applications
- A note on identifiability conditions in confirmatory factor analysis
- Inference for heteroskedastic PCA with missing data
- Asymptotic Distribution of Studentized Contribution Ratio in High-Dimensional Principal Component Analysis
This page was built for publication: Asymptotic performance of PCA for high-dimensional heteroscedastic data
MaRDI item: Q1661372