Asymptotic performance of PCA for high-dimensional heteroscedastic data

From MaRDI portal
Publication:1661372

DOI: 10.1016/J.JMVA.2018.06.002 · zbMATH Open: 1395.62139 · DBLP: journals/ma/HongBF18 · arXiv: 1703.06610 · OpenAlex: W2734385411 · Wikidata: Q91692254 · Scholia: Q91692254 · MaRDI QID: Q1661372 · FDO: Q1661372

David Hong, Laura Balzano, Jeffrey A. Fessler

Publication date: 16 August 2018

Published in: Journal of Multivariate Analysis

Abstract: Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable easy and efficient calculation of, and reasoning about, the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure of overall data quality, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.
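The abstract's central claim — that, for a fixed average noise variance, heteroscedastic noise degrades PCA recovery more than homoscedastic noise — can be illustrated with a small NumPy simulation. This is a sketch under assumed parameters not taken from the paper: a rank-one subspace, ambient dimension d = 500, n = 5000 samples, subspace amplitude theta = 1, and per-sample noise variances split between 0.1 and 1.9 (average 1) versus a uniform variance of 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, theta = 5000, 500, 1.0   # samples, ambient dimension, subspace amplitude (illustrative values)
u = np.zeros(d)
u[0] = 1.0                     # true one-dimensional subspace basis vector

def pca_recovery(noise_std):
    """Draw x_i = theta * u * z_i + eta_i * eps_i and return the squared overlap |<u_hat, u>|^2."""
    z = rng.standard_normal(n)
    eps = rng.standard_normal((n, d))
    X = theta * np.outer(z, u) + noise_std[:, None] * eps
    # PCA of the samples = top right singular vector of the n-by-d data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return float(Vt[0] @ u) ** 2

rec_homo = pca_recovery(np.ones(n))  # homoscedastic: every sample has noise variance 1
# heteroscedastic: variances 0.1 and 1.9 on each half of the samples, same average variance 1
rec_hetero = pca_recovery(np.sqrt(np.where(np.arange(n) < n // 2, 0.1, 1.9)))
print(rec_homo, rec_hetero)  # the heteroscedastic recovery is noticeably lower
```

At these dimensions the empirical squared overlaps concentrate near their asymptotic limits, so the ordering predicted by the paper's result (homoscedastic recovery strictly larger) is visible in a single run.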


Full work available at URL: https://arxiv.org/abs/1703.06610










Cited In (14)






