Permutation methods for factor analysis and PCA
From MaRDI portal
Abstract: Researchers often have datasets measuring features of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because it has a large impact on all downstream data analyses. Consequently, many approaches have been developed to address it. Parallel analysis is a popular permutation method: it randomly permutes each feature of the data, and selects components whose singular values exceed those of the permuted data. Despite its widespread use in leading textbooks and scientific publications, as well as empirical evidence for its accuracy, it has so far lacked a theoretical justification. In this paper, we show that the parallel analysis permutation method consistently selects the large components in certain high-dimensional factor models, but does not select the smaller components. The intuition is that permutations leave the noise invariant while "destroying" the low-rank signal. This provides a justification for permutation methods in PCA and factor models under some conditions. Our work also uncovers drawbacks of permutation methods, and paves the way to improvements.
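The permutation procedure described in the abstract can be sketched in a few lines of numpy. This is an illustrative implementation, not the paper's code: the permutation count, the quantile threshold, and the sequential stopping rule are common choices in parallel analysis software, assumed here for concreteness.

```python
import numpy as np

def parallel_analysis(X, n_perm=100, quantile=0.95, seed=0):
    """Sketch of Horn's parallel analysis: select components whose
    singular values exceed those of column-permuted data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sv = np.linalg.svd(X, compute_uv=False)  # descending singular values
    perm_sv = np.empty((n_perm, min(n, p)))
    for b in range(n_perm):
        # Permute each feature (column) independently: this preserves the
        # marginal noise distribution but destroys the low-rank signal.
        Xp = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        perm_sv[b] = np.linalg.svd(Xp, compute_uv=False)
    thresh = np.quantile(perm_sv, quantile, axis=0)
    # Count the leading singular values above the permutation threshold.
    k = 0
    while k < len(sv) and sv[k] > thresh[k]:
        k += 1
    return k
```

On data with one strong spike plus i.i.d. noise, this sketch typically returns 1; as the paper shows, weaker spikes below the permutation threshold are missed.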
Recommendations
- A new approach for selecting the number of factors
- Deterministic parallel analysis: an improved method for selecting factors and principal components
- Statistical significance of the contribution of variables to the PCA solution: an alternative permutation strategy
- On the number of principal components: a test of dimensionality based on measurements of similarity between matrices
- Stability approach to selecting the number of principal components
Cites work
- scientific article; zbMATH DE number 3136275
- scientific article; zbMATH DE number 3045859
- A general framework for multiple testing dependence
- A rationale and test for the number of factors in factor analysis
- Asymptotic power of sphericity tests for high-dimensional data
- Asymptotics of sample eigenstructure for a large dimensional spiked covariance model
- Asymptotics of the principal components estimator of large factor models with weakly influential factors
- Considering Horn's parallel analysis from a random matrix theory point of view
- Determining the number of components from the matrix of partial correlations
- Deterministic parallel analysis: an improved method for selecting factors and principal components
- Eigenvalue significance testing for genetic association
- Estimation of spiked eigenvalues in spiked models
- Finite sample approximation results for principal component analysis: A matrix perturbation approach
- High-dimensional asymptotics of prediction: ridge regression and classification
- How many principal components? Stopping rules for determining the number of non-trivial axes revisited
- On the distribution of the largest eigenvalue in principal components analysis
- OptShrink: An Algorithm for Improved Low-Rank Signal Matrix Denoising by Optimal, Data-Driven Singular Value Shrinkage
- Optimal prediction in the linearly transformed spiked model
- Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices
- Principal component analysis.
- Random matrix theory in statistics: a review
- Simultaneous dimension reduction and adjustment for confounding variation
- Spectral analysis of large dimensional random matrices
- Testing hypotheses about the number of factors in large factor models
- The singular values and vectors of low rank perturbations of large rectangular random matrices
Cited in (15)
- Matrix denoising for weighted loss functions and heterogeneous signals
- Rapid evaluation of the spectral signal detection threshold and Stieltjes transform
- Deterministic parallel analysis: an improved method for selecting factors and principal components
- Permutation Statistical Methods with R
- Biwhitening Reveals the Rank of a Count Matrix
- A CLT for the LSS of large-dimensional sample covariance matrices with diverging spikes
- Statistical significance of the contribution of variables to the PCA solution: an alternative permutation strategy
- Statistical inference for principal components of spiked covariance matrices
- Sampling without replacement from a high-dimensional finite population
- The limiting spectral distribution of large random permutation matrices
- Estimating change-point latent factor models for high-dimensional time series
- A note on the likelihood ratio test in high-dimensional exploratory factor analysis
- Estimating Number of Factors by Adjusted Eigenvalues Thresholding
- Consistency of invariance-based randomization tests
- Considering Horn's parallel analysis from a random matrix theory point of view
This page was built for publication: Permutation methods for factor analysis and PCA