Bi-cross-validation for factor analysis
Abstract: Factor analysis is over a century old, but it is still problematic to choose the number of factors for a given data set. The scree test is popular but subjective. The best performing objective methods are recommended on the basis of simulations. We introduce a method based on bi-cross-validation, using randomly held-out submatrices of the data to choose the number of factors. We find it performs better than the leading methods of parallel analysis (PA) and Kaiser's rule. Our performance criterion is based on recovery of the underlying factor-loading (signal) matrix rather than identifying the true number of factors. Like previous comparisons, our work is simulation based. Recent advances in random matrix theory provide principled choices for the number of factors when the noise is homoscedastic, but not for the heteroscedastic case. The simulations we choose are designed using guidance from random matrix theory. In particular, we include factors too small to detect, factors large enough to detect but not large enough to improve the estimate, and two classes of factors large enough to be useful. Much of the advantage of bi-cross-validation comes from cases with factors large enough to detect but too small to be well estimated. We also find that a form of early stopping regularization improves the recovery of the signal matrix.
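The abstract describes bi-cross-validation only at a high level (randomly held-out submatrices used to score candidate numbers of factors). Below is a minimal sketch of that idea, assuming the basic block-holdout scheme of Owen and Perry style BCV with a plain truncated-SVD fit on the retained block; it is not the authors' implementation (their esaBcv R package additionally uses early-stopping alternation for heteroscedastic noise), and all function and variable names here are hypothetical.

```python
# Sketch of bi-cross-validation (BCV) for choosing the number of factors.
# Illustrative only: partitions X into blocks [[A, B], [C, D]], holds out A,
# fits a rank-k truncated SVD to D, predicts A as B D_k^+ C, and picks the
# rank with the smallest average held-out squared error.

import numpy as np

def bcv_rank(X, max_k=10, n_reps=20, holdout_frac=0.25, rng=None):
    """Return the candidate rank (0..max_k) minimizing BCV prediction error."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    errors = np.zeros(max_k + 1)           # accumulated error per candidate rank

    for _ in range(n_reps):
        # Randomly select held-out rows and columns.
        ho_rows = rng.random(n) < holdout_frac
        ho_cols = rng.random(p) < holdout_frac
        A = X[np.ix_(ho_rows, ho_cols)]     # held-out block
        B = X[np.ix_(ho_rows, ~ho_cols)]
        C = X[np.ix_(~ho_rows, ho_cols)]
        D = X[np.ix_(~ho_rows, ~ho_cols)]   # retained block

        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        for k in range(max_k + 1):
            if k == 0:
                A_hat = np.zeros_like(A)    # rank-0 model predicts zero
            else:
                # Pseudo-inverse of the rank-k truncation of D: V_k S_k^{-1} U_k^T.
                D_k_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
                A_hat = B @ D_k_pinv @ C
            errors[k] += np.sum((A - A_hat) ** 2)

    return int(np.argmin(errors))

# Toy usage: rank-3 signal plus heteroscedastic noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 40))
noise = rng.normal(size=(100, 40)) * rng.uniform(0.5, 2.0, size=40)
print(bcv_rank(signal + noise, max_k=8, rng=1))
```

Note that selecting the rank by held-out prediction error targets recovery of the underlying signal matrix rather than identification of the true number of factors, which is the performance criterion the abstract emphasizes.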
Recommendations
- Bi-cross-validation of the SVD and the nonnegative matrix factorization
- Determining the number of factors in approximate factor models by twice K-fold cross validation
- Factor Analysis Revisited – How Many Factors are There?
- Determining the number of factors when the number of factors can increase with sample size
- Model selection for factor analysis: some new criteria and performance comparisons
Cites work
- scientific article; zbMATH DE number 3092167 (no title available)
- A Testing Procedure for Determining the Number of Factors in Approximate Factor Models With Large Datasets
- A general framework for multiple testing dependence
- A rationale and test for the number of factors in factor analysis
- A review of signal subspace speech enhancement and its application to noise robust speech recognition
- Asymptotic analysis of the squared estimation error in misspecified factor models
- Asymptotics of sample eigenstructure for a large dimensional spiked covariance model
- Asymptotics of the principal components estimator of large factor models with weakly influential factors
- Bi-cross-validation for factor analysis
- Bi-cross-validation of the SVD and the nonnegative matrix factorization
- Boosting as a regularized path to a maximum margin classifier
- Boosting with early stopping: convergence and consistency
- Detection of signals by information theoretic criteria: general asymptotic performance analysis
- Determining the Number of Factors in Approximate Factor Models
- Determining the Number of Factors in the General Dynamic Factor Model
- Determining the number of components from the matrix of partial correlations
- Eigenvalue ratio test for the number of factors
- Eigenvalues of large sample covariance matrices of spiked population models
- Equivalence of regularization and truncated iteration in the solution of ill-posed image reconstruction problems
- Factor modeling for high-dimensional time series: inference for the number of factors
- Finite sample approximation results for principal component analysis: A matrix perturbation approach
- How many principal components? Stopping rules for determining the number of non-trivial axes revisited
- Improved penalization for determining the number of factors in approximate factor models
- Latent variable graphical model selection via convex optimization
- Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data
- Networks, crowds and markets. Reasoning about a highly connected world.
- On a Heuristic Method of Test Construction and its use in Multivariate Analysis
- On early stopping in gradient descent learning
- OptShrink: An Algorithm for Improved Low-Rank Signal Matrix Denoising by Optimal, Data-Driven Singular Value Shrinkage
- Principal component analysis.
- Sample Eigenvalue Based Detection of High-Dimensional Signals in White Noise Using Relatively Few Samples
- Selecting the number of principal components: estimation of the true rank of a noisy matrix
- Statistical analysis of factor models of high dimension
- Tests of Significance for the Latent Roots of Covariance and Correlation Matrices
- The Generalized Dynamic Factor Model
- The Optimal Hard Threshold for Singular Values is 4/√3
- The singular values and vectors of low rank perturbations of large rectangular random matrices
Cited in (21 documents)
- Prediction in functional regression with discretely observed and noisy covariates
- Preprocessing noisy functional data: a multivariate perspective
- Deterministic parallel analysis: an improved method for selecting factors and principal components
- Exploratory bi-factor analysis: the oblique case
- Sparse latent factor regression models for genome-wide and epigenome-wide association studies
- Hypothesis tests for principal component analysis when variables are standardized
- A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model
- Safety signal detection with control of latent factors
- esaBcv
- Bayesian generalized linear low rank regression models for the detection of vaccine-adverse event associations
- scientific article; zbMATH DE number 7415123 (no title available)
- Bi-cross-validation of the SVD and the nonnegative matrix factorization
- Unifying and generalizing methods for removing unwanted variation based on negative controls
- Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data
- scientific article; zbMATH DE number 7370637 (no title available)
- Structured latent factor analysis for large-scale data: identifiability, estimability, and their implications
- Bi-cross-validation for factor analysis
- Heteroskedastic PCA: algorithm, optimality, and applications
- Consistently recovering the signal from noisy functional data
- A Matrix-Free Likelihood Method for Exploratory Factor Analysis of High-Dimensional Gaussian Data
- Identifying Effects of Multiple Treatments in the Presence of Unmeasured Confounding