On principal components regression, random projections, and column subsampling
From MaRDI portal
Publication: Q1616329
DOI: 10.1214/18-EJS1486
zbMATH Open: 1414.62219
arXiv: 1709.08104
OpenAlex: W2963022876
MaRDI QID: Q1616329
FDO: Q1616329
Author: Martin Slawski
Publication date: 1 November 2018
Published in: Electronic Journal of Statistics
Abstract: Principal Components Regression (PCR) is a traditional tool for dimension reduction in linear regression that has been both criticized and defended. One concern about PCR is that obtaining the leading principal components tends to be computationally demanding for large data sets. While random projections do not possess the optimality properties of the leading principal subspace, they are computationally appealing and hence have become increasingly popular in recent years. In this paper, we present an analysis showing that for random projections satisfying a Johnson-Lindenstrauss embedding property, the prediction error in subsequent regression is close to that of PCR, at the expense of requiring a slightly larger number of random projections than principal components. Column subsampling constitutes an even cheaper way of randomized dimension reduction outside the class of Johnson-Lindenstrauss transforms. We provide numerical results based on synthetic and real data as well as basic theory revealing differences and commonalities in terms of statistical performance.
Full work available at URL: https://arxiv.org/abs/1709.08104
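As an illustration of the comparison described in the abstract, the following is a minimal sketch (not the paper's code) that fits least squares after three forms of dimension reduction: the top-k principal components, a Gaussian random projection (a Johnson-Lindenstrauss transform) using a slightly larger reduced dimension, and uniform column subsampling. All dimensions, the noise level, and the factor-two oversampling m = 2k are illustrative assumptions.

```python
# Illustrative sketch, not the paper's experimental code: compare in-sample
# prediction error of PCR, Gaussian random projections, and column
# subsampling on synthetic data. All sizes below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 200, 20          # samples, features, number of components

# Synthetic linear model y = X beta + noise, with a non-isotropic design
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d)) / np.sqrt(d)
beta = rng.standard_normal(d)
y = X @ beta + 0.5 * rng.standard_normal(n)

def fit_predict(Z, y):
    """Least squares in the reduced space; returns in-sample predictions."""
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ w

# 1) PCR: project onto the top-k right singular vectors of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pred_pcr = fit_predict(X @ Vt[:k].T, y)

# 2) Random projection: X @ R with i.i.d. Gaussian R (a JL transform);
#    per the paper's theme, take slightly more projections than components
m = 2 * k
R = rng.standard_normal((d, m)) / np.sqrt(m)
pred_rp = fit_predict(X @ R, y)

# 3) Column subsampling: keep m columns chosen uniformly at random
cols = rng.choice(d, size=m, replace=False)
pred_cs = fit_predict(X[:, cols], y)

for name, pred in [("PCR", pred_pcr), ("RandProj", pred_rp), ("ColSub", pred_cs)]:
    print(f"{name:8s} mean squared prediction error: {np.mean((y - pred) ** 2):.4f}")
```

Qualitatively, one expects PCR to perform best when the signal aligns with the leading principal subspace, with JL random projections close behind at a modestly larger reduced dimension, echoing the abstract's claim; column subsampling behaves differently since it is not a Johnson-Lindenstrauss transform.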
Recommendations
- Principal component regression revisited
- Projection-pursuit based principal component analysis: a large sample theory
- Sparse Principal Component Analysis via Axis-Aligned Random Projections
- Random Projections for Large-Scale Regression
- Principal Components Regression by Using Generalized Principal Components Analysis
- On principal subspace analysis
- The principal problem with principal components regression
- Regularized principal component analysis
MSC classifications:
- Factor analysis and principal components; correspondence analysis (62H25)
- Linear regression; mixed models (62J05)
Cites Work
- Statistics for high-dimensional data. Methods, theory and applications.
- Bagging predictors
- Extensions of Lipschitz mappings into a Hilbert space
- Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions
- Random-projection ensemble classification. (With discussion).
- Optimal selection of reduced rank estimators of high-dimensional matrices
- A tail inequality for quadratic forms of subgaussian random vectors
- A simple proof of the restricted isometry property for random matrices
- Adaptive estimation of a quadratic functional by model selection.
- Random Projections for Large-Scale Regression
- Improved analysis of the subsampled randomized Hadamard transform
- Normal Multivariate Analysis and the Orthogonal Group
- Database-friendly random projections: Johnson-Lindenstrauss with binary coins.
- New and Improved Johnson–Lindenstrauss Embeddings via the Restricted Isometry Property
- On variants of the Johnson–Lindenstrauss lemma
- On regularization algorithms in learning theory
- On principal components and regression: a statistical explanation of a natural phenomenon
- Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform
- Compressed and Privacy-Sensitive Sparse Regression
- Nearest-neighbor-preserving embeddings
- Randomized Sketches of Convex Programs With Sharp Guarantees
- A Random Matrix-Theoretic Approach to Handling Singular Covariance Estimates
- Kernel ridge vs. principal component regression: minimax bounds and the qualification of regularization operators
- An almost optimal unrestricted fast Johnson-Lindenstrauss transform
- Optimization methods for large-scale machine learning
- Sketching as a tool for numerical linear algebra
- A risk comparison of ordinary least squares vs ridge regression
- A statistical perspective on randomized sketching for ordinary least-squares
- On \(b\)-bit min-wise hashing for large-scale regression and classification with sparse data
- Sketched ridge regression: optimization perspective, statistical perspective, and model averaging
Cited In (8)
- Principal component projection with low-degree polynomials
- Reduced rank regression with matrix projections for high-dimensional multivariate linear regression model
- Sketching for principal component regression
- Partial projective resampling method for dimension reduction: with applications to partially linear models
- Projective resampling estimation of informative predictor subspace for multivariate regression
- Dimensionality Reduction, Regularization, and Generalization in Overparameterized Regressions
- Thin-shell theory for rotationally invariant random simplices
- High-dimensional clustering via random projections
This page was built for publication: On principal components regression, random projections, and column subsampling