Best subset selection, persistence in high-dimensional statistical learning and optimization under l₁ constraint
Publication: 869974
DOI: 10.1214/009053606000000768
zbMATH Open: 1106.62022
arXiv: math/0702684
OpenAlex: W3104950855
Wikidata: Q105584233 (Scholia: Q105584233)
MaRDI QID: Q869974
FDO: Q869974
Author: Eitan Greenshtein
Publication date: 12 March 2007
Published in: The Annals of Statistics
Abstract: Let \((Y, X_1, \ldots, X_m)\) be a random vector. It is desired to predict \(Y\) based on \((X_1, \ldots, X_m)\). Examples of prediction methods are regression, classification using logistic regression or separating hyperplanes, and so on. We consider the problem of best subset selection, and study it in the context \(m = n^\alpha\), \(\alpha > 1\), where \(n\) is the number of observations. We investigate procedures that are based on empirical risk minimization. It is shown that in common cases we should aim to find the best subset among those of size of order \(o(n/\log(n))\). It is also shown that, in some ``asymptotic sense'', when assuming a certain sparsity condition, there is no loss in letting \(m\) be much larger than \(n\), for example, \(m = n^\alpha\), \(\alpha > 1\). This is in comparison to starting with the ``best'' subset of size smaller than \(n\), regardless of the value of \(m\). We then study conditions under which empirical risk minimization subject to an \(l_1\) constraint yields nearly the best subset. These results extend some recent results obtained by Greenshtein and Ritov. Finally, we present a high-dimensional simulation study of a ``boosting''-type classification procedure.
Full work available at URL: https://arxiv.org/abs/math/0702684
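The abstract contrasts empirical risk minimization over small subsets (best subset selection) with empirical risk minimization under an \(l_1\) constraint. The following is a minimal, hypothetical sketch, not code from the paper or its simulation study, illustrating that contrast on simulated sparse data with \(m > n\). It assumes NumPy and scikit-learn, uses the Lagrangian (penalized) lasso as a stand-in for the \(l_1\)-constrained form, and the sample sizes, penalty level, and restricted search space are illustrative choices only.

```python
# Hypothetical sketch: best subset selection vs. l1-constrained ERM (lasso)
# on simulated sparse data with more variables than observations.
from itertools import combinations

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, m, k_true = 50, 200, 3           # n observations, m >> n, sparse truth
X = rng.standard_normal((n, m))
beta = np.zeros(m)
beta[:k_true] = [3.0, -2.0, 1.5]    # only a few nonzero coefficients (sparsity condition)
y = X @ beta + rng.standard_normal(n)

def empirical_risk(model, X, y):
    """Mean squared prediction error on the training sample."""
    return np.mean((y - model.predict(X)) ** 2)

# Best subset of a fixed small size k via exhaustive search. The search is
# restricted to the first 20 columns only, because a full search over all m
# columns is combinatorial; this restriction is purely for the demo.
k = 3
best_risk, best_subset = np.inf, None
for subset in combinations(range(20), k):
    cols = list(subset)
    fit = LinearRegression().fit(X[:, cols], y)
    risk = empirical_risk(fit, X[:, cols], y)
    if risk < best_risk:
        best_risk, best_subset = risk, subset

# l1-constrained ERM in its Lagrangian (lasso) form: the penalty level alpha
# plays the role of the l1 constraint radius.
lasso = Lasso(alpha=0.1, max_iter=10_000).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

print(f"best subset of size {k}: {best_subset}, empirical risk {best_risk:.3f}")
print(f"lasso-selected variables: {selected.tolist()} ({selected.size} nonzero)")
```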
Recommendations
- Persistence in high-dimensional linear predictor-selection and the virtue of overparametrization
- The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). (With discussions and rejoinder).
- Prediction when fitting simple models to high-dimensional data
- Better subset regression
- Boosting for high-dimensional linear models
Cites Work
- The elements of statistical learning. Data mining, inference, and prediction
- Least angle regression. (With discussion)
- Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
- Title not available
- Robust regression: Asymptotics, conjectures and Monte Carlo
- High-dimensional graphs and variable selection with the Lasso
- Statistical modeling: The two cultures. (With comments and a rejoinder).
- Persistence in high-dimensional linear predictor-selection and the virtue of overparametrization
- Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations
- Nonconcave penalized likelihood with a diverging number of parameters.
- Title not available
- Title not available
- Discussion on boosting papers.
- Asymptotic behavior of M-estimators of \(p\) regression parameters when \(p^2/n\) is large. I. Consistency
- Functional aggregation for nonparametric regression.
- For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution
- Atomic decomposition by basis pursuit
- Efficient agnostic learning of neural networks with bounded fan-in
- Asymptotic behavior of M-estimators for the linear model
- On the Bayes-risk consistency of regularized boosting methods.
- Title not available
- Population theory for boosting ensembles.
- DNA microarray experiments: biological and technological aspects
- Title not available
- Discussion on boosting papers.
Cited In (45)
- Persistence in high-dimensional linear predictor-selection and the virtue of overparametrization
- On two continuum armed bandit problems in high dimensions
- Greedy algorithms for prediction
- Mathematical programming for simultaneous feature selection and outlier detection under \(l_1\) norm
- Title not available
- Nonparametric time series forecasting with dynamic updating
- Approximation of functions of few variables in high dimensions
- High-dimensional generalized linear models and the lasso
- Sample average approximation with heavier tails II: localization in stochastic convex optimization and persistence results for the Lasso
- Gibbs posterior for variable selection in high-dimensional classification and data mining
- Sharp support recovery from noisy random measurements by \(\ell_1\)-minimization
- Graphical-model based high dimensional generalized linear models
- Forecasting functional time series
- Title not available
- Genetic Algorithm in the Wavelet Domain for Large \(p\) Small \(n\) Regression
- Bayesian variable selection for high dimensional generalized linear models: convergence rates of the fitted densities
- Near-ideal model selection by \(\ell _{1}\) minimization
- Model selection in utility-maximizing binary prediction
- Best subset binary prediction
- Title not available
- Risk minimization for time series binary choice with variable selection
- \(\ell _{1}\)-regularized linear regression: persistence and oracle inequalities
- Honest variable selection in linear and logistic regression models via \(\ell _{1}\) and \(\ell _{1}+\ell _{2}\) penalization
- On the asymptotic properties of the group lasso estimator for linear models
- On the sensitivity of the Lasso to the number of predictor variables
- OR forum: An algorithmic approach to linear regression
- Gene selection and prediction for cancer classification using support vector machines with a reject option
- The statistical rate for support matrix machines under low rankness and row (column) sparsity
- Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms
- Sure Independence Screening for Ultrahigh Dimensional Feature Space
- Best subset selection via a modern optimization lens
- Kullback-Leibler aggregation and misspecified generalized linear models
- Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models
- Learning without concentration
- The log-linear group-lasso estimator and its asymptotic properties
- Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications
- Properties and refinements of the fused Lasso
- Difference-of-Convex Learning: Directional Stationarity, Optimality, and Sparsity
- Constrained optimization for stratified treatment rules in reducing hospital readmission rates of diabetic patients
- Complexity of approximation of functions of few variables in high dimensions
- Better subset regression
- High-dimensional classification using features annealed independence rules
- Sparse regression at scale: branch-and-bound rooted in first-order optimization
- Regularization in statistics
- Elastic-net regularization in learning theory