Sparsity oriented importance learning for high-dimensional linear regression
From MaRDI portal
Abstract: Because model selection uncertainty is now well recognized as non-negligible, data analysts should no longer be satisfied with the output of a single final model from a model selection process, regardless of its sophistication. To improve reliability and reproducibility in model choice, one constructive approach is to make good use of a sound variable importance measure. Although interesting importance measures are available and increasingly used in data analysis, little theoretical justification has been provided for them. In this paper, we propose a new variable importance measure, sparsity oriented importance learning (SOIL), for high-dimensional regression from a sparse linear modeling perspective; it takes variable selection uncertainty into account through a sensible model weighting. The SOIL method is theoretically shown to have the inclusion/exclusion property: when the model weights are properly concentrated around the true model, the SOIL importance can well separate the variables in the true model from the rest. In particular, even if the signal is weak, SOIL rarely gives variables not in the true model significantly higher importance values than those in the true model. Extensive simulations in several illustrative settings and real data examples with guided simulations show desirable properties of the SOIL importance in contrast to other importance measures.
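The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes that the importance of a variable is the total weight of the candidate models that include it, and that weights come from information-criterion scores through an exponential transform. The candidate models, the scores, and the function names `bic_like_weights` and `weighted_importance` are hypothetical and chosen only for illustration; the abstract does not specify the weighting scheme, so this is not necessarily the paper's procedure.

```python
import math


def bic_like_weights(scores):
    """Map information-criterion scores (lower is better) to weights that sum to 1.

    Assumption: an exponential transform of the score differences, used here
    only as one plausible model weighting.
    """
    best = min(scores)
    raw = [math.exp(-(s - best) / 2.0) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_importance(models, weights, n_vars):
    """Importance of variable j = total weight of candidate models containing j."""
    importance = [0.0] * n_vars
    for model, w in zip(models, weights):
        for j in set(model):  # set() guards against a repeated index in a model
            importance[j] += w
    return importance


# Hypothetical candidate models over 4 variables (indices 0..3), with made-up scores.
models = [(0, 1), (0, 1, 2), (0,), (1, 3)]
scores = [100.0, 101.5, 104.0, 110.0]

weights = bic_like_weights(scores)
importance = weighted_importance(models, weights, n_vars=4)

for j, value in enumerate(importance):
    print(f"variable {j}: importance {value:.3f}")
```

Under this sketch every importance value lies between 0 and 1, and a variable that appears only in poorly scoring candidate models receives a value near 0, which is the kind of separation the inclusion/exclusion property describes.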
Recommendations
- Nonparametric variable importance assessment using machine learning techniques
- Model-Free Variable Selection
- Hierarchical testing of variable importance
- A stepwise regression method and consistent model selection for high-dimensional sparse linear models
- FIRST: combining forward iterative selection and shrinkage in high dimensional sparse linear regression
Cites work
- scientific article; zbMATH DE number 3713012
- scientific article; zbMATH DE number 2015216
- scientific article; zbMATH DE number 1556150
- scientific article; zbMATH DE number 3444596
- scientific article; zbMATH DE number 845714
- A new variable importance measure for random forests with missing data
- Adaptive Lasso for sparse high-dimensional regression models
- Adaptive Regression by Mixing
- Adaptively combined forecasting for discrete response time series
- An asymptotic property of model selection criteria
- Bayesian model averaging: A tutorial. (with comments and a rejoinder).
- Can the strengths of AIC and BIC be shared? A conflict between model indentification and regression estimation
- Combining Multiple Biomarker Models in Logistic Regression
- Confidence sets for model selection by F-testing
- Consistency of cross validation for comparing regression procedures
- Empirical characterization of random forest variable importance measures
- Estimating the dimension of a model
- Forecasting with factor-augmented regression: a frequentist model averaging approach
- Frequentist Model Average Estimators
- Generalized fiducial inference for ultrahigh-dimensional regression
- Information Theory and Mixing Least-Squares Regressions
- Interactive Tree-Structured Regression via Principal Hessian Directions
- Least Squares Model Averaging
- Model Selection for Estimating Treatment Effects
- Model Selection: An Integral Part of Inference
- Model combining in factorial data analysis
- Nearly unbiased variable selection under minimax concave penalty
- Optimal weight choice for frequentist model average estimators
- Random forests
- Stability selection. With discussion and authors' reply
- The Adaptive Lasso and Its Oracle Properties
- Toward optimal model averaging in regression models with time series errors
- Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
- Variable importance in binary regression trees and forests
Cited in (18)
- On Optimality of Mallows Model Averaging
- Multifold Cross-Validation Model Averaging for Generalized Additive Partial Linear Models
- Subsampling from features in large regression to find "winning features"
- Consistency of BIC model averaging
- Visualization and assessment of model selection uncertainty
- Predicting 5G throughput with BAMMO, a boosted additive model for data with missing observations
- Variable importance based interaction modelling with an application on initial spread of COVID-19 in China
- Cross-validation with confidence
- Information criteria for model selection
- Model averaging for semiparametric varying coefficient quantile regression models
- The scalable birth-death MCMC algorithm for mixed graphical model learning with application to genomic data integration
- A comparative study on high-dimensional Bayesian regression with binary predictors
- MoST: model specification test by variable selection stability
- On improvability of model selection by model averaging
- Disagreement based variable selection method for high-dimensional censored data
- Performance Assessment of High-dimensional Variable Identification
- Corrected Mallows criterion for model averaging
- Nonparametric variable importance assessment using machine learning techniques