Abstract: It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid "post-selection inference" by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing "simultaneity insurance" for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.
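The widening described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm; it is a minimal brute-force illustration under simplifying assumptions that are not part of the abstract: the error variance is treated as known (equal to 1), the design matrix has full column rank with few columns, and the constant is approximated by Monte Carlo. The function names `posi_directions` and `posi_constant` are hypothetical. For every submodel M and every predictor j in M, the coefficient estimate is a linear function of the response; the sketch collects the unit vectors defining those linear functions, then takes the (1 - alpha) quantile of the largest absolute standardized statistic. The Scheffé-type constant, computed on the same draws, bounds it from above.

```python
import itertools
import numpy as np


def posi_directions(X):
    """Unit vectors for all coefficient estimates in all submodels.

    Works in coordinates of an orthonormal basis of the column space
    of X, so each direction has length p rather than n.
    Assumes X (n x p) has full column rank.
    """
    p = X.shape[1]
    # X = Q R with orthonormal Q; column j of R is X_j in Q-coordinates.
    _, R = np.linalg.qr(X)
    blocks = []
    for size in range(1, p + 1):
        for M in itertools.combinations(range(p), size):
            # Row j of the pseudoinverse gives the linear function that
            # maps the response to the j-th coefficient estimate in M.
            G = np.linalg.pinv(R[:, list(M)])
            G = G / np.linalg.norm(G, axis=1, keepdims=True)
            blocks.append(G)
    return np.vstack(blocks)


def posi_constant(X, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo approximation (known variance, an assumption here).

    Returns (K_sim, K_scheffe): the simultaneous constant over all
    submodel coefficients, and the Scheffe-type constant over all
    linear functions in the column space.
    """
    rng = np.random.default_rng(seed)
    L = posi_directions(X)
    Z = rng.standard_normal((n_sim, X.shape[1]))
    max_abs = np.abs(Z @ L.T).max(axis=1)   # max over submodel directions
    scheffe = np.linalg.norm(Z, axis=1)     # sup over all unit directions
    return np.quantile(max_abs, 1 - alpha), np.quantile(scheffe, 1 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 4))
    k_sim, k_scheffe = posi_constant(X)
    print("simultaneous constant:", k_sim)
    print("Scheffe-type constant:", k_scheffe)
```

Under these assumptions, an interval for a coefficient in whatever submodel was selected would use the simultaneous constant in place of the usual normal quantile. The enumeration grows as p·2^(p-1) directions, so this brute-force approach is only practical for a small number of predictors.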
Recommendations
- Exact post-selection inference, with application to the Lasso
- Valid post-selection inference in model-free linear regression
- MODEL SELECTION AND INFERENCE: FACTS AND FICTION
- Selective inference after likelihood- or test-based model selection in linear models
- Conditional predictive inference post model selection
Cites work
- scientific article; zbMATH DE number 3141625
- scientific article; zbMATH DE number 4100415
- scientific article; zbMATH DE number 46309
- A Note on Quantiles in Large Samples
- Asymptotic properties of maximum likelihood estimators based on conditional specification
- CAN ONE ESTIMATE THE UNCONDITIONAL DISTRIBUTION OF POST-MODEL-SELECTION ESTIMATORS?
- Can one estimate the conditional distribution of post-model-selection estimators?
- Confidence sets based on penalized maximum likelihood estimators in Gaussian regression
- Distributional results for thresholding estimators in high-dimensional Gaussian regression models
- Frequentist Model Average Estimators
- MODEL SELECTION AND INFERENCE: FACTS AND FICTION
- Mostly harmless econometrics. An empiricist's companion.
- Note on a Conditional Property of Student's $t$
- On model uncertainty and its statistical implications. Proceedings of a workshop, held in Groningen, Netherlands, September 25-26, 1986
- On preliminary test and shrinkage M-estimation in linear models
- On the Large-Sample Minimal Coverage Probability of Confidence Intervals After Model Selection
- On the distribution of penalized maximum likelihood estimators: the LASSO, SCAD, and thresholding
- On the distribution of the adaptive LASSO estimator
- PERFORMANCE LIMITS FOR ESTIMATORS OF THE RISK OR DISTRIBUTION OF SHRINKAGE-TYPE ESTIMATORS, AND SOME GENERAL LOWER RISK-BOUND RESULTS
- Random Packings and Coverings of the Unit n-Sphere
- Sparse estimators and the oracle property, or the return of Hodges' estimator
- THE FINITE-SAMPLE DISTRIBUTION OF POST-MODEL-SELECTION ESTIMATORS AND UNIFORM VERSUS NONUNIFORM APPROXIMATIONS
- The Conditional Level of Student's $t$ Test
- The Conditional Level of the F-Test
- The Focused Information Criterion
- The distribution of a linear predictor after model selection: unconditional finite-sample distributions and asymptotic approximations
- The distribution of model averaging estimators and an impossibility result regarding its estimation
- Valid post-selection inference
Cited in
- In defense of the indefensible: a very naïve approach to high-dimensional inference
- Least-Square Approximation for a Distributed System
- Bayesian semiparametric functional mixed models for serially correlated functional data, with application to glaucoma data
- Exploratory inference: localizing relevant effects with confidence
- On asymptotically optimal confidence regions and tests for high-dimensional models
- Log-linear Bayesian additive regression trees for multinomial logistic and count regression models
- Bounds in \(L^1\) Wasserstein distance on the normal approximation of general M-estimators
- Regularized projection score estimation of treatment effects in high-dimensional quantile regression
- Constraints versus priors
- Optimal finite sample post-selection confidence distributions in generalized linear models
- A nonparametric sequential learning procedure for estimating the pure premium
- The robust desparsified lasso and the focused information criterion for high-dimensional generalized linear models
- Mixed-effect models with trees
- Lasso Inference for High-Dimensional Time Series
- Post-model-selection prediction intervals for generalized linear models
- Discovery and Inference of a Causal Network with Hidden Confounding
- Exact post-selection inference, with application to the Lasso
- Asymptotically honest confidence regions for high dimensional parameters by the desparsified conservative Lasso
- Bayesian Inference Is Unaffected by Selection: Fact or Fiction?
- Assumption Lean Regression
- Selective inference after feature selection via multiscale bootstrap
- Valid post-selection inference
- Rates of convergence of the adaptive LASSO estimators to the oracle distribution and higher order refinements by the bootstrap
- scientific article; zbMATH DE number 7626707
- Rejoinder on: "Hierarchical inference for genome-wide association studies: a view on methodology with software"
- A simulation based method for assessing the statistical significance of logistic regression models after common variable selection procedures
- Weighted-average least squares estimation of generalized linear models
- Informative goodness-of-fit for multivariate distributions
- Confidence intervals for parameters in high-dimensional sparse vector autoregression
- Some perspectives on inference in high dimensions
- Confidence sets based on thresholding estimators in high-dimensional Gaussian regression models
- Rejoinder on: "High-dimensional simultaneous inference with the bootstrap"
- Likelihood ratio test in multivariate linear regression: from low to high dimension
- Principled statistical inference in data science
- Prediction intervals with controlled length in the heteroscedastic Gaussian regression
- Robust inference on average treatment effects with possibly more covariates than observations
- MODEL SELECTION AND INFERENCE: FACTS AND FICTION
- Models as approximations. I. Consequences illustrated with linear regression
- An automated approach towards sparse single-equation cointegration modelling
- A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach
- Filtering the Rejection Set While Preserving False Discovery Rate Control
- Scalable and efficient inference via CPE
- Spatially relaxed inference on high-dimensional linear models
- Neighborhood-based cross fitting approach to treatment effects with high-dimensional data
- Inference for \(L_2\)-boosting
- Larry Brown's contributions to parametric inference, decision theory and foundations: a survey
- Statistical theory powering data science
- Conditional selective inference for robust regression and outlier detection using piecewise-linear homotopy continuation
- Controlling the false discovery rate via knockoffs
- Spatial variable selection and an application to Virginia Lyme disease emergence
- Robust Q-learning
- Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science
- Selective inference via marginal screening for high dimensional classification
- A (tight) upper bound for the length of confidence intervals with conditional coverage
- Forward stability and model path selection
- Selection-corrected statistical inference for region detection with high-throughput assays
- Statistical proof? The problem of irreproducibility
- Frequentist model averaging in structural equation modelling
- A bootstrap Lasso+partial ridge method to construct confidence intervals for parameters in high-dimensional sparse linear models
- Kernel Ordinary Differential Equations
- SLOPE-adaptive variable selection via convex optimization
- On various confidence intervals post-model-selection
- High-dimensional inference: confidence intervals, \(p\)-values and R-software \texttt{hdi}
- Heterogeneous heterogeneity by default: Testing categorical moderators in mixed‐effects meta‐analysis
- Inference for low‐ and high‐dimensional inhomogeneous Gibbs point processes
- Inference After Model Selection
- False Discovery Rate Control via Data Splitting
- Post-selection inference of generalized linear models based on the lasso and the elastic net
- Forward-selected panel data approach for program evaluation
- Selection of mixed copula for association modeling with tied observations
- Exploration of the variability of variable selection based on distances between bootstrap sample results
- Confidently Comparing Estimates with the c-value
- Selective inference for latent block models
- On Hodges' superefficiency and merits of oracle property in model selection
- Simultaneous high-probability bounds on the false discovery proportion in structured, regression and online settings
- Distribution-free predictive inference for regression
- Distributionally robust and generalizable inference
- FANOK: knockoffs in linear time
- Uniformly valid confidence intervals post-model-selection
- Markov Neighborhood Regression for High-Dimensional Inference
- The costs and benefits of uniformly valid causal inference with high-dimensional nuisance parameters
- Asymptotics of selective inference
- Post hoc confidence bounds on false positives using reference families
- An evolutionary estimation procedure for generalized semilinear regression trees
- Penalized estimation of a class of single‐index varying‐coefficient models for integrative genomic analysis
- Bootstrapping and sample splitting for high-dimensional, assumption-lean inference
- Penalized likelihood and multiple testing
- Valid post-selection inference in high-dimensional approximately sparse quantile regression models
- A knockoff filter for high-dimensional selective inference
- Projection-based Inference for High-dimensional Linear Models
- scientific article; zbMATH DE number 7750673
- Exact post-selection inference for the generalized Lasso path
- Inferactive data analysis
- Inference for High-Dimensional Censored Quantile Regression
- Exact post-selection inference for adjusted R squared selection
- Optimal configurations of lines and a statistical application
- scientific article; zbMATH DE number 7750675
- On the post selection inference constant under restricted isometry properties
- Optimal model averaging for divergent-dimensional Poisson regressions
- The Perils of Balance Testing in Experimental Design: Messy Analyses of Clean Data
This page was built for publication: Valid post-selection inference
MaRDI item Q355109