Stabilizing variable selection and regression
From MaRDI portal
Abstract: We consider regression in which one predicts a response with a set of predictors across different experiments or environments. This is a common setup in many data-driven scientific fields, and we argue that statistical inference can benefit from an analysis that takes the distributional changes across environments into account. In particular, it is useful to distinguish between stable and unstable predictors, i.e., predictors that have a fixed or a changing functional dependence on the response, respectively. We introduce stabilized regression, which explicitly enforces stability and thus improves generalization performance on previously unseen environments. Our work is motivated by an application in systems biology. Using multiomic data, we demonstrate how hypothesis generation about gene function can benefit from stabilized regression. We believe that a similar line of argument for exploiting heterogeneity in data can be powerful for many other applications as well. We draw a theoretical connection between multi-environment regression and causal models, which allows us to characterize graphically which predictors have a stable and which an unstable functional dependence on the response. Formally, we introduce the notion of a stable blanket, a subset of the predictors that lies between the direct causal predictors and the Markov blanket. We prove that this set is optimal in the sense that a regression based on these predictors minimizes the mean squared prediction error among all regressions that generalize to unseen new environments.
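The abstract's distinction between stable and unstable predictors can be illustrated with a small sketch: fit a regression per environment for each candidate predictor subset, and call a subset stable if its coefficients agree across environments. Note that everything below (the coefficient-spread heuristic, the `tol` threshold, the simulated data) is an illustrative assumption, not the paper's actual procedure, which uses formal stability tests and averages over subsets that are both stable and predictive.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    Xi = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    return beta

def coef_spread(X, y, env, subset):
    """Spread of per-environment OLS coefficients for a predictor subset.
    A small spread means the functional dependence looks stable across
    environments; a large one flags instability."""
    betas = np.array([fit_ols(X[env == e][:, subset], y[env == e])
                      for e in np.unique(env)])
    return np.max(np.std(betas, axis=0))

def stable_subsets(X, y, env, tol=0.2):
    """Exhaustively screen predictor subsets, keeping those whose
    per-environment coefficients agree up to `tol` (a made-up threshold)."""
    d = X.shape[1]
    return [s for r in range(1, d + 1)
            for s in itertools.combinations(range(d), r)
            if coef_spread(X, y, env, list(s)) < tol]

# Two simulated environments: x1 causes y identically in both (stable),
# while x2 is an effect of y whose sign flips between environments (unstable).
rng = np.random.default_rng(0)
def make_env(n, flip):
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + 0.5 * rng.normal(size=n)
    x2 = (-y if flip else y) + 0.5 * rng.normal(size=n)
    return np.column_stack([x1, x2]), y

X0, y0 = make_env(200, flip=False)
X1, y1 = make_env(200, flip=True)
X, y = np.vstack([X0, X1]), np.concatenate([y0, y1])
env = np.repeat([0, 1], 200)

print(stable_subsets(X, y, env))  # expect only the subset containing x1 alone
```

Any subset containing x2 is rejected because its coefficient changes sign between environments, mirroring the abstract's point that restricting to stable predictors is what buys generalization to unseen environments.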
Recommendations
Cites work
- scientific article; zbMATH DE number 1247156 (title not available)
- scientific article; zbMATH DE number 6982327 (title not available)
- scientific article; zbMATH DE number 845714 (title not available)
- Anchor Regression: Heterogeneous Data Meet Causality
- Bagging predictors
- Bayesian model averaging: A tutorial (with comments and a rejoinder)
- Causal inference by using invariant prediction: identification and confidence intervals (with discussion and authors' reply)
- Causal inference for statistics, social, and biomedical sciences. An introduction
- Causality. Models, reasoning, and inference
- Conditional variance penalties and domain shift robustness
- Domain-adversarial training of neural networks
- Elements of causal inference. Foundations and learning algorithms
- Estimating the dimension of a model
- Extended conditional independence and applications in causal inference
- Goodness-of-Fit Tests for High Dimensional Linear Models
- Influence Diagrams for Causal Modelling and Inference
- Invariance, causality and robustness
- Invariant causal prediction for sequential data
- Learning stable and predictive structures in kinetic systems
- Random forests
- Random lasso
- Random-projection ensemble classification (with discussion)
- Semi-supervised learning in causal and anticausal settings
- Stability
- Stability Selection
- Sure independence screening for ultrahigh dimensional feature space (with discussion and authors' reply)
- Tests of Equality Between Sets of Coefficients in Two Linear Regressions
- The Probability Approach in Econometrics
- Using Markov blankets for causal structure learning
- Veridical data science
Cited in (4)
This page was built for publication: Stabilizing variable selection and regression
MaRDI item Q104197