Understanding complex predictive models with ghost variables
Abstract: We propose a procedure for assigning a relevance measure to each explanatory variable in a complex predictive model. We assume that we have a training set to fit the model and a test set to check its out-of-sample performance. First, the individual relevance of each variable is computed by comparing the test-set predictions of the model that includes all the variables with those of another model in which the variable of interest is replaced by its ghost variable, defined as the prediction of this variable from the remaining explanatory variables. Second, we check the joint effects among the variables by using the eigenvalues of a relevance matrix, which is the covariance matrix of the vectors of individual effects. It is shown that in simple models, such as linear or additive models, the proposed measures are related to standard measures of significance of the variables, and that in neural network models (and in other algorithmic prediction models) the procedure provides information about the joint and individual effects of the variables that is not usually available from other methods. The procedure is illustrated with simulated examples and the analysis of a large real data set.
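The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract gives no formulas, so several choices here are assumptions: a least-squares linear fit stands in for the "complex" model, the ghost variable is estimated by a linear regression on the test set, and each variable's relevance is summarised as the mean squared change in the predictions. The helper names `fit_linear` and `ghost_relevance` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ coef

def ghost_relevance(predict, X_test):
    """Individual effects, relevance per variable, and relevance matrix."""
    n, p = X_test.shape
    full_pred = predict(X_test)
    effects = np.empty((n, p))
    for j in range(p):
        others = np.delete(X_test, j, axis=1)
        # Ghost variable: prediction of column j from the remaining columns
        # (assumption: linear regression, estimated on the test set).
        ghost = fit_linear(others, X_test[:, j])(others)
        X_ghost = X_test.copy()
        X_ghost[:, j] = ghost
        # Vector of individual effects: change in the test-set predictions.
        effects[:, j] = full_pred - predict(X_ghost)
    # Assumed summary of individual relevance: mean squared change.
    relevance = np.mean(effects ** 2, axis=0)
    # Relevance matrix: covariance matrix of the vectors of individual effects.
    V = np.cov(effects, rowvar=False)
    eigvals = np.linalg.eigvalsh(V)[::-1]  # sorted from largest to smallest
    return relevance, V, eigvals

# Simulated data (illustrative only): x0 and x1 are correlated, x2 is
# independent, and x1 has no direct effect on the response.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 3))
X = np.column_stack([Z[:, 0], 0.8 * Z[:, 0] + 0.6 * Z[:, 1], Z[:, 2]])
y = 2.0 * X[:, 0] + X[:, 2] + rng.normal(scale=0.5, size=1000)

X_train, y_train, X_test = X[:500], y[:500], X[500:]
predict = fit_linear(X_train, y_train)   # stand-in for a complex model
relevance, V, eigvals = ghost_relevance(predict, X_test)
print("relevance by ghost variables:", relevance)
print("eigenvalues of relevance matrix:", eigvals)
```

Any fitted model with a prediction function could be passed in place of `predict`. The ghost-variable regression could likewise be replaced by a more flexible estimator.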
Recommendations
- All models are wrong, but many are useful: learning a variable's importance by studying an entire class of prediction models simultaneously
- Nonparametric variable importance assessment using machine learning techniques
- A note on the interpretation of tree-based regression models
- Local interpretation of supervised learning models based on high dimensional model representation
- Framework for making better predictions by directly estimating variables' predictivity
Cites work
- scientific article; zbMATH DE number 720678
- scientific article; zbMATH DE number 1941665
- scientific article; zbMATH DE number 6734253
- Controlling the false discovery rate via knockoffs
- Correlation and variable importance in random forests
- Distribution-free predictive inference for regression
- Grouped variable importance with random forests and application to multiple functional data analysis
- Panning for Gold: ‘Model-X’ Knockoffs for High Dimensional Controlled Variable Selection
- Reinforcement learning trees
- Scikit-learn: machine learning in Python
- Statistical Analysis of Financial Data in S-Plus
- Statistical modeling: The two cultures. (With comments and a rejoinder).
- The Holdout Randomization Test for Feature Selection in Black Box Models
- Unrestricted permutation forces extrapolation: variable importance requires at least one more model, or there is no free variable importance
Cited in (3)
- A simple approach for local and global variable importance in nonlinear regression models
- Enhancing the interpretability of nonlinear proportional hazard models introducing ghost variables
- All models are wrong, but many are useful: learning a variable's importance by studying an entire class of prediction models simultaneously