To explain or to predict?
Publication:906529
DOI: 10.1214/10-STS330; zbMATH Open: 1329.62045; arXiv: 1101.0891; OpenAlex: W3121452939; Wikidata: Q29030663; Scholia: Q29030663; MaRDI QID: Q906529; FDO: Q906529
Authors: Galit Shmueli
Publication date: 22 January 2016
Published in: Statistical Science
Abstract: Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
Full work available at URL: https://arxiv.org/abs/1101.0891
Keywords: data mining; causality; predictive modeling; predictive power; explanatory modeling; scientific research; statistical strategy
Cites Work
- Causation, prediction, and search
- Heuristics of instability and stabilization in model selection
- Bayesian data analysis.
- Random forests
- Title not available
- An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias
- Statistical modeling: The two cultures. (With comments and a rejoinder).
- Introduction to linear regression analysis.
- Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach
- The central role of the propensity score in observational studies for causal effects
- Specification Tests in Econometrics
- Boosting algorithms: regularization, prediction and model fitting
- Title not available
- The Predictive Sample Reuse Method with Applications
- Title not available
- Handling missing values when applying classification models
- Title not available
- Common risk factors in the returns on stocks and bonds
- An investigation of missing data methods for classification trees applied to binary response data
- Title not available
- Title not available
- Bayes Model Averaging with Selection of Regressors
- Investigating Causal Relations by Econometric Models and Cross-spectral Methods
- Decomposition of Prediction Error in Multilevel Models
- Statistical learning from a regression perspective
- Causal diagrams for empirical research
- Information criteria and statistical modeling.
- Title not available
- Simplicity, Inference and Modelling
- Predictive likelihood: A review. With comments and a rejoinder by the author
- How to Tell When Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions
- Bayes not Bust! Why Simplicity is no Problem for Bayesians
- Methods and Criteria for Model Selection
- A Predictive View of the Detection and Characterization of Influential Observations in Regression Analysis
- Probability weights in rank-dependent utility with binary even-chance independence.
- Not even wrong. The failure of string theory and the search for unity in physical law.
- The Statistical Research Group, 1942-1945
- Modeling online auctions.
- Scientific method, statistical method and the speed of light.
- Interaction effects in logistic regression
- A conversation with Hirotugu Akaike
Cited In (79)
- Explainable AI for operational research: a defining framework, methods, applications, and a research agenda
- Using cross-validation methods to select time series models: promises and pitfalls
- Forbidden Knowledge and Specialized Training: A Versatile Solution for the Two Main Sources of Overfitting in Linear Regression
- Bayesian approaches to variable selection: a comparative study from practical perspectives
- Estimating retail demand with Poisson mixtures and out-of-sample likelihood
- A special issue on: Actual impact and future perspectives on stochastic modelling in business and industry
- An evolutionary estimation procedure for generalized semilinear regression trees
- Differential equations in data analysis
- SUBiNN: a stacked uni- and bivariate \(k\)NN sparse ensemble
- A zero-inflated endemic-epidemic model with an application to measles time series in Germany
- Selection of variables for multivariable models: opportunities and limitations in quantifying model stability by resampling
- Information criteria for model selection
- Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy
- A Survey of Differentially Private Regression for Clinical and Epidemiological Research
- Statistical plasmode simulations-potentials, challenges and recommendations
- Conditional intensity: A powerful tool for modelling and analysing point process data
- A comparison of full model specification and backward elimination of potential confounders when estimating marginal and conditional causal effects on binary outcomes from observational data
- A neutral comparison of algorithms to minimize \(L_0\) penalties for high-dimensional variable selection
- Predicting class switch recombination in B-cells from antibody repertoire data
- Propensity-based standardization to enhance the validation and interpretation of prediction model discrimination for a target population
- Confidence, prediction, and tolerance in linear mixed models
- Classification model with weighted regularization to improve the reproducibility of neuroimaging signature selection
- Post-estimation shrinkage in full and selected linear regression models in low-dimensional data revisited
- Analytical Problem Solving Based on Causal, Correlational and Deductive Models
- The InterModel Vigorish as a Lens for understanding (and quantifying) the value of item response models for dichotomously coded items
- Explainable ensemble trees
- Flexible model-based non-negative matrix factorization with application to mutational signatures
- Particle swarm optimization based ridge logistic estimator
- Robust multivariate functional discriminant coordinates
- Measuring the Stability of Results From Supervised Statistical Learning
- On the exploration of regression dependence structures in multidimensional contingency tables with ordinal response variables
- Handling co-dependence issues in resampling-based variable selection procedures: a simulation study
- The wisdom of crowds and transfer market values
- Models for understanding versus models for prediction
- Robust estimation in canonical correlation analysis for multivariate functional data
- The growing ubiquity of algorithms in society: implications, impacts and innovations
- What makes a VRP solution good? The generation of problem-specific knowledge for heuristics
- Variable selection in time series forecasting using random forests
- Learning certifiably optimal rule lists for categorical data
- Controlling the error probabilities of model selection information criteria using bootstrapping
- Title not available
- ‘The COM‐Poisson model for count data: a survey of methods and applications’ by K. Sellers, S. Borle and G. Shmueli
- Selected statistical methods of data analysis for multivariate functional data
- On stability issues in deriving multivariable regression models
- Variable selection -- a review and recommendations for the practicing statistician
- On exploratory analytic method for multi-way contingency tables with an ordinal response variable and categorical explanatory variables
- Distributional regression for demand forecasting in e-grocery
- A novel bagging approach for variable ranking and selection via a mixed importance measure
- Prediction of the Nash through penalized mixture of logistic regression models
- Bayesian hierarchical rule modeling for predicting medical conditions
- The balance property in neural network modelling
- The heteroscedastic graded response model with a skewed latent trait: testing statistical and substantive hypotheses related to skewed item category functions
- Quantifying simulator discrepancy in discrete-time dynamical simulators
- A Tale of Two Matrix Factorizations
- Machine learning versus statistical modeling
- Rejoinder to: Probability estimation with machine learning methods for dichotomous and multicategory outcome
- Pitfalls and merits of cointegration-based mortality models
- Omitted variable bias in GLMs of neural spiking activity
- PBoostGA: pseudo-boosting genetic algorithm for variable ranking and selection
- A novel completeness test for leakage models and its application to side channel attacks and responsibly engineered simulators
- A random forest based approach for predicting spreads in the primary catastrophe bond market
- Parameter identifiability and model selection for sigmoid population growth models
- RandGA: injecting randomness into parallel genetic algorithm for variable selection
- An endemic–epidemic beta model for time series of infectious disease proportions
- The Need for More Emphasis on Prediction: A “Nondenominational” Model-Based Approach
- Monitoring systemic risk in the hedge fund sector
- Variable Selection With Second-Generation P-Values
- Sequential event prediction
- Title not available
- An empirical comparison of popular structure learning algorithms with a view to gene network inference
- Explanation, prediction, description, and information theory
- Detailed study of a moving average trading rule
- Comment on: ``Models as approximations
- Multivariate analysis of variance for functional data
- Discriminant coordinates analysis for multivariate functional data
- Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model
- Methods to compute prediction intervals: a review and new results
- Games with second-order expected utility
- A Bayesian perspective of statistical machine learning for big data