Does data splitting improve prediction?
Publication:2631345
Abstract: Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation, but we use this part for estimating the parameters of the chosen model. We focus on the problem of constructing reliable predictive distributions for future observed values. We judge predictive performance using the logarithmic score. We compare the full-data strategy with the data-splitting strategy for prediction. We show how the full-data score can be decomposed into model selection, parameter estimation and data reuse costs. Data splitting is preferred when data reuse costs are high. We investigate the relative performance of the strategies in four simulation scenarios. We introduce a hybrid estimator called SAFE that uses one part for model selection but both parts for estimation. We discuss the choice between a split-data analysis and a full-data analysis.
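The three strategies named in the abstract (full data, data splitting, and the SAFE hybrid) can be contrasted with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' code or simulation design: the two candidate models (intercept-only versus simple linear regression), AIC as the selection rule, the plug-in Gaussian predictive distribution, the 50/50 split and all function names are hypothetical choices made for illustration. Each strategy is scored by its average log predictive density on fresh test data.

```python
import math
import random


def fit(model, xs, ys):
    """Least-squares fit of a hypothetical candidate model; returns (a, b, sigma2).

    model 0: y = a + noise; model 1: y = a + b*x + noise.
    """
    n = len(xs)
    ybar = sum(ys) / n
    if model == 0:
        a, b = ybar, 0.0
    else:
        xbar = sum(xs) / n
        sxx = sum((x - xbar) ** 2 for x in xs)
        sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        b = sxy / sxx
        a = ybar - b * xbar
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, rss / n  # maximum-likelihood estimate of the noise variance


def select(xs, ys):
    """Pick the model with the smaller Gaussian AIC (an assumed selection rule)."""
    best, best_aic = None, math.inf
    for model in (0, 1):
        _, _, s2 = fit(model, xs, ys)
        k = model + 2  # regression coefficients plus the variance parameter
        aic = len(xs) * math.log(s2) + 2 * k
        if aic < best_aic:
            best, best_aic = model, aic
    return best


def log_score(params, xs, ys):
    """Mean log density of a plug-in Gaussian predictive distribution (higher is better)."""
    a, b, s2 = params
    return sum(-0.5 * math.log(2 * math.pi * s2) - (y - a - b * x) ** 2 / (2 * s2)
               for x, y in zip(xs, ys)) / len(xs)


def strategies(xs, ys, split=0.5):
    """Fitted parameters for each strategy; assumes the data are in random order."""
    m = int(len(xs) * split)
    xa, ya, xb, yb = xs[:m], ys[:m], xs[m:], ys[m:]
    full = fit(select(xs, ys), xs, ys)  # select and estimate on all data
    chosen = select(xa, ya)             # select on part A only
    split_fit = fit(chosen, xb, yb)     # data splitting: estimate on part B
    safe = fit(chosen, xs, ys)          # SAFE-style: estimate on both parts
    return {"full": full, "split": split_fit, "safe": safe}


def simulate(n=40, n_test=1000, reps=100, slope=0.3, seed=1):
    """Average test log score per strategy under a hypothetical linear data model."""
    rng = random.Random(seed)
    totals = {"full": 0.0, "split": 0.0, "safe": 0.0}
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [slope * x + rng.gauss(0, 1) for x in xs]
        xt = [rng.gauss(0, 1) for _ in range(n_test)]
        yt = [slope * x + rng.gauss(0, 1) for x in xt]
        for name, params in strategies(xs, ys).items():
            totals[name] += log_score(params, xt, yt) / reps
    return totals
```

Calling `simulate()` returns the average log score of each strategy. Varying `slope`, `n` or `split` changes how costly selection and data reuse are in this toy setting, but its numbers should not be read as the paper's results.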
Recommendations
- Data partition methodology for validation of predictive models
- How much better is disaggregate data?
- Forecasting with many predictors: is boosting a viable alternative?
- Multi split conformal prediction
- The effect of splitting on random forests
- Does adding data always improve linear regression estimates?
Cites work
- Scientific article; zbMATH DE number 46873
- Scientific article; zbMATH DE number 3483405
- Scientific article; zbMATH DE number 720675
- A note on data-splitting for the evaluation of significance levels
- Clinical prediction models: a practical approach to development, validation, and updating
- Confidence distribution, the frequentist distribution estimator of a parameter: a review
- Cross-Validation of Regression Models
- Frequentist prediction intervals and predictive distributions
- Least squares after model selection in high-dimensional sparse models
- Model Selection and Inference: Facts and Fiction
- Present Position and Potential Developments: Some Personal Views: Statistical Theory: The Prequential Approach
- Proper local scoring rules
- Split Samples and Design Sensitivity in Observational Studies
- Statistical Analysis of Financial Data in S-Plus
- Strictly Proper Scoring Rules, Prediction, and Estimation
- The Analysis of Transformed Data
- The elements of statistical learning: data mining, inference, and prediction
Cited in: 2 publications