Does data splitting improve prediction?

DOI: 10.1007/S11222-014-9522-9; zbMATH Open: 1342.62025; arXiv: 1301.2983; OpenAlex: W3125270795; MaRDI QID: Q2631345; FDO: Q2631345

Publication date: 29 July 2016

Published in: Statistics and Computing

Abstract: Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the problem of constructing reliable predictive distributions for future observed values. We judge the predictive performance using log scoring. We compare the full data strategy with the data splitting strategy for prediction. We show how the full data score can be decomposed into model selection, parameter estimation and data reuse costs. Data splitting is preferred when data reuse costs are high. We investigate the relative performance of the strategies in four simulation scenarios. We introduce a hybrid estimator called SAFE that uses one part for model selection but both parts for estimation. We discuss the choice to use a split data analysis versus a full data analysis.
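The three strategies named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' procedure. The Gaussian linear model, BIC as the selection criterion, the plug-in predictive density, the 50/50 split and all function names are assumptions made here. "safe_like" follows only the one-sentence description of SAFE above: select on one part, estimate on both parts.

```python
import numpy as np

def fit_gaussian_lm(X, y):
    """Least-squares fit; returns coefficients and the MLE of the noise variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / len(y)
    return beta, sigma2

def bic(X, y):
    """BIC of a Gaussian linear model (assumed selection criterion)."""
    _, sigma2 = fit_gaussian_lm(X, y)
    n, k = X.shape
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return -2.0 * loglik + (k + 1) * np.log(n)

def select_model(X, y, candidates):
    """Pick the candidate column subset with the smallest BIC."""
    return min(candidates, key=lambda cols: bic(X[:, cols], y))

def log_score(X_new, y_new, cols, beta, sigma2):
    """Mean log predictive density on new data (plug-in Gaussian, higher is better)."""
    mu = X_new[:, cols] @ beta
    dens = -0.5 * np.log(2 * np.pi * sigma2) - (y_new - mu) ** 2 / (2 * sigma2)
    return float(np.mean(dens))

def compare_strategies(X, y, X_new, y_new, candidates, rng):
    """Log scores of the full-data, split-data and SAFE-like strategies."""
    n = len(y)
    perm = rng.permutation(n)
    part1, part2 = perm[: n // 2], perm[n // 2:]
    scores = {}

    # Full data: select and estimate on all observations (data are reused).
    cols = select_model(X, y, candidates)
    beta, sigma2 = fit_gaussian_lm(X[:, cols], y)
    scores["full"] = log_score(X_new, y_new, cols, beta, sigma2)

    # Data splitting: select on part 1, estimate on part 2 only.
    cols = select_model(X[part1], y[part1], candidates)
    beta, sigma2 = fit_gaussian_lm(X[part2][:, cols], y[part2])
    scores["split"] = log_score(X_new, y_new, cols, beta, sigma2)

    # SAFE-like hybrid: keep the part-1 selection, estimate on both parts.
    beta, sigma2 = fit_gaussian_lm(X[:, cols], y)
    scores["safe_like"] = log_score(X_new, y_new, cols, beta, sigma2)
    return scores
```

Calling `compare_strategies` with a design matrix, a response vector, held-out data, a list of candidate column subsets such as `[[0], [0, 1], [0, 1, 2]]` and a `numpy.random.Generator` returns one log score per strategy. Averaging these over many simulated data sets would give a rough comparison in the spirit of the simulation scenarios the abstract mentions.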

Full work available at URL: https://arxiv.org/abs/1301.2983

zbMATH Keywords

prediction; cross-validation; model assessment; model uncertainty; scoring; model validation

Mathematics Subject Classification ID

Point estimation (62F10); General considerations in statistical decision theory (62C05)

Cited In (2)
