Prediction error after model search

From MaRDI portal
Publication: Q2196193

DOI: 10.1214/19-AOS1818
zbMATH Open: 1448.62137
arXiv: 1610.06107
OpenAlex: W3030749005
MaRDI QID: Q2196193
FDO: Q2196193


Author: Xiaoying Tian


Publication date: 28 August 2020

Published in: The Annals of Statistics

Abstract: Estimating the prediction error of a linear estimation rule is difficult if the data analyst also uses the data to select a set of variables and constructs the estimation rule using only the selected variables. In this work, we propose an asymptotically unbiased estimator for the prediction error after model search. Under some additional mild assumptions, we show that our estimator converges to the true prediction error in L^2 at the rate of O(n^{-1/2}), with n being the number of data points. Our estimator applies to general selection procedures and does not require an analytical form for the selection. The number of variables to select from can grow exponentially in n, allowing applications to high-dimensional data. The method also allows model misspecification and does not require the underlying model to be linear. One application of our method is that it provides an estimator of the degrees of freedom for many discontinuous estimation rules such as best subset selection and the relaxed Lasso. The connection to Stein's Unbiased Risk Estimator is discussed. We consider in-sample prediction error in this work, with some extension to out-of-sample error in low-dimensional linear models. Examples such as best subset selection and the relaxed Lasso are considered in simulations, where our estimator outperforms both Cp and cross-validation in various settings.


Full work available at URL: https://arxiv.org/abs/1610.06107
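
A minimal simulation sketch of the setting the abstract describes, not the paper's proposed estimator: variables are selected with the Lasso and the rule is refit by OLS on the selected set (a relaxed Lasso), after which a naive Cp-type estimate that treats the selected model as fixed is compared with the true in-sample prediction error computed from an independent copy of the response. The helper name relaxed_lasso_fit, the dimensions, and the tuning values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p, sigma = 100, 200, 1.0          # high-dimensional setting: p > n (illustrative)
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # sparse linear truth, chosen only for this sketch
mu = X @ beta

def relaxed_lasso_fit(X, y, alpha=0.1):
    """Select variables with the Lasso, then refit OLS on the selected set."""
    selected = np.flatnonzero(Lasso(alpha=alpha, max_iter=10000).fit(X, y).coef_)
    coef = np.zeros(X.shape[1])
    if selected.size > 0:
        coef[selected] = LinearRegression(fit_intercept=False).fit(X[:, selected], y).coef_
    return coef, selected

naive_cp, true_err = [], []
for _ in range(200):
    y = mu + sigma * rng.standard_normal(n)
    coef, selected = relaxed_lasso_fit(X, y)
    fitted = X @ coef
    rss = np.sum((y - fitted) ** 2)
    # Naive Cp treats |selected| as the degrees of freedom, ignoring the search step.
    naive_cp.append(rss + 2 * sigma**2 * selected.size)
    # In-sample prediction error: squared error on a fresh response with the same mean.
    y_new = mu + sigma * rng.standard_normal(n)
    true_err.append(np.sum((y_new - fitted) ** 2))

print("naive Cp estimate of prediction error:", round(np.mean(naive_cp), 1))
print("Monte Carlo in-sample prediction error:", round(np.mean(true_err), 1))
```

In runs like this the naive Cp estimate tends to fall below the Monte Carlo prediction error, because counting only the selected coefficients ignores the degrees of freedom spent on the search; estimating that gap for discontinuous rules such as best subset selection or the relaxed Lasso is the problem the paper's estimator addresses.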



