Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery
From MaRDI portal
Abstract: Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a "best" model. For example, the method of \(k\)-nearest neighbors requires the user to choose \(k\), the number of neighbors, and a neural network has several tuning parameters controlling the network complexity. Once such parameters are optimized for a particular data set, the next step is often to compare the various optimized models and choose the method with the best predictive performance. Both tuning and model selection boil down to comparing models, either across different values of the tuning parameters or across different classes of statistical models and/or sets of explanatory variables. For multiple large sets of data, like the PubChem drug discovery cheminformatics data which motivated this work, reliable CV comparisons are computationally demanding, or even infeasible. In this paper we develop an efficient sequential methodology for model comparison based on CV. It also takes into account the randomness in CV. The number of models is reduced via an adaptive, multiplicity-adjusted sequential algorithm, where poor performers are quickly eliminated. By exploiting matching of individual observations, it is sometimes even possible to establish the statistically significant inferiority of some models with just one execution of CV.
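The sequential elimination idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a one-sided paired z-test (normal approximation) on matched per-observation loss differences, and a crude Bonferroni adjustment over all ordered model pairs and all planned CV runs. The names `paired_p_value`, `sequential_elimination`, and `make_simulated_cv` are hypothetical, and the losses are simulated rather than taken from PubChem data. Noise is drawn independently per run here, which real repeated CV splits would not guarantee.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def paired_p_value(loss_a, loss_b):
    """One-sided p-value for H0: model a is no worse than model b.

    Uses matched per-observation loss differences and a normal
    approximation (an assumption of this sketch, not the paper's test).
    """
    diffs = [a - b for a, b in zip(loss_a, loss_b)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0.0:
        return 0.0 if mean(diffs) > 0 else 1.0
    return 1.0 - NormalDist().cdf(mean(diffs) / se)


def sequential_elimination(cv_losses, n_models, max_runs=10, alpha=0.05):
    """Drop models that are significantly worse than the current best.

    cv_losses(model, run) must return one loss per observation, in the
    same observation order for every model, from one execution of CV.
    """
    alive = list(range(n_models))
    totals = {m: None for m in alive}  # per-observation loss sums over runs
    # Crude Bonferroni: all ordered pairs at every planned run.
    threshold = alpha / (n_models * (n_models - 1) * max_runs)
    for run in range(max_runs):
        if len(alive) == 1:
            break
        for m in alive:  # only surviving models cost further CV runs
            losses = cv_losses(m, run)
            if totals[m] is None:
                totals[m] = losses
            else:
                totals[m] = [t + x for t, x in zip(totals[m], losses)]
        best = min(alive, key=lambda m: sum(totals[m]))
        alive = [
            m for m in alive
            if m == best or paired_p_value(totals[m], totals[best]) >= threshold
        ]
    return alive


def make_simulated_cv(n_obs=500, model_means=(0.30, 0.31, 0.45, 0.60), seed=0):
    """Simulated losses: a shared per-observation difficulty plus model noise."""
    rng = random.Random(seed)
    difficulty = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]

    def cv_losses(model, run):
        noise = random.Random(seed * 1000003 + run * 101 + model + 1)
        return [model_means[model] + d + noise.gauss(0.0, 0.2) for d in difficulty]

    return cv_losses


if __name__ == "__main__":
    print(sequential_elimination(make_simulated_cv(), n_models=4))
```

Because the shared per-observation difficulty cancels in the paired differences, clearly inferior models in this simulation can be rejected after a single run, while closely matched models need more runs or remain unresolved.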
Cites work
- scientific article; zbMATH DE number 3483405
- scientific article; zbMATH DE number 1228067
- scientific article; zbMATH DE number 835699
- scientific article; zbMATH DE number 5056254
- scientific article; zbMATH DE number 3076948
- A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods
- Asymptotic optimality for \(C_p\), \(C_L\), cross-validation and generalized cross-validation: Discrete index set
- Asymptotics for and against cross-validation
- Asymptotics of cross-validated risk estimation in estimator selection and performance assessment
- Deletion/Substitution/Addition Algorithm in Learning with Applications in Genomics
- Design and Analysis of Experiments
- Linear Model Selection by Cross-Validation
- Model selection via multifold cross validation