Linear unlearning for cross-validation (Q1923892)
From MaRDI portal
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Linear unlearning for cross-validation | scientific article |
Statements
Linear unlearning for cross-validation (English)
2 September 1997
Consider nonlinear regression in which the output \(y\) is regressed nonlinearly on the input vector \(x\). The authors focus on a neural network implementation, in which the output is predicted by \(\widehat y=F(x,w)\), where \(F(\cdot)\) denotes the nonlinear mapping of the neural net and \(w\) is the vector of network parameters. The conditional input-output distribution, i.e., the probability distribution of the output conditioned on a test input, is a basic objective for neural net modeling. A main source of uncertainty, when estimating the parameters of the conditional distribution, is the random selection of training data. The idea of cross-validation in neural net learning is based on training and testing on disjoint subsets resampled from the database, forming the cross-validation ensemble of models. The leave-one-out (LOO) ensemble of networks, each trained on the full data set minus one training example, is an attractive -- though computationally expensive -- vehicle for generalization assessment of a neural network model. In conventional neural net approaches, unlearning of examples is not possible, and one basically has to train the full ensemble of networks, making the approach computationally infeasible. This paper suggests the use of linear unlearning of examples to approximate the computationally expensive LOO cross-validation technique. It is assumed that unlearning a single example changes the network weights only slightly. Under this hypothesis, the change in the network parameters is estimated within a quadratic approximation of the network cost function. Using the ensemble, an estimator for the test error of a regularized network is derived. The possibility of employing the ensemble of networks produced by the cross-validation scheme to construct an ensemble predictor is also analyzed.
Considering a linear combination of networks, it is shown that the generalization performance is identical to that of a single network trained on the full set of data. Numerical studies on the sunspot time series prediction benchmark demonstrate the viability of the approach.
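The unlearning step described in the review can be sketched for a ridge-regularized linear model, whose cost is exactly quadratic, so the paper's quadratic approximation is transparent here. This is an illustrative sketch, not the paper's code; all names and parameter values below are assumptions. Removing example \(i\) makes the full-data optimum \(w\) miss stationarity by \(-\nabla\ell_i(w)\), so one Newton step with the full-data Hessian \(H\) gives the approximate leave-one-out weights \(w_{-i}\approx w+H^{-1}\nabla\ell_i(w)\):

```python
import numpy as np

# Illustrative stand-in for the network F(x, w): a ridge-regularized
# linear model with cost  sum_i (y_i - x_i.w)^2 / 2 + alpha/2 ||w||^2.
rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
alpha = 1.0  # regularization strength (assumed value)

# Full-data solution and Hessian of the cost
H = X.T @ X + alpha * np.eye(d)
w = np.linalg.solve(H, X.T @ y)

def unlearn(i):
    """Linear unlearning of example i: one Newton step from the
    full-data optimum w, reusing the full-data Hessian H."""
    grad_i = -X[i] * (y[i] - X[i] @ w)   # gradient of example i's loss at w
    return w + np.linalg.solve(H, grad_i)

# LOO test-error estimate from the unlearned ensemble
loo_errors = np.array([(y[i] - X[i] @ unlearn(i)) ** 2 for i in range(n)])
print("approximate LOO error:", loo_errors.mean())
```

For this quadratic cost the exact leave-one-out weights solve \((H - x_i x_i^\top) w_{-i} = X^\top y - x_i y_i\); the sketch differs only in reusing the full-data Hessian. That reuse is what makes the scheme cheap for a trained network: one Hessian factorization replaces retraining \(n\) networks.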
nonlinear regression
cross-validation
neural network
input-output distribution
linear unlearning
quadratic approximation
time series prediction