Linear unlearning for cross-validation (Q1923892)

From MaRDI portal
scientific article
    2 September 1997
    Consider nonlinear regression in which the output \(y\) is regressed nonlinearly on the input vector \(x\). The authors focus on a neural network implementation, in which the output is predicted by \(\widehat y=F(x,w)\), where \(F(\cdot)\) denotes the nonlinear mapping of the neural net and \(w\) is the vector of network parameters. The conditional input-output distribution, i.e., the probability distribution of the output conditioned on a test input, is a basic objective for neural net modeling. A main source of uncertainty when estimating the parameters of the conditional distribution is the random selection of training data. The idea of cross-validation in neural net learning is based on training and testing on disjoint subsets resampled from the database, forming the cross-validation ensemble of models. The leave-one-out (LOO) ensemble of networks, each trained on the full data set with one training example left out, is an attractive -- though computationally expensive -- vehicle for assessing the generalization of a neural network model. In conventional neural net approaches, unlearning of examples is not possible, so one basically has to train the full ensemble of networks, making the approach computationally infeasible.
    This paper suggests the use of linear unlearning of examples to approximate the computationally expensive LOO cross-validation technique. It is assumed that unlearning a single example perturbs the network weights only slightly. Under this hypothesis, the change in the network parameters is estimated within a quadratic approximation of the network cost function. Using the ensemble, an estimator for the test error of a regularized network is derived. The possibility of employing the ensemble of networks produced by the cross-validation scheme to construct an ensemble predictor is also analyzed.
    Considering a linear combination of networks, it is shown that the generalization performance is identical to that of a single network trained on the full data set. Numerical studies on the sunspot time series prediction benchmark demonstrate the viability of this approach.
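    As a concrete illustration (not taken from the paper), the linear-unlearning idea becomes exact when the cost function is itself quadratic, e.g. ridge regression: there, the weight change from removing one example follows from a rank-one (Sherman-Morrison) update of the Hessian inverse, and the LOO residual is the full-data residual rescaled by \(1/(1-h_i)\), where \(h_i=x_i^\top H^{-1}x_i\) is the leverage. A minimal sketch, with synthetic data and an assumed regularizer \(\lambda\):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 40, 3
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    lam = 1e-2  # ridge regularizer (assumed value, for illustration)

    def fit(Xs, ys):
        """Ridge solution w = (X'X + lam I)^{-1} X'y."""
        return np.linalg.solve(Xs.T @ Xs + lam * np.eye(Xs.shape[1]), Xs.T @ ys)

    # Train once on the full data set.
    w = fit(X, y)
    resid = y - X @ w

    # Hessian of the quadratic cost and the leverages h_i = x_i' H^{-1} x_i.
    H_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    h = np.einsum("ij,jk,ik->i", X, H_inv, X)

    # "Unlearned" LOO residuals from the full-data fit: e_i / (1 - h_i).
    loo_fast = resid / (1 - h)

    # Brute-force LOO: retrain n times, leaving out one example each time.
    loo_slow = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        w_i = fit(X[mask], y[mask])
        loo_slow[i] = y[i] - X[i] @ w_i

    # For a quadratic cost the two agree to machine precision.
    print(np.max(np.abs(loo_fast - loo_slow)))
    ```

    For a genuinely nonlinear network the cost is only approximately quadratic around the trained weights, so the analogous update (with \(H\) the Hessian of the regularized cost at the solution) yields an approximation to LOO rather than an exact identity.
    
    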
    nonlinear regression
    neural network
    input-output distribution
    cross-validation
    linear unlearning
    quadratic approximation
    time series prediction