Just interpolate: kernel ``ridgeless'' regression can generalize
Publication: 2196223
Abstract: In the absence of explicit regularization, Kernel "Ridgeless" Regression with nonlinear kernels has the potential to fit the training data perfectly. It has been observed empirically, however, that such interpolated solutions can still generalize well on test data. We isolate a phenomenon of implicit regularization for minimum-norm interpolated solutions which is due to a combination of high dimensionality of the input data, curvature of the kernel function, and favorable geometric properties of the data such as an eigenvalue decay of the empirical covariance and kernel matrices. In addition to deriving a data-dependent upper bound on the out-of-sample error, we present experimental evidence suggesting that the phenomenon occurs in the MNIST dataset.
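The estimator discussed in the abstract can be written down in a few lines. The sketch below (Python with NumPy, not taken from the paper) fits the minimum-norm interpolant by applying the kernel matrix pseudo-inverse to the labels; the Gaussian (RBF) kernel and the bandwidth parameter `gamma` are illustrative assumptions, not choices prescribed by the publication.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of A and rows of B.
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)

def fit_ridgeless(X_train, y_train, gamma=1.0):
    # "Ridgeless" regression: no explicit regularization, so the
    # coefficients solve K @ alpha = y exactly. The pseudo-inverse
    # yields the minimum-norm interpolant if K is singular.
    K = rbf_kernel(X_train, X_train, gamma)
    return np.linalg.pinv(K) @ y_train

def predict(X_test, X_train, alpha, gamma=1.0):
    # Out-of-sample prediction: f(x) = sum_i alpha_i * k(x, x_i).
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Tiny usage example on synthetic high-dimensional data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = fit_ridgeless(X, y)
# Training residuals are numerically zero: the data is interpolated.
print(np.max(np.abs(predict(X, X, alpha) - y)))
```

The point of the sketch is only to make the object of study concrete: the fit passes through every training point, and the paper's question is when such an interpolant can still generalize.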
Recommendations
- Generalization error of minimum weighted norm and kernel interpolation
- Surprises in high-dimensional ridgeless least squares interpolation
- Benign overfitting in linear regression
- Benefit of Interpolation in Nearest Neighbor Algorithms
- Overparameterization and generalization error: weighted trigonometric interpolation
Cites work
- scientific article; zbMATH DE number 45848
- scientific article; zbMATH DE number 1332320
- scientific article; zbMATH DE number 1950576
- doi:10.1162/153244303321897690
- A distribution-free theory of nonparametric regression
- An introduction to support vector machines and other kernel-based learning methods
- Best choices for regularization parameters in learning theory: on the bias-variance problem
- Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter
- Kernel ridge regression
- Kernels for vector-valued functions: a review
- Learning Theory
- Model selection for regularized least-squares algorithm in learning theory
- On early stopping in gradient descent learning
- On the limit of the largest eigenvalue of the large dimensional sample covariance matrix
- Optimal rates for the regularized least-squares algorithm
- Regularization networks and support vector machines
- Scikit-learn: machine learning in Python
- The origins of kriging
- The spectrum of kernel random matrices
Cited in (50)
- On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions
- The interpolation phase transition in neural networks: memorization and generalization under lazy training
- Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates
- Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration
- A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent*
- A Unifying Tutorial on Approximate Message Passing
- Canonical thresholding for nonsparse high-dimensional linear regression
- Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
- A sieve stochastic gradient descent estimator for online nonparametric regression in Sobolev ellipsoids
- Multilevel Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data
- Benign overfitting and adaptive nonparametric regression
- Learning the mapping \(\mathbf{x}\mapsto \sum\limits_{i=1}^d x_i^2\): the cost of finding the needle in a haystack
- scientific article; zbMATH DE number 7626719
- Binary classification of Gaussian mixtures: abundance of support vectors, benign overfitting, and regularization
- Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
- Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
- Theoretical issues in deep networks
- For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability
- HARFE: hard-ridge random feature expansion
- Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
- Convergence analysis for over-parameterized deep learning
- scientific article; zbMATH DE number 7306870
- Overparameterization and generalization error: weighted trigonometric interpolation
- A multi-resolution theory for approximating infinite-\(p\)-zero-\(n\): transitional inference, individualized predictions, and a world without bias-variance tradeoff
- Benign Overfitting and Noisy Features
- Learning ability of interpolating deep convolutional neural networks
- New equivalences between interpolation and SVMs: kernels and structured features
- Diversity sampling is an implicit regularization for kernel methods
- Kernel approximation: from regression to interpolation
- scientific article; zbMATH DE number 7370646
- Tractability from overparametrization: the example of the negative perceptron
- Improved complexities for stochastic conditional gradient methods under interpolation-like conditions
- Benign overfitting in linear regression
- A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors
- On the proliferation of support vectors in high dimensions*
- Learning from non-random data in Hilbert spaces: an optimal recovery perspective
- SVRG meets AdaGrad: painless variance reduction
- Deep networks for system identification: a survey
- Generalization error of minimum weighted norm and kernel interpolation
- Deep learning: a statistical viewpoint
- Deep neural networks, generic universal interpolation, and controlled ODEs
- Generalization error rates in kernel regression: the crossover from the noiseless to noisy regime*
- Locality defeats the curse of dimensionality in convolutional teacher–student scenarios*
- scientific article; zbMATH DE number 7625163
- scientific article; zbMATH DE number 7415102
- scientific article; zbMATH DE number 7415098
- Surprises in high-dimensional ridgeless least squares interpolation
- On the robustness of minimum norm interpolators and regularized empirical risk minimizers
- Linearized two-layers neural networks in high dimension
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers