Just interpolate: kernel "ridgeless" regression can generalize
Publication:2196223
Abstract: In the absence of explicit regularization, Kernel "Ridgeless" Regression with nonlinear kernels has the potential to fit the training data perfectly. It has been observed empirically, however, that such interpolated solutions can still generalize well on test data. We isolate a phenomenon of implicit regularization for minimum-norm interpolated solutions which is due to a combination of high dimensionality of the input data, curvature of the kernel function, and favorable geometric properties of the data such as an eigenvalue decay of the empirical covariance and kernel matrices. In addition to deriving a data-dependent upper bound on the out-of-sample error, we present experimental evidence suggesting that the phenomenon occurs in the MNIST dataset.
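The estimator described in the abstract is kernel ridge regression with the ridge parameter set to zero: the minimum-norm interpolant \(\hat f(x) = \sum_i \alpha_i k(x, x_i)\) with \(\alpha = K^{+} y\). The following minimal sketch (not the authors' code) illustrates this; the Gaussian kernel, bandwidth, and synthetic data are illustrative assumptions only.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def fit_ridgeless(X_train, y_train, gamma=1.0):
    """Minimum-norm interpolant: alpha = K^+ y (pseudo-inverse; no ridge term, lambda = 0)."""
    K = rbf_kernel(X_train, X_train, gamma)
    return np.linalg.pinv(K) @ y_train

def predict(X_test, X_train, alpha, gamma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x, x_i)."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Tiny synthetic illustration (dimensions and noise level chosen arbitrarily).
rng = np.random.default_rng(0)
d = 50
X_train = rng.standard_normal((200, d))                     # high-dimensional inputs
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.standard_normal(200)
X_test = rng.standard_normal((100, d))

alpha = fit_ridgeless(X_train, y_train, gamma=1.0 / d)
y_hat_train = predict(X_train, X_train, alpha, gamma=1.0 / d)
y_hat_test = predict(X_test, X_train, alpha, gamma=1.0 / d)
print("train MSE (≈ 0, i.e. interpolation):", np.mean((y_hat_train - y_train) ** 2))
```

When the kernel matrix is full rank the training error is essentially zero, so any generalization on the test points comes from the implicit regularization effects discussed in the abstract rather than an explicit ridge penalty.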
Recommendations
- Generalization error of minimum weighted norm and kernel interpolation
- Surprises in high-dimensional ridgeless least squares interpolation
- Benign overfitting in linear regression
- Benefit of Interpolation in Nearest Neighbor Algorithms
- Overparameterization and generalization error: weighted trigonometric interpolation
Cites work
- scientific article, zbMATH DE number 45848 (no title available)
- scientific article, zbMATH DE number 1332320 (no title available)
- scientific article, zbMATH DE number 1950576 (no title available)
- 10.1162/153244303321897690
- A distribution-free theory of nonparametric regression
- An introduction to support vector machines and other kernel-based learning methods
- Best choices for regularization parameters in learning theory: on the bias-variance problem
- Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter
- Kernel ridge regression
- Kernels for vector-valued functions: a review
- Learning Theory
- Model selection for regularized least-squares algorithm in learning theory
- On early stopping in gradient descent learning
- On the limit of the largest eigenvalue of the large dimensional sample covariance matrix
- Optimal rates for the regularized least-squares algorithm
- Regularization networks and support vector machines
- Scikit-learn: machine learning in Python
- The origins of kriging
- The spectrum of kernel random matrices
Cited in (50)
- Learning from non-random data in Hilbert spaces: an optimal recovery perspective
- Linearized two-layers neural networks in high dimension
- Improved complexities for stochastic conditional gradient methods under interpolation-like conditions
- Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration
- Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
- Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
- Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
- A Unifying Tutorial on Approximate Message Passing
- Benign Overfitting and Noisy Features
- Learning ability of interpolating deep convolutional neural networks
- Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates
- Deep networks for system identification: a survey
- On the proliferation of support vectors in high dimensions*
- Canonical thresholding for nonsparse high-dimensional linear regression
- Learning the mapping \(\mathbf{x}\mapsto \sum\limits_{i=1}^d x_i^2\): the cost of finding the needle in a haystack
- Generalization error rates in kernel regression: the crossover from the noiseless to noisy regime*
- Locality defeats the curse of dimensionality in convolutional teacher–student scenarios*
- A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors
- For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability
- A sieve stochastic gradient descent estimator for online nonparametric regression in Sobolev ellipsoids
- Binary classification of Gaussian mixtures: abundance of support vectors, benign overfitting, and regularization
- Diversity sampling is an implicit regularization for kernel methods
- Convergence analysis for over-parameterized deep learning
- Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
- New equivalences between interpolation and SVMs: kernels and structured features
- Kernel approximation: from regression to interpolation
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers
- Deep learning: a statistical viewpoint
- A multi-resolution theory for approximating infinite-\(p\)-zero-\(n\): transitional inference, individualized predictions, and a world without bias-variance tradeoff
- Benign overfitting and adaptive nonparametric regression
- On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions
- Generalization error of minimum weighted norm and kernel interpolation
- Multilevel Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data
- scientific article, zbMATH DE number 7625163 (no title available)
- scientific article, zbMATH DE number 7415102 (no title available)
- Tractability from overparametrization: the example of the negative perceptron
- Surprises in high-dimensional ridgeless least squares interpolation
- scientific article, zbMATH DE number 7626719 (no title available)
- A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent*
- Overparameterization and generalization error: weighted trigonometric interpolation
- Deep neural networks, generic universal interpolation, and controlled ODEs
- scientific article, zbMATH DE number 7415098 (no title available)
- Benign overfitting in linear regression
- The interpolation phase transition in neural networks: memorization and generalization under lazy training
- Theoretical issues in deep networks
- scientific article, zbMATH DE number 7306870 (no title available)
- SVRG meets AdaGrad: painless variance reduction
- HARFE: hard-ridge random feature expansion
- scientific article, zbMATH DE number 7370646 (no title available)
- On the robustness of minimum norm interpolators and regularized empirical risk minimizers