Benign overfitting in linear regression
DOI: 10.1073/PNAS.1907378117
zbMATH: 1485.62085
arXiv: 1906.11300
OpenAlex: W3018252856
Wikidata: Q93214520 (Scholia: Q93214520)
MaRDI QID: Q5073215
Authors: Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler
Publication date: 5 May 2022
Published in: Proceedings of the National Academy of Sciences
Full work available at URL: https://arxiv.org/abs/1906.11300
Mathematics Subject Classification:
- Linear regression; mixed models (62J05)
- Neural nets and related approaches to inference from stochastic processes (62M45)
Related Items (71)
- Canonical thresholding for nonsparse high-dimensional linear regression
- Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
- Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks
- Deep learning: a statistical viewpoint
- Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
- Neural network approximation
- Deep learning for inverse problems. Abstracts from the workshop held March 7--13, 2021 (hybrid meeting)
- Surprises in high-dimensional ridgeless least squares interpolation
- Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration
- Learning curves of generic features maps for realistic datasets with a teacher-student model*
- Generalization error rates in kernel regression: the crossover from the noiseless to noisy regime*
- On the proliferation of support vectors in high dimensions*
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers
- Dimensionality Reduction, Regularization, and Generalization in Overparameterized Regressions
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization
- The unreasonable effectiveness of deep learning in artificial intelligence
- Overparameterization and Generalization Error: Weighted Trigonometric Interpolation
- Weighted random sampling and reconstruction in general multivariate trigonometric polynomial spaces
- Benefit of Interpolation in Nearest Neighbor Algorithms
- HARFE: hard-ridge random feature expansion
- A note on the prediction error of principal component regression in high dimensions
- High dimensional binary classification under label shift: phase transition and regularization
- On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions
- Free dynamics of feature learning processes
- A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors
- Towards data augmentation in graph neural network: an overview and evaluation
- PAC-learning with approximate predictors
- Unnamed Item
- Random neural networks in the infinite width limit as Gaussian processes
- A domain-theoretic framework for robustness analysis of neural networks
- High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections
- Measuring Complexity of Learning Schemes Using Hessian-Schatten Total Variation
- A geometric view on the role of nonlinear feature maps in few-shot learning
- A Generalization Gap Estimation for Overparameterized Models via the Langevin Functional Variance
- Benign Overfitting and Noisy Features
- Learning ability of interpolating deep convolutional neural networks
- A Review of Process Optimization for Additive Manufacturing Based on Machine Learning
- Dimension-free bounds for sums of dependent matrices and operators with heavy-tailed distributions
- The leave-worst-\(k\)-out criterion for cross validation
- Benign overfitting and adaptive nonparametric regression
- Quantitative limit theorems and bootstrap approximations for empirical spectral projectors
- Over-parametrized deep neural networks minimizing the empirical risk do not generalize well
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Distributed SGD in overparametrized linear regression
- A moment-matching approach to testable learning and a new characterization of Rademacher complexity
- Same root different leaves: time series and cross-sectional methods in panel data
- Estimation of Linear Functionals in High-Dimensional Linear Models: From Sparsity to Nonsparsity
- The common intuition to transfer learning can win or lose: case studies for linear regression
- Convergence analysis for over-parameterized deep learning
- Fluctuations, bias, variance and ensemble of learners: exact asymptotics for convex losses in high-dimension
- Redundant representations help generalization in wide neural networks
- New equivalences between interpolation and SVMs: kernels and structured features
- Double data piling: a high-dimensional solution for asymptotically perfect multi-category classification
- Deep networks for system identification: a survey
- High-dimensional dynamics of generalization error in neural networks
- Double data piling leads to perfect classification
- An elementary analysis of ridge regression with random design
- Generalization Error of Minimum Weighted Norm and Kernel Interpolation
- Dimension independent excess risk by stochastic gradient descent
- Implicit Regularization and Momentum Algorithms in Nonlinearly Parameterized Adaptive Control and Prediction
- On the robustness of minimum norm interpolators and regularized empirical risk minimizers
- Unnamed Item
- Unnamed Item
- AdaBoost and robust one-bit compressed sensing
- A Unifying Tutorial on Approximate Message Passing
- The interpolation phase transition in neural networks: memorization and generalization under lazy training
- A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent*
- For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability
- Two Models of Double Descent for Weak Features
Cites Work
- The Hilbert kernel regression estimate
- Surprises in high-dimensional ridgeless least squares interpolation
- Gradient descent optimizes over-parameterized deep ReLU networks
- Just interpolate: kernel ``ridgeless'' regression can generalize
- Efficient agnostic learning of neural networks with bounded fan-in
- The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network
- 10.1162/153244303321897690
- Size-independent sample complexity of neural networks
- Two Models of Double Descent for Weak Features
- Reconciling modern machine-learning practice and the classical bias–variance trade-off
- Breaking the Curse of Dimensionality with Convex Neural Networks
- A Note on Pseudoinverses
- The elements of statistical learning. Data mining, inference, and prediction