Two Models of Double Descent for Weak Features
Publication: 5027013
DOI: 10.1137/20M1336072
zbMATH: 1484.62090
arXiv: 1903.07571
OpenAlex: W3111350549
MaRDI QID: Q5027013
Ji Xu, Daniel Hsu, Mikhail Belkin
Publication date: 3 February 2022
Published in: SIAM Journal on Mathematics of Data Science
Full work available at URL: https://arxiv.org/abs/1903.07571
Related Items
- Canonical thresholding for nonsparse high-dimensional linear regression
- Mehler's Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
- Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks
- Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
- Surprises in high-dimensional ridgeless least squares interpolation
- Learning curves of generic features maps for realistic datasets with a teacher-student model*
- Generalization error rates in kernel regression: the crossover from the noiseless to noisy regime*
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers
- Dimensionality Reduction, Regularization, and Generalization in Overparameterized Regressions
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization
- Theoretical issues in deep networks
- Benign overfitting in linear regression
- Overparameterization and Generalization Error: Weighted Trigonometric Interpolation
- Benefit of Interpolation in Nearest Neighbor Algorithms
- High dimensional binary classification under label shift: phase transition and regularization
- Large-dimensional random matrix theory and its applications in deep learning and wireless communications
- On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions
- A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors
- Unnamed Item
- Overparameterized maximum likelihood tests for detection of sparse vectors
- High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections
- A Generalization Gap Estimation for Overparameterized Models via the Langevin Functional Variance
- Benign Overfitting and Noisy Features
- Unnamed Item
- Unnamed Item
- Dimension independent excess risk by stochastic gradient descent
- Unnamed Item
- A Unifying Tutorial on Approximate Message Passing
- The interpolation phase transition in neural networks: memorization and generalization under lazy training
Cites Work
- Limiting empirical singular value distribution of restrictions of discrete Fourier transform matrices
- Concentration inequalities for sampling without replacement
- Smallest singular value of a random rectangular matrix
- How Many Variables Should be Entered in a Regression Equation?
- Generalization in a linear perceptron in the presence of noise
- Dynamics of batch training in a perceptron
- High-Dimensional Probability
- An elementary proof of a theorem of Johnson and Lindenstrauss
- Benign overfitting in linear regression
- Reconciling modern machine-learning practice and the classical bias–variance trade-off
- A jamming transition from under- to over-parametrization affects generalization in deep learning