Scaling description of generalization with number of parameters in deep learning
From MaRDI portal
Publication:5856249
DOI: 10.1088/1742-5468/ab633c
zbMath: 1459.82250
arXiv: 1901.01608
OpenAlex: W2907047316
MaRDI QID: Q5856249
Matthieu Wyart, Levent Sagun, Franck Gabriel, Clément Hongler, A. Jacot, Stefano Spigler, Stéphane D'Ascoli, Mario Geiger, Giulio Biroli
Publication date: 25 March 2021
Published in: Journal of Statistical Mechanics: Theory and Experiment
Full work available at URL: https://arxiv.org/abs/1901.01608
Learning and adaptive systems in artificial intelligence (68T05)
Neural nets applied to problems in time-dependent statistical mechanics (82C32)
Related Items
Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks
Surprises in high-dimensional ridgeless least squares interpolation
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
Overparameterization and Generalization Error: Weighted Trigonometric Interpolation
Large-dimensional random matrix theory and its applications in deep learning and wireless communications
Free dynamics of feature learning processes
High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections
Harmonic analysis of network systems via kernels and their boundary realizations
A Generalization Gap Estimation for Overparameterized Models via the Langevin Functional Variance
Normalization effects on deep neural networks
Landscape and training regimes in deep learning
Geometric compression of invariant manifolds in neural networks
Normalization effects on shallow neural networks and related asymptotic expansions
A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks
Triple descent and the two kinds of overfitting: where and why do they appear?*
An analytic theory of shallow networks dynamics for hinge loss classification*