A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
Publication: 2197845
DOI: 10.1007/s11425-019-1628-5 · zbMath: 1453.68163 · arXiv: 1904.04326 · OpenAlex: W2938647293 · MaRDI QID: Q2197845
Publication date: 1 September 2020
Published in: Science China. Mathematics
Full work available at URL: https://arxiv.org/abs/1904.04326
Mathematics Subject Classification
- Artificial neural networks and deep learning (68T07)
- Applications of mathematical programming (90C90)
- Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces) (46E22)
Related Items
- Machine learning from a continuous viewpoint. I
- On the Exact Computation of Linear Frequency Principle Dynamics and Its Generalization
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
- Full error analysis for the training of deep neural networks
- Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
- Unnamed Item
- SPADE4: sparsity and delay embedding based forecasting of epidemics
- Kolmogorov width decay and poor approximators in machine learning: shallow neural networks, random feature models and neural tangent kernels
- Searching the solution landscape by generalized high-index saddle dynamics
- Unnamed Item
- Non-convergence of stochastic gradient descent in the training of deep neural networks
- Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
- The interpolation phase transition in neural networks: memorization and generalization under lazy training
Uses Software
Cites Work
- Unnamed Item
- Gradient descent optimizes over-parameterized deep ReLU networks
- A priori estimates of the population risk for two-layer neural networks
- Mean field analysis of neural networks: a central limit theorem
- Universal approximation bounds for superpositions of a sigmoidal function
- Hinging hyperplanes for regression, classification, and function approximation
- A mean field view of the landscape of two-layer neural networks
- Understanding Machine Learning
- Theory of Reproducing Kernels