Gradient descent on infinitely wide neural networks: global convergence and generalization
From MaRDI portal
Publication:6200217
Abstract: Many supervised machine learning methods are naturally cast as optimization problems. For prediction models that are linear in their parameters, this often leads to convex problems for which many mathematical guarantees exist. Models that are non-linear in their parameters, such as neural networks, lead to non-convex optimization problems for which guarantees are harder to obtain. In this review paper, we consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived.
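The setting described in the abstract can be illustrated with a short sketch (not taken from the paper): full-batch gradient descent on a two-layer network with a positively homogeneous (ReLU) activation and the mean-field 1/m output scaling, where m is the number of hidden neurons. The target function, data distribution, width, and step-size scaling below are illustrative choices, not the paper's experiments.

```python
import numpy as np

# Minimal sketch (assumptions, not from the paper): two-layer network
#   f(x) = (1/m) * sum_j b_j * relu(a_j . x)
# trained by full-batch gradient descent on squared loss. ReLU is
# positively homogeneous, matching the class of activations considered.
rng = np.random.default_rng(0)

n, d, m = 64, 5, 2048                      # samples, input dim, hidden width (large m ~ "wide")
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))     # an arbitrary smooth target (assumption)

A = rng.standard_normal((m, d))            # input weights a_j
b = rng.standard_normal(m)                 # output weights b_j

lr = 0.5 * m                               # step size scaled with m, so the 1/m-scaled gradients move the parameters
for step in range(1001):
    H = np.maximum(X @ A.T, 0.0)           # hidden activations, shape (n, m)
    r = H @ b / m - y                      # residuals f(x_i) - y_i
    loss = 0.5 * np.mean(r ** 2)
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss:.5f}")
    mask = (H > 0).astype(float)           # ReLU subgradient indicator
    grad_b = H.T @ r / (n * m)             # dL/db_j
    grad_A = (b[:, None] * ((mask * r[:, None]).T @ X)) / (n * m)  # dL/da_j
    b -= lr * grad_b
    A -= lr * grad_A
```

As the width m grows, this finite-neuron gradient descent is the kind of dynamics whose infinite-width (mean-field) limit the paper analyzes; the sketch only illustrates the setup and makes no claim about the guarantees themselves.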
Recommendations
- Wide neural networks of any depth evolve as linear models under gradient descent
- Infinite-width limit of deep linear neural networks
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- A mean field view of the landscape of two-layer neural networks
- Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime
Cites Work
- scientific article; zbMATH DE number 1972910
- scientific article; zbMATH DE number 3276164
- A mean field view of the landscape of two-layer neural networks
- An Introduction to Numerical Analysis
- Asymptotic Statistics
- Bounds on rates of variable-basis and neural-network approximation
- Breaking the curse of dimensionality with convex neural networks
- Convex optimization: algorithms and complexity
- Deep learning
- Empirical margin distributions and bounding the generalization error of combined classifiers
- Foundations of machine learning
- Gradient flows in metric spaces and in the space of probability measures
- Lectures on convex optimization
- Mean field analysis of neural networks: a law of large numbers
- Optimal transport for applied mathematicians. Calculus of variations, PDEs, and modeling
- Probabilistic representation and uniqueness results for measure-valued solutions of transport equations
- The implicit bias of gradient descent on separable data
- Universal approximation bounds for superpositions of a sigmoidal function
Cited In (9)
- Infinite-width limit of deep linear neural networks
- A geometric approach of gradient descent algorithms in linear neural networks
- Concentration of measure and global optimization of Bayesian multilayer perceptron. I
- Phase diagram of stochastic gradient descent in high-dimensional two-layer neural networks
- Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime
- Convergence of deep convolutional neural networks
- Numerical solution of Poisson partial differential equation in high dimension using two-layer neural networks
- Approximation results for gradient flow trained shallow neural networks in 1d
- Universality of gradient descent neural network training