A mean field view of the landscape of two-layer neural networks
Publication:4967449
Abstract: Multi-layer neural networks are among the most powerful models in machine learning, yet the fundamental reasons for this success defy mathematical understanding. Learning a neural network requires optimizing a non-convex, high-dimensional objective (the risk function), a problem usually attacked with stochastic gradient descent (SGD). Does SGD converge to a global optimum of the risk or only to a local optimum? In the first case, does this happen because local minima are absent, or because SGD somehow avoids them? In the second, why do the local minima reached by SGD have good generalization properties? In this paper we consider a simple case, namely two-layer neural networks, and prove that, in a suitable scaling limit, the SGD dynamics is captured by a certain nonlinear partial differential equation (PDE) that we call the distributional dynamics (DD). We then consider several specific examples and show how DD can be used to prove convergence of SGD to networks with nearly ideal generalization error. This description allows one to 'average out' some of the complexities of the neural network landscape and can be used to prove a general convergence result for noisy SGD.
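The distributional dynamics mentioned in the abstract admits a compact summary. The following is a minimal sketch in LaTeX, with notation (the single-neuron response \(\sigma_*\), the potentials \(V\) and \(U\), the step-size schedule \(\xi(t)\), and the temperature \(\tau\)) chosen here for illustration rather than copied verbatim from the paper: the two-layer network averages \(N\) neurons, its parameters define an empirical measure, and in the limit of many neurons and small step size (noisy) SGD transports that measure along a nonlinear PDE.

\[
\hat f_N(x;\boldsymbol{\theta}) = \frac{1}{N}\sum_{i=1}^{N} \sigma_*(x;\theta_i),
\qquad
\hat\rho^{(N)}_t = \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i(t)},
\]
\[
\partial_t \rho_t = 2\xi(t)\,\nabla_\theta\!\cdot\!\bigl(\rho_t\,\nabla_\theta \Psi(\theta;\rho_t)\bigr) + 2\xi(t)\,\tau\,\Delta_\theta \rho_t,
\qquad
\Psi(\theta;\rho) = V(\theta) + \int U(\theta,\bar\theta)\,\rho(\mathrm{d}\bar\theta),
\]
with \(V(\theta) = -\mathbb{E}[\,y\,\sigma_*(x;\theta)\,]\) and \(U(\theta,\bar\theta) = \mathbb{E}[\,\sigma_*(x;\theta)\,\sigma_*(x;\bar\theta)\,]\). The diffusion term \(\Delta_\theta \rho_t\) appears only for noisy SGD (\(\tau > 0\)); for \(\tau = 0\) the dynamics reduces to a Wasserstein-type gradient flow of the risk viewed as a functional of \(\rho\).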
Recommendations
- Mean Field Analysis of Deep Neural Networks
- Mean-field Langevin dynamics and energy landscape of neural networks
- Analysis of a two-layer neural network via displacement convexity
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- Optimization Landscape of Neural Networks
Cited in
- Data-driven vector soliton solutions of coupled nonlinear Schrödinger equation using a deep learning algorithm
- Archetypal landscapes for deep neural networks
- Efficient and stable SAV-based methods for gradient flows arising from deep learning
- Solving PDEs on unknown manifolds with machine learning
- Deep learning: a statistical viewpoint
- Large-Scale Nonconvex Optimization: Randomization, Gap Estimation, and Numerical Resolution
- Fitting small piece-wise linear neural network models to interpolate data sets
- Normalization effects on shallow neural networks and related asymptotic expansions
- Align, then memorise: the dynamics of learning with feedback alignment*
- Particle dual averaging: optimization of mean field neural network with global convergence rate analysis*
- Supervised learning from noisy observations: combining machine-learning techniques with data assimilation
- scientific article (zbMATH DE number 7625201)
- Convergence results for neural networks via electrodynamics
- Simultaneous neural network approximation for smooth functions
- Propagation of chaos: a review of models, methods and applications. I: Models and methods
- Probabilistic Lambert problem: connections with optimal mass transport, Schrödinger bridge, and reaction-diffusion PDEs
- State space emulation and annealed sequential Monte Carlo for high dimensional optimization
- The emergence of a concept in shallow neural networks
- Sparse optimization on measures with over-parameterized gradient descent
- A blob method for inhomogeneous diffusion with applications to multi-agent control and sampling
- Propagation of chaos: a review of models, methods and applications. II: Applications
- Exact learning dynamics of deep linear networks with prior knowledge
- Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification*
- Surprises in high-dimensional ridgeless least squares interpolation
- Optimization for deep learning: an overview
- A mathematical perspective of machine learning
- Linearized two-layers neural networks in high dimension
- The curse of overparametrization in adversarial training: precise analysis of robust generalization for random features regression
- Unbiased deep solvers for linear parametric PDEs
- Global contractivity for Langevin dynamics with distribution-dependent forces and uniform in time propagation of chaos
- McKean-Vlasov equations involving hitting times: blow-ups and global solvability
- Uniform-in-time propagation of chaos for kinetic mean field Langevin dynamics
- A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
- Asymptotics of Reinforcement Learning with Neural Networks
- Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
- Continuous limits of residual neural networks in case of large input data
- scientific article (zbMATH DE number 7370588)
- Optimal deep neural networks by maximization of the approximation power
- Stabilize deep ResNet with a sharp scaling factor τ
- Asymptotic properties of one-layer artificial neural networks with sparse connectivity
- Machine learning from a continuous viewpoint. I
- Matrix inference and estimation in multi-layer models*
- scientific article (zbMATH DE number 7626752)
- Large Sample Mean-Field Stochastic Optimization
- Learning particle swarming models from data with Gaussian processes
- Two-Layer Neural Networks with Values in a Banach Space
- Do ideas have shape? Idea registration as the continuous limit of artificial neural networks
- Representation formulas and pointwise properties for Barron functions
- Mean Field Analysis of Deep Neural Networks
- Analysis of a two-layer neural network via displacement convexity
- Unbiased Estimation Using Underdamped Langevin Dynamics
- Machine learning and computational mathematics
- Align, then memorise: the dynamics of learning with feedback alignment
- The discovery of dynamics via linear multistep methods and deep learning: error estimation
- Optimization in machine learning: a distribution-space approach
- Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
- Mean-field Langevin dynamics and energy landscape of neural networks
- A trajectorial approach to relative entropy dissipation of McKean-Vlasov diffusions: gradient flows and HWBI inequalities
- Sharp uniform-in-time propagation of chaos
- Ergodicity of the underdamped mean-field Langevin dynamics
- Wide neural networks of any depth evolve as linear models under gradient descent*
- High-dimensional dynamics of generalization error in neural networks
- Infinite-width limit of deep linear neural networks
- Reinforcement learning and stochastic optimisation
- Mean field limit for Coulomb-type flows
- Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations
- Non-convergence of stochastic gradient descent in the training of deep neural networks
- A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks
- scientific article (zbMATH DE number 7387621)
- A rigorous framework for the mean field limit of multilayer neural networks
- Mean-field inference methods for neural networks
- Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
- scientific article (zbMATH DE number 7415108)
- Online parameter estimation for the McKean-Vlasov stochastic differential equation
- Polyak-Łojasiewicz inequality on the space of measures and convergence of mean-field birth-death processes
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- Adaptive and Implicit Regularization for Matrix Completion
- Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
- Consensus-based optimization methods converge globally
- An analytic theory of shallow networks dynamics for hinge loss classification*
- Topological properties of the set of functions generated by neural networks of fixed size
- A priori estimates of the population risk for two-layer neural networks
- Extremely randomized neural networks for constructing prediction intervals
- Mean field analysis of neural networks: a central limit theorem
- Stationary Density Estimation of Itô Diffusions Using Deep Learning
- The Continuous Formulation of Shallow Neural Networks as Wasserstein-Type Gradient Flows
- A Riemannian mean field formulation for two-layer neural networks with batch normalization
- On the global convergence of particle swarm optimization methods
- Mirror descent algorithms for minimizing interacting free energy
- The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima
- Mean field analysis of neural networks: a law of large numbers
- Learning sparse features can lead to overfitting in neural networks
- Phase diagram of stochastic gradient descent in high-dimensional two-layer neural networks
- Redundant representations help generalization in wide neural networks
- Self-consistent dynamical field theory of kernel evolution in wide neural networks
- Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime
- On functions computed on trees
- A selective overview of deep learning
- scientific article (zbMATH DE number 7387622)