Mean field analysis of neural networks: a law of large numbers
Abstract: Machine learning models, and in particular neural networks, have revolutionized fields such as image, text, and speech recognition. Today, many important real-world applications in these areas are driven by neural networks. There are also growing applications in engineering, robotics, medicine, and finance. Despite their immense success in practice, there is limited mathematical understanding of neural networks. This paper illustrates how neural networks can be studied via stochastic analysis, and develops approaches for addressing some of the technical challenges that arise. We analyze one-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously prove that the empirical distribution of the neural network parameters converges to the solution of a nonlinear partial differential equation. This result can be considered a law of large numbers for neural networks. In addition, a consequence of our analysis is that the trained parameters of the neural network asymptotically become independent, a property commonly called "propagation of chaos".
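For orientation, the scaling regime can be sketched as follows; the notation is an illustrative reconstruction, with constants and the learning-rate scaling simplified relative to the paper. The one-layer network of width N, with parameters \theta^i = (c^i, w^i) and activation \sigma, and the empirical measure of the parameters after k SGD steps are

\[
  g^N_\theta(x) = \frac{1}{N} \sum_{i=1}^{N} c^i \, \sigma(w^i \cdot x),
  \qquad
  \nu^N_k = \frac{1}{N} \sum_{i=1}^{N} \delta_{(c^i_k,\, w^i_k)}.
\]

With training time rescaled as t = k/N, the law of large numbers states that \nu^N_{\lfloor Nt \rfloor} \to \bar\nu_t as N \to \infty, where \bar\nu_t solves, in the weak sense, a nonlinear transport equation of McKean-Vlasov type:

\[
  \partial_t \bar\nu_t
    = -\nabla_{(c,w)} \cdot \bigl( \bar\nu_t \, V(c, w; \bar\nu_t) \bigr),
  \qquad
  V(c, w; \nu)
    = \alpha \, \mathbb{E}_{(x,y)} \Bigl[
        \bigl( y - \langle c' \sigma(w' \cdot x), \nu \rangle \bigr)
        \, \nabla_{(c,w)} \bigl( c \, \sigma(w \cdot x) \bigr)
      \Bigr].
\]

Because the drift of each parameter depends on the others only through the empirical measure, the particles decouple in the limit, which is the "propagation of chaos" property mentioned above.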
Recommendations
- Mean field analysis of neural networks: a central limit theorem
- Mean Field Analysis of Deep Neural Networks
- Asymptotic properties of one-layer artificial neural networks with sparse connectivity
- Statistical guarantees for regularized neural networks
- A mean field view of the landscape of two-layer neural networks
Cites work
- Untitled scientific article (zbMATH DE number 4211245)
- Untitled scientific article (zbMATH DE number 3951715)
- A mean field view of the landscape of two-layer neural networks
- A stochastic McKean-Vlasov equation for absorbing diffusions on the half-line
- Approximation and estimation bounds for artificial neural networks
- DGM: a deep learning algorithm for solving partial differential equations
- Deep learning
- Default clustering in large portfolios: typical events
- Gradient flows in metric spaces and in the space of probability measures
- Heterogeneous credit portfolios and the dynamics of the aggregate losses
- Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates
- Large deviations and mean-field theory for asymmetric random recurrent neural networks
- Large portfolio asymptotics for loss from default
- Large portfolio losses: A dynamic contagion model
- Machine learning strategies for systems with invariance properties
- McKean-Vlasov limit for interacting random processes in random media
- Mean field analysis of neural networks: a central limit theorem
- Mean-field Langevin dynamics and energy landscape of neural networks
- Mean-field limit of a stochastic particle system smoothly interacting through threshold hitting-times and applications to neural networks with dendritic component
- Multilayer feedforward networks are universal approximators
- Nonlinear Markov processes and kinetic equations
- Particle systems with a singular mean-field self-excitation. Application to neuronal networks
- Reynolds averaged turbulence modelling using deep neural networks with embedded invariance
- Separability and completeness for the Wasserstein distance
- Systemic risk in interbanking networks
- The Variational Formulation of the Fokker-Planck Equation
- Universal features of price formation in financial markets: perspectives from deep learning
Cited in (52)
- Asymptotics of Reinforcement Learning with Neural Networks
- Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
- Continuous limits of residual neural networks in case of large input data
- Asymptotic properties of one-layer artificial neural networks with sparse connectivity
- Large Sample Mean-Field Stochastic Optimization
- Effects of depth, width, and initialization: a convergence analysis of layer-wise training for deep linear neural networks
- Two-Layer Neural Networks with Values in a Banach Space
- Representation formulas and pointwise properties for Barron functions
- Mean Field Analysis of Deep Neural Networks
- Optimization in machine learning: a distribution-space approach
- Mean-field Langevin dynamics and energy landscape of neural networks
- Sharp uniform-in-time propagation of chaos
- On the convergence of formally diverging neural net-based classifiers
- Infinite-width limit of deep linear neural networks
- Reinforcement learning and stochastic optimisation
- Non-convergence of stochastic gradient descent in the training of deep neural networks
- A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks
- A rigorous framework for the mean field limit of multilayer neural networks
- Mean-field inference methods for neural networks
- Mean field limits for interacting diffusions with colored noise: phase transitions and spectral numerical methods
- Online parameter estimation for the McKean-Vlasov stochastic differential equation
- Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
- Consensus-based optimization methods converge globally
- Mean field analysis of neural networks: a central limit theorem
- The Continuous Formulation of Shallow Neural Networks as Wasserstein-Type Gradient Flows
- Learning sparse features can lead to overfitting in neural networks
- A selective overview of deep learning
- Untitled scientific article (zbMATH DE number 7387622)
- Stochastic gradient descent with noise of machine learning type. II: Continuous time analysis
- A class of dimension-free metrics for the convergence of empirical measures
- Statistical guarantees for regularized neural networks
- Fast Non-mean-field Networks: Uniform in Time Averaging
- Large deviations for nonlocal stochastic neural fields
- Non-mean-field Vicsek-type models for collective behavior
- Normalization effects on deep neural networks
- Gradient descent on infinitely wide neural networks: global convergence and generalization
- Markov chain network training and conservation law approximations: Linking microscopic and macroscopic models for evolution
- Error bounds of the invariant statistics in machine learning of ergodic Itô diffusions
- Landscape and training regimes in deep learning
- Stochastic differential equation approximations of generative adversarial network training and its long-run behavior
- Deep learning: a statistical viewpoint
- Normalization effects on shallow neural networks and related asymptotic expansions
- Supervised learning from noisy observations: combining machine-learning techniques with data assimilation
- Large deviation analysis of function sensitivity in random deep neural networks
- Untitled scientific article (zbMATH DE number 7625201)
- Untitled scientific article (zbMATH DE number 7643067)
- Propagation of chaos: a review of models, methods and applications. I: Models and methods
- Sparse optimization on measures with over-parameterized gradient descent
- A blob method for inhomogeneous diffusion with applications to multi-agent control and sampling
- Propagation of chaos: a review of models, methods and applications. II: Applications
- Nonlocal cross-diffusion systems for multi-species populations and networks
- Surprises in high-dimensional ridgeless least squares interpolation