A mean field view of the landscape of two-layer neural networks
DOI: 10.1073/PNAS.1806579115 · zbMATH Open: 1416.92014 · arXiv: 1804.06561 · OpenAlex: W2963095610 · Wikidata: Q56610168 (Scholia: Q56610168) · MaRDI QID: Q4967449 · FDO: Q4967449
Authors: Song Mei, Andrea Montanari, Phan-Minh Nguyen
Publication date: 3 July 2019
Published in: Proceedings of the National Academy of Sciences
Full work available at URL: https://arxiv.org/abs/1804.06561
Recommendations
- Mean Field Analysis of Deep Neural Networks
- Mean-field Langevin dynamics and energy landscape of neural networks
- Analysis of a two-layer neural network via displacement convexity
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- Optimization Landscape of Neural Networks
MSC classification:
- Neural nets and related approaches to inference from stochastic processes (62M45)
- Neural networks for/in biological studies, artificial life and related topics (92B20)
Cited In (first 100 items)
- Continuous limits of residual neural networks in case of large input data
- Optimal deep neural networks by maximization of the approximation power
- Stabilize deep ResNet with a sharp scaling factor τ
- Asymptotic properties of one-layer artificial neural networks with sparse connectivity
- Title not available
- Machine learning from a continuous viewpoint. I
- Two-Layer Neural Networks with Values in a Banach Space
- Mean Field Analysis of Deep Neural Networks
- Do ideas have shape? Idea registration as the continuous limit of artificial neural networks
- Representation formulas and pointwise properties for Barron functions
- Machine learning and computational mathematics
- The discovery of dynamics via linear multistep methods and deep learning: error estimation
- Analysis of a two-layer neural network via displacement convexity
- Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
- Mean-field Langevin dynamics and energy landscape of neural networks
- A trajectorial approach to relative entropy dissipation of McKean-Vlasov diffusions: gradient flows and HWBI inequalities
- Wide neural networks of any depth evolve as linear models under gradient descent*
- Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations
- High-dimensional dynamics of generalization error in neural networks
- Reinforcement learning and stochastic optimisation
- Mean field limit for Coulomb-type flows
- Title not available
- A rigorous framework for the mean field limit of multilayer neural networks
- Non-convergence of stochastic gradient descent in the training of deep neural networks
- Mean-field inference methods for neural networks
- Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
- Adaptive and Implicit Regularization for Matrix Completion
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- Extremely randomized neural networks for constructing prediction intervals
- A priori estimates of the population risk for two-layer neural networks
- Topological properties of the set of functions generated by neural networks of fixed size
- Mean field analysis of neural networks: a central limit theorem
- On the global convergence of particle swarm optimization methods
- Mean field analysis of neural networks: a law of large numbers
- On functions computed on trees
- A Riemannian mean field formulation for two-layer neural networks with batch normalization
- Mirror descent algorithms for minimizing interacting free energy
- A selective overview of deep learning
- Neural network approximation: three hidden layers are enough
- Fast Non-mean-field Networks: Uniform in Time Averaging
- Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup*
- Hessian informed mirror descent
- When do neural networks outperform kernel methods?*
- Neural collapse with unconstrained features
- Measurement error models: from nonparametric methods to deep neural networks
- Mean-field and kinetic descriptions of neural differential equations
- Convergence rates of gradient methods for convex optimization in the space of measures
- Landscape and training regimes in deep learning
- Maximum likelihood estimation of potential energy in interacting particle systems from single-trajectory data
- Deep learning: a statistical viewpoint
- Data-driven vector soliton solutions of coupled nonlinear Schrödinger equation using a deep learning algorithm
- Large-Scale Nonconvex Optimization: Randomization, Gap Estimation, and Numerical Resolution
- Fitting small piece-wise linear neural network models to interpolate data sets
- Normalization effects on shallow neural networks and related asymptotic expansions
- Supervised learning from noisy observations: combining machine-learning techniques with data assimilation
- Convergence results for neural networks via electrodynamics
- The emergence of a concept in shallow neural networks
- Propagation of chaos: a review of models, methods and applications. I: Models and methods
- Sparse optimization on measures with over-parameterized gradient descent
- A blob method for inhomogeneous diffusion with applications to multi-agent control and sampling
- Propagation of chaos: a review of models, methods and applications. II: Applications
- Surprises in high-dimensional ridgeless least squares interpolation
- Optimization for deep learning: an overview
- Linearized two-layers neural networks in high dimension
- Unbiased deep solvers for linear parametric PDEs
- A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
- Asymptotics of Reinforcement Learning with Neural Networks
- Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
- Title not available
- Matrix inference and estimation in multi-layer models*
- Learning particle swarming models from data with Gaussian processes
- Large Sample Mean-Field Stochastic Optimization
- Unbiased Estimation Using Underdamped Langevin Dynamics
- Align, then memorise: the dynamics of learning with feedback alignment
- Optimization in machine learning: a distribution-space approach
- Sharp uniform-in-time propagation of chaos
- Ergodicity of the underdamped mean-field Langevin dynamics
- Infinite-width limit of deep linear neural networks
- A unified Fourier slice method to derive ridgelet transform for a variety of depth-2 neural networks
- Title not available
- Online parameter estimation for the McKean-Vlasov stochastic differential equation
- Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
- Consensus-based optimization methods converge globally
- Polyak-Łojasiewicz inequality on the space of measures and convergence of mean-field birth-death processes
- An analytic theory of shallow networks dynamics for hinge loss classification*
- Exact learning dynamics of deep linear networks with prior knowledge
- Stationary Density Estimation of Itô Diffusions Using Deep Learning
- The Continuous Formulation of Shallow Neural Networks as Wasserstein-Type Gradient Flows
- Learning sparse features can lead to overfitting in neural networks
- Phase diagram of stochastic gradient descent in high-dimensional two-layer neural networks
- Redundant representations help generalization in wide neural networks
- Self-consistent dynamical field theory of kernel evolution in wide neural networks
- Two-layer neural network on infinite-dimensional data: global optimization guarantee in the mean-field regime
- The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima
- Title not available
- High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
- Benign Overfitting and Noisy Features
- Plateau phenomenon in gradient descent training of RELU networks: explanation, quantification, and avoidance
- On the benefit of width for neural networks: disappearance of basins
- Stochastic gradient descent with noise of machine learning type. II: Continuous time analysis