A mean field view of the landscape of two-layer neural networks
Publication: 4967449
DOI: 10.1073/pnas.1806579115
zbMath: 1416.92014
arXiv: 1804.06561
OpenAlex: W2963095610
Wikidata: Q56610168
Scholia: Q56610168
MaRDI QID: Q4967449
Authors: Song Mei, Andrea Montanari, Phan-Minh Nguyen
Publication date: 3 July 2019
Published in: Proceedings of the National Academy of Sciences
Full work available at URL: https://arxiv.org/abs/1804.06561
Mathematics Subject Classification:
Neural networks for/in biological studies, artificial life and related topics (92B20)
Neural nets and related approaches to inference from stochastic processes (62M45)
Related Items
High‐dimensional limit theorems for SGD: Effective dynamics and critical scaling
Benign Overfitting and Noisy Features
Stochastic gradient descent with noise of machine learning type. II: Continuous time analysis
Normalization effects on deep neural networks
Gradient descent on infinitely wide neural networks: global convergence and generalization
Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
Analysis of the Generalization Error: Empirical Risk Minimization over Deep Artificial Neural Networks Overcomes the Curse of Dimensionality in the Numerical Approximation of Black–Scholes Partial Differential Equations
Stationary Density Estimation of Itô Diffusions Using Deep Learning
The Continuous Formulation of Shallow Neural Networks as Wasserstein-Type Gradient Flows
Deep learning: a statistical viewpoint
Machine learning from a continuous viewpoint. I
The effective noise of stochastic gradient descent
Surprises in high-dimensional ridgeless least squares interpolation
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Align, then memorise: the dynamics of learning with feedback alignment*
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Two-Layer Neural Networks with Values in a Banach Space
Particle dual averaging: optimization of mean field neural network with global convergence rate analysis*
Adaptive and Implicit Regularization for Matrix Completion
Mean-field inference methods for neural networks
Mean-field and kinetic descriptions of neural differential equations
Sparse optimization on measures with over-parameterized gradient descent
A Riemannian mean field formulation for two-layer neural networks with batch normalization
Archetypal landscapes for deep neural networks
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
Hessian informed mirror descent
Mean Field Analysis of Deep Neural Networks
Neural collapse with unconstrained features
A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
Asymptotics of Reinforcement Learning with Neural Networks
Unbiased Deep Solvers for Linear Parametric PDEs
The Discovery of Dynamics via Linear Multistep Methods and Deep Learning: Error Estimation
On the Benefit of Width for Neural Networks: Disappearance of Basins
Large Sample Mean-Field Stochastic Optimization
Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
Simultaneous neural network approximation for smooth functions
Neural network approximation: three hidden layers are enough
Extremely randomized neural networks for constructing prediction intervals
A rigorous framework for the mean field limit of multilayer neural networks
Large-Scale Nonconvex Optimization: Randomization, Gap Estimation, and Numerical Resolution
The emergence of a concept in shallow neural networks
A class of dimension-free metrics for the convergence of empirical measures
Gaussian fluctuations for interacting particle systems with singular kernels
Sharp uniform-in-time propagation of chaos
Continuous limits of residual neural networks in case of large input data
Global-in-time mean-field convergence for singular Riesz-type diffusive flows
Convergence rates of gradient methods for convex optimization in the space of measures
Online parameter estimation for the McKean-Vlasov stochastic differential equation
A mathematical perspective of machine learning
McKean-Vlasov equations involving hitting times: blow-ups and global solvability
Uniform-in-time propagation of chaos for kinetic mean field Langevin dynamics
A blob method for inhomogeneous diffusion with applications to multi-agent control and sampling
Unbiased Estimation Using Underdamped Langevin Dynamics
Optimal deep neural networks by maximization of the approximation power
On the global convergence of particle swarm optimization methods
Hierarchies, entropy, and quantitative propagation of chaos for mean field diffusions
A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
Polyak-Łojasiewicz inequality on the space of measures and convergence of mean-field birth-death processes
Mirror descent algorithms for minimizing interacting free energy
Mean field limit for Coulomb-type flows
Optimization for deep learning: an overview
Landscape and training regimes in deep learning
Fast Non-mean-field Networks: Uniform in Time Averaging
Plateau Phenomenon in Gradient Descent Training of RELU Networks: Explanation, Quantification, and Avoidance
Data-driven vector soliton solutions of coupled nonlinear Schrödinger equation using a deep learning algorithm
Machine Learning and Computational Mathematics
Analysis of a two-layer neural network via displacement convexity
Topological properties of the set of functions generated by neural networks of fixed size
A selective overview of deep learning
Linearized two-layers neural networks in high dimension
On Functions Computed on Trees
Geometric compression of invariant manifolds in neural networks
Mean field analysis of neural networks: a central limit theorem
Mean Field Analysis of Neural Networks: A Law of Large Numbers
High-dimensional dynamics of generalization error in neural networks
Maximum likelihood estimation of potential energy in interacting particle systems from single-trajectory data
Reinforcement learning and stochastic optimisation
Fitting small piece-wise linear neural network models to interpolate data sets
Normalization effects on shallow neural networks and related asymptotic expansions
Mean-field Langevin dynamics and energy landscape of neural networks
Supervised learning from noisy observations: combining machine-learning techniques with data assimilation
Propagation of chaos: a review of models, methods and applications. I: Models and methods
Propagation of chaos: a review of models, methods and applications. II: Applications
Measurement error models: from nonparametric methods to deep neural networks
Wide neural networks of any depth evolve as linear models under gradient descent*
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup*
Stabilize deep ResNet with a sharp scaling factor \(\tau\)
Asymptotic properties of one-layer artificial neural networks with sparse connectivity
The mean-field approximation for higher-dimensional Coulomb flows in the scaling-critical L∞ space
Matrix inference and estimation in multi-layer models*
An analytic theory of shallow networks dynamics for hinge loss classification*
Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification*
When do neural networks outperform kernel methods?*
A trajectorial approach to relative entropy dissipation of McKean-Vlasov diffusions: gradient flows and HWBI inequalities
Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
Do ideas have shape? Idea registration as the continuous limit of artificial neural networks
Representation formulas and pointwise properties for Barron functions