A mean field view of the landscape of two-layer neural networks


DOI: 10.1073/pnas.1806579115
zbMath: 1416.92014
arXiv: 1804.06561
OpenAlex: W2963095610
Wikidata: Q56610168 (Scholia: Q56610168)
MaRDI QID: Q4967449

Authors: Song Mei, Andrea Montanari, Phan-Minh Nguyen

Publication date: 3 July 2019

Published in: Proceedings of the National Academy of Sciences

Full work available at URL: https://arxiv.org/abs/1804.06561




Related Items

High‐dimensional limit theorems for SGD: Effective dynamics and critical scaling
Benign Overfitting and Noisy Features
Stochastic gradient descent with noise of machine learning type. II: Continuous time analysis
Normalization effects on deep neural networks
Gradient descent on infinitely wide neural networks: global convergence and generalization
Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks
Analysis of the Generalization Error: Empirical Risk Minimization over Deep Artificial Neural Networks Overcomes the Curse of Dimensionality in the Numerical Approximation of Black--Scholes Partial Differential Equations
Stationary Density Estimation of Itô Diffusions Using Deep Learning
The Continuous Formulation of Shallow Neural Networks as Wasserstein-Type Gradient Flows
Deep learning: a statistical viewpoint
Machine learning from a continuous viewpoint. I
The effective noise of stochastic gradient descent
Surprises in high-dimensional ridgeless least squares interpolation
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Align, then memorise: the dynamics of learning with feedback alignment*
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Two-Layer Neural Networks with Values in a Banach Space
Particle dual averaging: optimization of mean field neural network with global convergence rate analysis*
Adaptive and Implicit Regularization for Matrix Completion
Mean-field inference methods for neural networks
Mean-field and kinetic descriptions of neural differential equations
Sparse optimization on measures with over-parameterized gradient descent
A Riemannian mean field formulation for two-layer neural networks with batch normalization
Archetypal landscapes for deep neural networks
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima
Hessian informed mirror descent
Mean Field Analysis of Deep Neural Networks
Neural collapse with unconstrained features
A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
Asymptotics of Reinforcement Learning with Neural Networks
Unbiased Deep Solvers for Linear Parametric PDEs
The Discovery of Dynamics via Linear Multistep Methods and Deep Learning: Error Estimation
On the Benefit of Width for Neural Networks: Disappearance of Basins
Large Sample Mean-Field Stochastic Optimization
Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
Simultaneous neural network approximation for smooth functions
Neural network approximation: three hidden layers are enough
Extremely randomized neural networks for constructing prediction intervals
A rigorous framework for the mean field limit of multilayer neural networks
Large-Scale Nonconvex Optimization: Randomization, Gap Estimation, and Numerical Resolution
The emergence of a concept in shallow neural networks
A class of dimension-free metrics for the convergence of empirical measures
Gaussian fluctuations for interacting particle systems with singular kernels
Sharp uniform-in-time propagation of chaos
Continuous limits of residual neural networks in case of large input data
Global-in-time mean-field convergence for singular Riesz-type diffusive flows
Convergence rates of gradient methods for convex optimization in the space of measures
Online parameter estimation for the McKean-Vlasov stochastic differential equation
A mathematical perspective of machine learning
McKean-Vlasov equations involving hitting times: blow-ups and global solvability
Uniform-in-time propagation of chaos for kinetic mean field Langevin dynamics
A blob method for inhomogeneous diffusion with applications to multi-agent control and sampling
Unbiased Estimation Using Underdamped Langevin Dynamics
Optimal deep neural networks by maximization of the approximation power
On the global convergence of particle swarm optimization methods
Hierarchies, entropy, and quantitative propagation of chaos for mean field diffusions
A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
Polyak-Łojasiewicz inequality on the space of measures and convergence of mean-field birth-death processes
Mirror descent algorithms for minimizing interacting free energy
Mean field limit for Coulomb-type flows
Optimization for deep learning: an overview
Landscape and training regimes in deep learning
Fast Non-mean-field Networks: Uniform in Time Averaging
Plateau Phenomenon in Gradient Descent Training of RELU Networks: Explanation, Quantification, and Avoidance
Data-driven vector soliton solutions of coupled nonlinear Schrödinger equation using a deep learning algorithm
Machine Learning and Computational Mathematics
Analysis of a two-layer neural network via displacement convexity
Topological properties of the set of functions generated by neural networks of fixed size
A selective overview of deep learning
Linearized two-layers neural networks in high dimension
On Functions Computed on Trees
Geometric compression of invariant manifolds in neural networks
Mean field analysis of neural networks: a central limit theorem
Mean Field Analysis of Neural Networks: A Law of Large Numbers
High-dimensional dynamics of generalization error in neural networks
Maximum likelihood estimation of potential energy in interacting particle systems from single-trajectory data
Reinforcement learning and stochastic optimisation
Fitting small piece-wise linear neural network models to interpolate data sets
Normalization effects on shallow neural networks and related asymptotic expansions
Mean-field Langevin dynamics and energy landscape of neural networks
Supervised learning from noisy observations: combining machine-learning techniques with data assimilation
Propagation of chaos: a review of models, methods and applications. I: Models and methods
Propagation of chaos: a review of models, methods and applications. II: Applications
Measurement error models: from nonparametric methods to deep neural networks
Wide neural networks of any depth evolve as linear models under gradient descent*
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup*
Stabilize deep ResNet with a sharp scaling factor \(\tau\)
Asymptotic properties of one-layer artificial neural networks with sparse connectivity
The mean-field approximation for higher-dimensional Coulomb flows in the scaling-critical \(L^\infty\) space
Matrix inference and estimation in multi-layer models*
An analytic theory of shallow networks dynamics for hinge loss classification*
Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification*
When do neural networks outperform kernel methods?*
A trajectorial approach to relative entropy dissipation of McKean-Vlasov diffusions: gradient flows and HWBI inequalities
Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
Do ideas have shape? Idea registration as the continuous limit of artificial neural networks
Representation formulas and pointwise properties for Barron functions