A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
From MaRDI portal
Publication:5084492
Abstract: The Momentum Stochastic Gradient Descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning, e.g., training deep neural networks and variational Bayesian inference. Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that momentum helps escape from saddle points but hurts convergence within the neighborhood of optima (in the absence of step size or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
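The MSGD iteration studied in the abstract is the classical heavy-ball recursion with a noisy gradient. A minimal sketch on a toy quadratic, with Gaussian noise standing in for the stochastic gradient (the objective, step size, momentum coefficient, and noise scale are illustrative assumptions, not values from the paper):

```python
import random

def msgd_step(x, v, grad, lr=0.01, momentum=0.9, noise_scale=0.1):
    """One Momentum SGD (heavy-ball) step on a scalar parameter.

    The stochastic gradient is modeled as the true gradient plus
    Gaussian noise -- an assumption made purely for illustration.
    """
    g = grad(x) + random.gauss(0.0, noise_scale)  # noisy gradient estimate
    v = momentum * v - lr * g                     # velocity accumulates past gradients
    x = x + v                                     # parameter moves along the velocity
    return x, v

# Minimize f(x) = x^2 (true gradient 2x) starting from x0 = 5.0.
random.seed(0)
x, v = 5.0, 0.0
for _ in range(500):
    x, v = msgd_step(x, v, grad=lambda x: 2.0 * x)
```

With a fixed step size, the iterate settles into a noise-driven fluctuation around the optimum rather than converging exactly, which is the regime where the abstract notes that momentum hurts local convergence unless the step size or momentum is annealed.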
Recommendations
- On the diffusion approximation of nonconvex stochastic gradient descent
- Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods
- Nonasymptotic estimates for stochastic gradient Langevin dynamics under local conditions in nonconvex optimization
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- Stochastic momentum methods for non-convex learning without bounded assumptions
- Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization
- Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization
- Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation
- On the Convergence of Stochastic Gradient Descent for Nonlinear Ill-Posed Problems
- On the discrepancy principle for stochastic gradient descent
Cites work
- scientific article; zbMATH DE number 1972910
- A Stochastic Approximation Method
- A mean field view of the landscape of two-layer neural networks
- Accelerated gradient methods for nonconvex nonlinear and stochastic programming
- Handbook of simulation optimization
- Some methods of speeding up the convergence of iteration methods
- Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
- Stochastic approximation with two time scales
- Stochastic differential equations. An introduction with applications.
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
Cited in (15)
- Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
- Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation
- On large batch training and sharp minima: a Fokker-Planck perspective
- Conjugate-gradient-based Adam for nonconvex stochastic optimization and its application to deep learning
- Switched diffusion processes for non-convex optimization and saddle points search
- Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization
- Stochastic momentum methods for non-convex learning without bounded assumptions
- Classical algebraic geometry. Abstracts from the workshop held June 20--26, 2021 (hybrid meeting)
- On the diffusion approximation of nonconvex stochastic gradient descent
- Convergence of the RMSProp deep learning method with penalty for nonconvex optimization
- Stochastic gradient Hamiltonian Monte Carlo for non-convex learning
- On Nonconvex Optimization for Machine Learning
- Global convergence of stochastic gradient Hamiltonian Monte Carlo for nonconvex stochastic optimization: nonasymptotic performance bounds and momentum-based acceleration
- On the influence of momentum acceleration on online learning
- Convergence of the Momentum Method for Semialgebraic Functions with Locally Lipschitz Gradients