Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
From MaRDI portal
Publication:2233558
Abstract: This paper studies a general stochastic optimization procedure that unifies several variants of stochastic gradient descent, including the stochastic heavy ball method, the stochastic Nesterov accelerated gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is viewed as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is a convergence proof for S-NAG in a non-convex setting. Under additional assumptions, a convergence rate is provided in the form of a central limit theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. The main ingredient here is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest.
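To illustrate the kind of scheme the abstract describes, the following is a minimal sketch of the stochastic heavy ball method, one of the variants unified by the paper. It is written in the authors' spirit (a noisy Euler discretization of a second-order momentum ODE) but the function names, step sizes, and test objective below are illustrative assumptions, not the paper's notation or parameter choices.

```python
import numpy as np

def stochastic_heavy_ball(grad, x0, steps=5000, lr=0.01, beta=0.9,
                          noise=0.01, seed=0):
    """Stochastic heavy-ball iteration (illustrative sketch):
        v_{k+1} = beta * v_k - lr * (grad(x_k) + xi_k)
        x_{k+1} = x_k + v_{k+1}
    where xi_k is Gaussian gradient noise. This is a noisy Euler
    discretization of a heavy-ball-type ODE x'' + a x' + grad f(x) = 0.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x) + noise * rng.standard_normal(x.shape)
        v = beta * v - lr * g        # momentum update
        x = x + v                    # position update
    return x

# Non-convex objective f(x) = (x^2 - 1)^2 with critical points {-1, 0, 1};
# x = 0 is a local maximum (a "trap") that the iterates should avoid,
# while +/-1 are the minimizers the convergence results concern.
grad_f = lambda x: 4 * x * (x**2 - 1)
x_star = stochastic_heavy_ball(grad_f, x0=np.array([0.5]))
```

On this toy objective the iterates settle near one of the minimizers at +/-1 rather than at the trap x = 0, mirroring (informally) the convergence and trap-avoidance statements of the paper.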
Recommendations
- Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization
- A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
- Publication 4862190 (no title available)
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods
Cites work
- scientific article (zbMATH DE number 51724; no title available)
- scientific article (zbMATH DE number 194139; no title available)
- scientific article (zbMATH DE number 3449561; no title available)
- scientific article (zbMATH DE number 3449064; no title available)
- scientific article (zbMATH DE number 1405930; no title available)
- A differential equation for modeling Nesterov's accelerated gradient method: theory and insights
- A general system of differential equations to model first-order adaptive algorithms
- Adaptive subgradient methods for online learning and stochastic optimization
- Asymptotic pseudotrajectories and chain recurrent flows, with applications
- Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization
- Convergence of a stochastic approximation version of the EM algorithm
- Do stochastic algorithms avoid traps?
- Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity
- First-order methods almost always avoid strict saddle points
- Gradient descent only converges to minimizers: non-isolated critical points and invariant regions
- Nonautonomous dynamical systems
- Nonconvergence to unstable points in urn models and stochastic approximations
- On the Minimizing Property of a Second Order Dissipative System in Hilbert Spaces
- On the long time behavior of second order differential equations with asymptotically small dissipation
- Optimal convergence rates for Nesterov acceleration
- Ordinary differential equations.
- Stochastic heavy ball
- The heavy ball with friction method. I: The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system
- Taylor approximation of integral manifolds
- Almost-sure convergence theorems for a class of stochastic algorithms with decreasing step sizes
- Weak convergence rates for stochastic approximation with application to multiple targets and simulated annealing
Cited in (12)
- A Convergence Study of SGD-Type Methods for Stochastic Optimization
- Stochastic modified equations and dynamics of stochastic gradient algorithms. I: Mathematical foundations
- SGEM: stochastic gradient with energy and momentum
- Switched diffusion processes for non-convex optimization and saddle points search
- Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods
- Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization
- A general system of differential equations to model first-order adaptive algorithms
- Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- An adaptive gradient method with energy and momentum
- Nonlinear Gradient Mappings and Stochastic Optimization: A General Framework with Applications to Heavy-Tail Noise
- Global convergence of stochastic gradient Hamiltonian Monte Carlo for nonconvex stochastic optimization: nonasymptotic performance bounds and momentum-based acceleration