Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
DOI: 10.1214/21-EJS1880 · zbMath: 1471.62442 · arXiv: 2012.04002 · OpenAlex: W3192961449 · MaRDI QID: Q2233558
Anas Barakat, Pascal Bianchi, Walid Hachem, Sholom Schechtman
Publication date: 11 October 2021
Published in: Electronic Journal of Statistics
Full work available at URL: https://arxiv.org/abs/2012.04002
Keywords: dynamical systems; stochastic approximation; ADAM; adaptive gradient methods with momentum; avoidance of traps; Nesterov accelerated gradient
MSC classification: Initial value problems, existence, uniqueness, continuous dependence and continuation of solutions to ordinary differential equations (34A12); Stochastic approximation (62L20); Artificial intelligence (68T99); Limit theorems in probability theory (60F99)
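For orientation, the methods covered by these keywords can be summarized by a generic stochastic momentum (heavy-ball) recursion; the notation below (momentum parameter \(\alpha\), step size \(\gamma_{n+1}\), noisy gradient \(g_{n+1}\)) is an illustrative sketch and is not taken from the paper itself:
\[
m_{n+1} = \alpha\, m_n + (1-\alpha)\, g_{n+1}, \qquad
x_{n+1} = x_n - \gamma_{n+1}\, m_{n+1},
\]
where \(g_{n+1}\) is an unbiased noisy estimate of \(\nabla F(x_n)\). ADAM additionally rescales \(m_{n+1}\) coordinatewise by the square root of a running estimate of the gradients' second moment, and Nesterov acceleration evaluates the gradient at an extrapolated point.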
Cites Work
- Nonconvergence to unstable points in urn models and stochastic approximations
- Théorèmes de convergence presque sûre pour une classe d'algorithmes stochastiques à pas décroissant
- Weak convergence rates for stochastic approximation with application to multiple targets and simulated annealing
- Stochastic heavy ball
- Convergence of a stochastic approximation version of the EM algorithm
- Asymptotic pseudotrajectories and chain recurrent flows, with applications
- Do stochastic algorithms avoid traps?
- Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity
- First-order methods almost always avoid strict saddle points
- Taylor approximation of integral manifolds
- A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
- On the long time behavior of second order differential equations with asymptotically small dissipation
- On the Minimizing Property of a Second Order Dissipative System in Hilbert Spaces
- The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system
- Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
- Convergence and Dynamical Behavior of the ADAM Algorithm for Nonconvex Stochastic Optimization
- Optimal Convergence Rates for Nesterov Acceleration
- Ordinary Differential Equations