Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
Publication:5027014
Abstract: We provide larger step-size restrictions under which gradient-descent-based algorithms almost surely avoid strict saddle points. In particular, consider a twice differentiable (non-convex) objective function whose gradient is Lipschitz continuous with constant L and whose Hessian is well-behaved. We prove that, for gradient descent with any step size up to 2/L, the set of initial conditions converging to a strict saddle point has measure zero; equivalently, a single uniformly random initialization converges to a strict saddle point with probability zero. This extends previous results up to the sharp limit imposed by the convex case. The arguments also hold when a learning-rate schedule is used, with either a continuously decaying rate or a piece-wise constant schedule.
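For illustration, here is a minimal sketch (not from the paper) of the near-2/L step-size regime on a toy quadratic with a strict saddle. The objective f(x, y) = x^2/2 - y^2/2, its Lipschitz constant L = 1, the step size 1.9/L, and the iteration count are all assumptions chosen for this example. It shows that, from a generic random initialization, the iterate contracts along the positive-curvature direction yet escapes the saddle along the negative-curvature direction.

```python
import numpy as np

# Toy example (not from the paper): f(x, y) = x**2/2 - y**2/2 has a strict
# saddle at the origin. Its gradient (x, -y) is Lipschitz with constant L = 1,
# and the Hessian eigenvalues at the saddle are +1 and -1.

def grad(p):
    x, y = p
    return np.array([x, -y])

L = 1.0
alpha = 1.9 / L                       # step size just below the 2/L threshold
rng = np.random.default_rng(0)
p = rng.uniform(-1.0, 1.0, size=2)    # one uniformly random initialization

for _ in range(50):
    p = p - alpha * grad(p)

# The positive-curvature coordinate shrinks by a factor |1 - alpha| < 1 each
# step, while the negative-curvature coordinate grows by 1 + alpha > 1, so the
# iterate leaves any neighbourhood of the saddle unless it started exactly on
# the measure-zero set {y = 0}.
print(p)
```

With a step size above 2/L, the positive-curvature direction would itself stop contracting (|1 - alpha*L| >= 1), which is why 2/L is the natural limit already in the convex case.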
Recommendations
- A generalization of s-step variants of gradient methods
- On the steplength selection in gradient methods for unconstrained optimization
- New stepsizes for the gradient method
- On the steplength selection in stochastic gradient methods
- Revisiting Normalized Gradient Descent: Fast Evasion of Saddle Points
- A new gradient method with an optimal stepsize property
- Gradient methods with adaptive step-sizes
- An efficient gradient method with approximate optimal stepsize for large-scale unconstrained optimization
- Two novel gradient methods with optimal step sizes
- On the behavior of the gradient norm in the steepest descent method
Cites work
- A geometric analysis of phase retrieval
- Accelerated methods for nonconvex optimization
- Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture
- Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method
- Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization
- Convergence of the Iterates of Descent Methods for Analytic Cost Functions
- Cubic regularization of Newton method and its global performance
- First-order methods almost always avoid strict saddle points
- Gradient descent only converges to minimizers: non-isolated critical points and invariant regions
- Introductory lectures on convex optimization. A basic course.
- Nonconvergence to unstable points in urn models and stochastic approximations
- On the use of directions of negative curvature in a modified Newton method
- The zero set of a real analytic function
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Trust Region Methods
Cited in (5)
- A deterministic gradient-based approach to avoid saddle points
- First-order methods almost always avoid strict saddle points
- Analysis of asymptotic escape of strict saddle sets in manifold optimization
- Gradient descent only converges to minimizers: non-isolated critical points and invariant regions
- Behavior of accelerated gradient methods near critical points of nonconvex functions