First-order methods almost always avoid strict saddle points
Abstract: We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid saddle points.
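As a hedged illustration of the stable-manifold phenomenon summarized above (this sketch is not code from the paper and the test function is chosen only for illustration), plain gradient descent on a function with a strict saddle converges to a minimizer from a randomly drawn initialization, while only initializations on a measure-zero stable manifold converge to the saddle:

```python
import numpy as np

# Illustrative sketch (assumed example, not the paper's experiments):
# gradient descent on f(x, y) = x**2 + y**4/4 - y**2/2, which has a strict
# saddle at the origin and minimizers at (0, +1) and (0, -1). Generic random
# starts converge to a minimizer; only the measure-zero set {y = 0}
# (the stable manifold of the saddle) converges to (0, 0).

def grad(z):
    x, y = z
    return np.array([2.0 * x, y**3 - y])  # gradient of the test function

def gradient_descent(z0, step=0.05, iters=500):
    z = np.array(z0, dtype=float)
    for _ in range(iters):
        z = z - step * grad(z)
    return z

rng = np.random.default_rng(0)
for _ in range(3):
    z0 = rng.standard_normal(2)                    # generic initialization
    print(z0.round(3), "->", gradient_descent(z0).round(3))  # lands near (0, +-1)

# Initialization exactly on the stable manifold {y = 0}: stays at the saddle.
print(gradient_descent([0.5, 0.0]).round(3))       # converges to (0, 0)
```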
Recommendations
- Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
- Proximal methods avoid active strict saddles of weakly convex functions
- A Newton-based method for nonconvex optimization with fast evasion of saddle points
- On Nonconvex Optimization for Machine Learning
- On the Global Convergence of Randomized Coordinate Gradient Descent for Nonconvex Optimization
Cites work
- scientific article (zbMATH DE number 3980052; no title available)
- scientific article (zbMATH DE number 5223994; no title available)
- Alternating Projections on Manifolds
- An Extrinsic Look at the Riemannian Hessian
- Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity
- Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture
- Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method
- Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization
- Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods
- Convergence of the Iterates of Descent Methods for Analytic Cost Functions
- Cubic regularization of Newton method and its global performance
- Dynamics of games and genes: Discrete versus continuous time
- First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems
- First-order methods in optimization
- Gradient descent only converges to minimizers: non-isolated critical points and invariant regions
- Introductory lectures on convex optimization. A basic course.
- Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
- Matrix Completion From a Few Entries
- Multiplicative updates outperform generic no-regret learning in congestion games (extended abstract)
- Newton-type methods for unconstrained and linearly constrained optimization
- Nonconvergence to unstable points in urn models and stochastic approximations
- On the use of directions of negative curvature in a modified Newton method
- Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow
- Optimization
- Phase retrieval via Wirtinger flow: theory and algorithms
- Projection-like retractions on matrix manifolds
- Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality
- Proximal alternating linearized minimization for nonconvex and nonsmooth problems
- Random Fields and Geometry
- Random matrices and complexity of spin glasses
- Some NP-complete problems in quadratic and nonlinear programming
- Spectral methods meet EM: a provably optimal algorithm for crowdsourcing
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Trust Region Methods
Cited in (42 documents)
- The global optimization geometry of shallow linear neural networks
- Triangularized orthogonalization-free method for solving extreme eigenvalue problems
- scientific article (zbMATH DE number 7626752; no title available)
- Stochastic nested variance reduction for nonconvex optimization
- Exploiting negative curvature in deterministic and stochastic optimization
- On the convergence of orthogonalization-free conjugate gradient method for extreme eigenvalues of Hermitian matrices: a Riemannian optimization interpretation
- Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
- A geometric approach of gradient descent algorithms in linear neural networks
- Escaping strict saddle points of the Moreau envelope in nonsmooth optimization
- Regional complexity analysis of algorithms for nonconvex smooth optimization
- Global convergence of the gradient method for functions definable in o-minimal structures
- Provable Phase Retrieval with Mirror Descent
- Column ℓ_{2,0}-norm regularized factorization model of low-rank matrix recovery and its computation
- Coordinatewise descent methods for leading eigenvalue problem
- On initial point selection of the steepest descent algorithm for general quadratic functions
- Proximal methods avoid active strict saddles of weakly convex functions
- First-order Methods for the Impatient: Support Identification in Finite Time with Convergent Frank–Wolfe Variants
- Scalable symmetric Tucker tensor decomposition
- A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: superlinear convergence to nonisolated local minima
- Weighted Trace-Penalty Minimization for Full Configuration Interaction
- Convergence of the Momentum Method for Semialgebraic Functions with Locally Lipschitz Gradients
- A deterministic gradient-based approach to avoid saddle points
- Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
- scientific article (zbMATH DE number 7064064; no title available)
- Error bound of critical points and KL property of exponent 1/2 for squared F-norm regularized factorization
- Consensus-based optimization on hypersurfaces: Well-posedness and mean-field limit
- Second-order guarantees of distributed gradient algorithms
- Model-free nonconvex matrix completion: local minima analysis and applications in memory-efficient kernel PCA
- Gradient descent provably escapes saddle points in the training of shallow ReLU networks
- Fast cluster detection in networks by first order optimization
- Variance-reduced reshuffling gradient descent for nonconvex optimization: centralized and distributed algorithms
- Linear convergence of an alternating polar decomposition method for low rank orthogonal tensor approximations
- On the Global Convergence of Randomized Coordinate Gradient Descent for Nonconvex Optimization
- The effect of smooth parametrizations on nonconvex optimization landscapes
- Kurdyka-Łojasiewicz exponent via Hadamard parametrization
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
- On the geometric analysis of a quartic-quadratic optimization problem under a spherical constraint
- Computing symplectic eigenpairs of symmetric positive-definite matrices via trace minimization and Riemannian optimization
- Learning weakly convex regularizers for convergent image-reconstruction algorithms
- An envelope for Davis-Yin splitting and strict saddle-point avoidance
- scientific article (zbMATH DE number 7370585; no title available)