First-order methods almost always avoid strict saddle points
Publication: 2425175
DOI: 10.1007/s10107-019-01374-3
zbMATH Open: 1415.90089
arXiv: 1710.07406
OpenAlex: W2915116423
MaRDI QID: Q2425175
FDO: Q2425175
Max Simchowitz, Benjamin Recht, Georgios Piliouras, Ioannis Panageas, Michael Jordan, Jason D. Lee
Publication date: 26 June 2019
Published in: Mathematical Programming. Series A. Series B
Abstract: We establish that first-order methods avoid strict saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold Theorem allow for a global stability analysis. Thus, neither access to second-order derivative information nor randomness beyond initialization is necessary to provably avoid strict saddle points.
Full work available at URL: https://arxiv.org/abs/1710.07406
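As a sanity check on the abstract's claim, here is a minimal numerical sketch (not taken from the paper or its proofs): the toy objective f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2, the step size, and the sample count are assumptions chosen purely for illustration. This function has local minimizers at (+1, 0) and (-1, 0) and a strict saddle at the origin, where the Hessian has a negative eigenvalue, so plain gradient descent from random initializations should essentially never end up near the saddle.

    # Illustrative sketch only; f, the step size, and the sample count are assumptions.
    import numpy as np

    def grad(p):
        # Gradient of f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2
        x, y = p
        return np.array([x * (x**2 - 1), y])

    def gradient_descent(p0, step=0.05, iters=3000):
        # Fixed-step first-order update; randomness enters only through the initialization.
        p = np.array(p0, dtype=float)
        for _ in range(iters):
            p = p - step * grad(p)
        return p

    rng = np.random.default_rng(0)
    finals = [gradient_descent(rng.uniform(-2.0, 2.0, size=2)) for _ in range(1000)]
    near_saddle = sum(np.linalg.norm(p) < 1e-3 for p in finals)
    print(f"runs ending near the strict saddle at the origin: {near_saddle} / 1000")

Expected behavior under these assumptions: every run settles near (+1, 0) or (-1, 0) and the printed count is 0. Only the measure-zero set of initializations with x = 0, i.e. the stable manifold of the saddle, would converge to the origin, which is the picture the Stable Manifold Theorem argument in the paper formalizes.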
Cites Work
- Title not available
- Title not available
- Title not available
- Phase Retrieval via Wirtinger Flow: Theory and Algorithms
- Introductory lectures on convex optimization. A basic course.
- First-Order Methods in Optimization
- Optimization
- Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality
- Some NP-complete problems in quadratic and nonlinear programming
- Proximal alternating linearized minimization for nonconvex and nonsmooth problems
- Trust Region Methods
- Matrix Completion From a Few Entries
- Random Fields and Geometry
- Convergence of the Iterates of Descent Methods for Analytic Cost Functions
- Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods
- Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
- Newton-type methods for unconstrained and linearly constrained optimization
- Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity
- Alternating Projections on Manifolds
- Multiplicative updates outperform generic no-regret learning in congestion games
- Dynamics of games and genes: Discrete versus continuous time
- Random Matrices and Complexity of Spin Glasses
- An Extrinsic Look at the Riemannian Hessian
- Nonconvergence to unstable points in urn models and stochastic approximations
- Cubic regularization of Newton method and its global performance
- Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow
- Projection-like retractions on matrix manifolds
- Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture
- Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method
- On the use of directions of negative curvature in a modified Newton method
- Complexity Analysis of Second-Order Line-Search Algorithms for Smooth Nonconvex Optimization
- First Order Methods Beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
Cited In (42)
- The global optimization geometry of shallow linear neural networks
- Coordinatewise Descent Methods for Leading Eigenvalue Problem
- Triangularized orthogonalization-free method for solving extreme eigenvalue problems
- Title not available
- Escaping Strict Saddle Points of the Moreau Envelope in Nonsmooth Optimization
- Exploiting negative curvature in deterministic and stochastic optimization
- On the convergence of orthogonalization-free conjugate gradient method for extreme eigenvalues of Hermitian matrices: a Riemannian optimization interpretation
- Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
- A Bregman Forward-Backward Linesearch Algorithm for Nonconvex Composite Optimization: Superlinear Convergence to Nonisolated Local Minima
- A geometric approach of gradient descent algorithms in linear neural networks
- Second-Order Guarantees of Distributed Gradient Algorithms
- Global convergence of the gradient method for functions definable in o-minimal structures
- Provable Phase Retrieval with Mirror Descent
- Fast Cluster Detection in Networks by First Order Optimization
- Regional complexity analysis of algorithms for nonconvex smooth optimization
- Model-free Nonconvex Matrix Completion: Local Minima Analysis and Applications in Memory-efficient Kernel PCA
- On initial point selection of the steepest descent algorithm for general quadratic functions
- Proximal methods avoid active strict saddles of weakly convex functions
- First-order Methods for the Impatient: Support Identification in Finite Time with Convergent Frank-Wolfe Variants
- Scalable symmetric Tucker tensor decomposition
- Computing Symplectic Eigenpairs of Symmetric Positive-Definite Matrices via Trace Minimization and Riemannian Optimization
- Weighted Trace-Penalty Minimization for Full Configuration Interaction
- Convergence of the Momentum Method for Semialgebraic Functions with Locally Lipschitz Gradients
- A deterministic gradient-based approach to avoid saddle points
- Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
- Title not available
- Error bound of critical points and KL property of exponent 1/2 for squared F-norm regularized factorization
- Consensus-based optimization on hypersurfaces: Well-posedness and mean-field limit
- Gradient descent provably escapes saddle points in the training of shallow ReLU networks
- Title not available
- Variance-reduced reshuffling gradient descent for nonconvex optimization: centralized and distributed algorithms
- Linear convergence of an alternating polar decomposition method for low rank orthogonal tensor approximations
- On the Global Convergence of Randomized Coordinate Gradient Descent for Nonconvex Optimization
- The effect of smooth parametrizations on nonconvex optimization landscapes
- Kurdyka-Łojasiewicz exponent via Hadamard parametrization
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
- Learning weakly convex regularizers for convergent image-reconstruction algorithms
- Title not available
- On the geometric analysis of a quartic-quadratic optimization problem under a spherical constraint
- An envelope for Davis-Yin splitting and strict saddle-point avoidance
- Column $\ell_{2,0}$-Norm Regularized Factorization Model of Low-Rank Matrix Recovery and Its Computation