First-order methods almost always avoid strict saddle points
Publication: 2425175
DOI: 10.1007/s10107-019-01374-3
zbMath: 1415.90089
arXiv: 1710.07406
OpenAlex: W2915116423
MaRDI QID: Q2425175
Max Simchowitz, Benjamin Recht, Georgios Piliouras, Ioannis Panageas, Michael I. Jordan, Jason D. Lee
Publication date: 26 June 2019
Published in: Mathematical Programming. Series A. Series B
Full work available at URL: https://arxiv.org/abs/1710.07406
Related Items
- Model-free Nonconvex Matrix Completion: Local Minima Analysis and Applications in Memory-efficient Kernel PCA
- On initial point selection of the steepest descent algorithm for general quadratic functions
- Proximal methods avoid active strict saddles of weakly convex functions
- Fast Cluster Detection in Networks by First Order Optimization
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
- Column $\ell_{2,0}$-Norm Regularized Factorization Model of Low-Rank Matrix Recovery and Its Computation
- Escaping Strict Saddle Points of the Moreau Envelope in Nonsmooth Optimization
- Linear convergence of an alternating polar decomposition method for low rank orthogonal tensor approximations
- Global convergence of the gradient method for functions definable in o-minimal structures
- Convergence of the Momentum Method for Semialgebraic Functions with Locally Lipschitz Gradients
- An envelope for Davis-Yin splitting and strict saddle-point avoidance
- A geometric approach of gradient descent algorithms in linear neural networks
- Exploiting negative curvature in deterministic and stochastic optimization
- On the Global Convergence of Randomized Coordinate Gradient Descent for Nonconvex Optimization
- Provable Phase Retrieval with Mirror Descent
- Weighted Trace-Penalty Minimization for Full Configuration Interaction
- Second-Order Guarantees of Distributed Gradient Algorithms
- Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
- The global optimization geometry of shallow linear neural networks
- Regional complexity analysis of algorithms for nonconvex smooth optimization
- Error bound of critical points and KL property of exponent 1/2 for squared F-norm regularized factorization
- Consensus-based optimization on hypersurfaces: Well-posedness and mean-field limit
- CoordinateWise Descent Methods for Leading Eigenvalue Problem
- First-order Methods for the Impatient: Support Identification in Finite Time with Convergent Frank--Wolfe Variants
- On the geometric analysis of a quartic-quadratic optimization problem under a spherical constraint
- A Bregman Forward-Backward Linesearch Algorithm for Nonconvex Composite Optimization: Superlinear Convergence to Nonisolated Local Minima
- Triangularized orthogonalization-free method for solving extreme eigenvalue problems
- Computing Symplectic Eigenpairs of Symmetric Positive-Definite Matrices via Trace Minimization and Riemannian Optimization
- Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
Cites Work
- Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow
- Proximal alternating linearized minimization for nonconvex and nonsmooth problems
- Nonconvergence to unstable points in urn models and stochastic approximations
- Dynamics of games and genes: Discrete versus continuous time
- Introductory lectures on convex optimization. A basic course.
- Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods
- Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
- Cubic regularization of Newton method and its global performance
- Projection-like Retractions on Matrix Manifolds
- Phase Retrieval via Wirtinger Flow: Theory and Algorithms
- Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture
- Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method
- Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality
- Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity
- Some NP-complete problems in quadratic and nonlinear programming
- Newton-type methods for unconstrained and linearly constrained optimization
- On the use of directions of negative curvature in a modified Newton method
- Trust Region Methods
- First Order Methods Beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems
- First-Order Methods in Optimization
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
- Complexity Analysis of Second-Order Line-Search Algorithms for Smooth Nonconvex Optimization
- Random Matrices and Complexity of Spin Glasses
- Multiplicative updates outperform generic no-regret learning in congestion games
- Matrix Completion From a Few Entries
- Random Fields and Geometry
- Alternating Projections on Manifolds
- An Extrinsic Look at the Riemannian Hessian
- Convergence of the Iterates of Descent Methods for Analytic Cost Functions
- Optimization