First-order methods almost always avoid strict saddle points


DOI: 10.1007/s10107-019-01374-3
zbMath: 1415.90089
arXiv: 1710.07406
OpenAlex: W2915116423
MaRDI QID: Q2425175

Max Simchowitz, Benjamin Recht, Georgios Piliouras, Ioannis Panageas, Michael I. Jordan, Jason D. Lee

Publication date: 26 June 2019

Published in: Mathematical Programming. Series A. Series B

Full work available at URL: https://arxiv.org/abs/1710.07406



Related Items

Model-free Nonconvex Matrix Completion: Local Minima Analysis and Applications in Memory-efficient Kernel PCA
On initial point selection of the steepest descent algorithm for general quadratic functions
Proximal methods avoid active strict saddles of weakly convex functions
Fast Cluster Detection in Networks by First Order Optimization
Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
Unnamed Item
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
Column $\ell_{2,0}$-Norm Regularized Factorization Model of Low-Rank Matrix Recovery and Its Computation
Escaping Strict Saddle Points of the Moreau Envelope in Nonsmooth Optimization
Linear convergence of an alternating polar decomposition method for low rank orthogonal tensor approximations
Global convergence of the gradient method for functions definable in o-minimal structures
Convergence of the Momentum Method for Semialgebraic Functions with Locally Lipschitz Gradients
An envelope for Davis-Yin splitting and strict saddle-point avoidance
A geometric approach of gradient descent algorithms in linear neural networks
Unnamed Item
Exploiting negative curvature in deterministic and stochastic optimization
On the Global Convergence of Randomized Coordinate Gradient Descent for Nonconvex Optimization
Provable Phase Retrieval with Mirror Descent
Weighted Trace-Penalty Minimization for Full Configuration Interaction
Second-Order Guarantees of Distributed Gradient Algorithms
Stochastic optimization with momentum: convergence, fluctuations, and traps avoidance
The global optimization geometry of shallow linear neural networks
Unnamed Item
Regional complexity analysis of algorithms for nonconvex smooth optimization
Error bound of critical points and KL property of exponent 1/2 for squared F-norm regularized factorization
Consensus-based optimization on hypersurfaces: Well-posedness and mean-field limit
Unnamed Item
CoordinateWise Descent Methods for Leading Eigenvalue Problem
First-order Methods for the Impatient: Support Identification in Finite Time with Convergent Frank--Wolfe Variants
On the geometric analysis of a quartic-quadratic optimization problem under a spherical constraint
A Bregman Forward-Backward Linesearch Algorithm for Nonconvex Composite Optimization: Superlinear Convergence to Nonisolated Local Minima
Triangularized orthogonalization-free method for solving extreme eigenvalue problems
Computing Symplectic Eigenpairs of Symmetric Positive-Definite Matrices via Trace Minimization and Riemannian Optimization
Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points

