Gradient descent provably escapes saddle points in the training of shallow ReLU networks
Publication: 6655804
DOI: 10.1007/S10957-024-02513-3
MaRDI QID: Q6655804
FDO: Q6655804
Authors: Florian Rossmannek, Patrick Cheridito, Arnulf Jentzen
Publication date: 27 December 2024
Published in: Journal of Optimization Theory and Applications
Mathematics Subject Classification:
- Numerical optimization and variational techniques (65K10)
- Artificial neural networks and deep learning (68T07)
- Nonconvex programming, global optimization (90C26)
Cites Work
- Convergence of the Iterates of Descent Methods for Analytic Cost Functions
- Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates
- Nonconvergence to unstable points in urn models and stochastic approximations
- Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture
- Nonconvex Robust Low-Rank Matrix Recovery
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- A geometric analysis of phase retrieval
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
- First-order methods almost always avoid strict saddle points
- Stochastic subgradient method converges on tame functions
- Gradient descent optimizes over-parameterized deep ReLU networks
- The nonsmooth landscape of phase retrieval
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers
- Spurious Valleys in Two-layer Neural Network Optimization Landscapes
- Behavior of accelerated gradient methods near critical points of nonconvex functions
Cited In (1)