Gradient descent provably escapes saddle points in the training of shallow ReLU networks
Publication: 6655804
DOI: 10.1007/s10957-024-02513-3
MaRDI QID: Q6655804
Authors: Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek
Publication date: 27 December 2024
Published in: Journal of Optimization Theory and Applications
Recommendations
- Gradient descent optimizes over-parameterized deep ReLU networks
- The global optimization geometry of shallow linear neural networks
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- Non-differentiable saddle points and sub-optimal local minima exist for deep ReLU networks
Mathematics Subject Classification:
- Numerical optimization and variational techniques (65K10)
- Artificial neural networks and deep learning (68T07)
- Nonconvex programming, global optimization (90C26)
Cites Work
- Measure theory and fine properties of functions
- Title not available
- Convergence of the Iterates of Descent Methods for Analytic Cost Functions
- Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates
- Nonconvergence to unstable points in urn models and stochastic approximations
- Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture
- Nonconvex Robust Low-Rank Matrix Recovery
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- A geometric analysis of phase retrieval
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Gradient descent only converges to minimizers: non-isolated critical points and invariant regions
- First-order methods almost always avoid strict saddle points
- Stochastic subgradient method converges on tame functions
- Gradient descent optimizes over-parameterized deep ReLU networks
- The nonsmooth landscape of phase retrieval
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers
- Spurious valleys in one-hidden-layer neural network optimization landscapes
- Behavior of accelerated gradient methods near critical points of nonconvex functions
Cited In (4)