Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
From MaRDI portal
Publication: 2156337
DOI: 10.1007/S00332-022-09823-8
zbMATH Open: 1491.68179
arXiv: 2103.10922
OpenAlex: W4284712609
MaRDI QID: Q2156337
FDO: Q2156337
Patrick Cheridito, Florian Rossmannek, Arnulf Jentzen
Publication date: 18 July 2022
Published in: Journal of Nonlinear Science
Abstract: In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. In particular, we show that there exist no local maxima and clarify the structure of saddle points. Moreover, we prove that non-global local minima can only be caused by 'dead' ReLU neurons. In particular, they do not appear in the case of leaky ReLU or quadratic activation. Our approach is of a combinatorial nature and builds on a careful analysis of the different types of hidden neurons that can occur.
Full work available at URL: https://arxiv.org/abs/2103.10922
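The abstract's central mechanism, that only 'dead' ReLU neurons can produce non-global local minima of the true (population) loss, can be illustrated numerically. The following is a minimal sketch and is not taken from the paper: the uniform input distribution on [0, 1], the width-2 network, and the affine target 2x + 1 are illustrative assumptions. It approximates the population L2 loss on a dense grid and checks that the parameters of a hidden neuron that is inactive on the whole input domain receive zero gradient, which is the basic reason such configurations can get stuck away from the global optimum.

```python
# Minimal illustrative sketch (assumptions: target 2x + 1, inputs uniform on [0, 1], width-2 ReLU net).
import numpy as np

xs = np.linspace(0.0, 1.0, 10_001)       # dense grid approximating the input distribution on [0, 1]
target = lambda x: 2.0 * x + 1.0         # affine, one-dimensional target function (illustrative choice)

def loss(params, width):
    """Quadrature approximation of the population L2 loss of a one-hidden-layer ReLU network."""
    w, b, v, c = np.split(params, [width, 2 * width, 3 * width])
    hidden = np.maximum(w[:, None] * xs[None, :] + b[:, None], 0.0)   # ReLU hidden layer
    pred = v @ hidden + c[0]
    return np.mean((pred - target(xs)) ** 2)

def num_grad(params, width, eps=1e-6):
    """Central finite-difference gradient of the approximate loss."""
    g = np.zeros_like(params)
    for i in range(params.size):
        e = np.zeros_like(params)
        e[i] = eps
        g[i] = (loss(params + e, width) - loss(params - e, width)) / (2 * eps)
    return g

# Width-2 network: neuron 0 is active, neuron 1 is 'dead' on [0, 1]
# because its pre-activation -x - 0.5 is negative for every x in the domain.
width = 2
params = np.array([1.0, -1.0,    # w: input weights
                   0.5, -0.5,    # b: biases
                   2.0,  3.0,    # v: output weights
                   0.0])         # c: output bias
g = num_grad(params, width)
print("approx. population loss:", loss(params, width))
print("gradient wrt dead neuron's (w, b, v):", g[1], g[3], g[5])   # all ~0
```

The dead neuron contributes nothing to the prediction and its parameters get exactly zero gradient, so gradient-based training cannot revive it; the sketch only demonstrates this gradient fact, whereas the paper proves the full classification of critical points.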
Recommendations
- Symmetry \& critical points for a model shallow neural network
- Optimization Landscape of Neural Networks
- The global optimization geometry of shallow linear neural networks
- Non-differentiable saddle points and sub-optimal local minima exist for deep ReLU networks
- Critical points for least-squares problems involving certain analytic functions, with applications to sigmoidal nets
Mathematics Subject Classification
- Artificial neural networks and deep learning (68T07)
- Nonconvex programming, global optimization (90C26)
Cites Work
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
- First-order methods almost always avoid strict saddle points
- Topological properties of the set of functions generated by neural networks of fixed size
- Title not available
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
- Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Cited In (7)
- Non-differentiable saddle points and sub-optimal local minima exist for deep ReLU networks
- Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
- Gradient descent provably escapes saddle points in the training of shallow ReLU networks
- On the existence of minimizers in shallow residual ReLU neural network optimization landscapes
- Optimization Landscape of Neural Networks
- Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions