Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
Publication:4615339
Abstract: In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime, where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This result holds for an arbitrary training data set of input/output pairs. For differentiable activation functions we also show that gradient descent, when suitably initialized, converges at a linear rate to a globally optimal model. This result focuses on a realizable model in which the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted weight coefficients.
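The realizable setting described in the abstract (i.i.d. Gaussian inputs, labels generated by planted weights, a one-hidden-layer network with quadratic activations, trained by plain gradient descent) can be illustrated with a minimal numerical sketch. All sizes, the step size, the small random initialization, and the choice of unit second-layer weights below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): n observations, d input dimensions,
# k_true planted hidden units, k hidden units in the trained network.
# k * d = 1200 parameters > n = 50 observations: over-parameterized regime.
n, d, k_true, k = 50, 20, 3, 60

X = rng.standard_normal((n, d))            # i.i.d. Gaussian inputs (realizable model)
W_star = rng.standard_normal((k_true, d))  # planted weight coefficients
y = np.sum((X @ W_star.T) ** 2, axis=1)    # labels from the planted network, quadratic activation

W = 0.01 * rng.standard_normal((k, d))     # small random initialization (assumed)
lr = 5e-4                                  # assumed step size

def predict(W):
    # One-hidden-layer network with quadratic activation: f(x) = sum_j <w_j, x>^2.
    return np.sum((X @ W.T) ** 2, axis=1)

def loss(W):
    # Mean-squared training error against the planted labels.
    return 0.5 * np.mean((predict(W) - y) ** 2)

for step in range(3000):
    resid = (predict(W) - y) / n                      # per-sample residuals, shape (n,)
    # Gradient of the mean-squared error with respect to the hidden weights W.
    grad = 2.0 * (resid[:, None] * (X @ W.T)).T @ X
    W -= lr * grad
    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss(W):.3e}")

print(f"final loss {loss(W):.3e}")
```

Running the sketch, the training loss should decrease substantially, in line with the landscape result for quadratic activations; the snippet only illustrates the setup, not the paper's proofs, rates, or constants.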
Cited in (36)
- On PDE characterization of smooth hierarchical functions computed by neural networks
- Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
- Stable recovery of entangled weights: towards robust identification of deep neural networks from minimal samples
- The interpolation phase transition in neural networks: memorization and generalization under lazy training
- Non-differentiable saddle points and sub-optimal local minima exist for deep ReLU networks
- Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval
- Analysis of a two-layer neural network via displacement convexity
- First-order methods almost always avoid strict saddle points
- Align, then memorise: the dynamics of learning with feedback alignment
- Solving phase retrieval with random initial guess is nearly as good as by spectral initialization
- Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution
- Non-convergence of stochastic gradient descent in the training of deep neural networks
- Uncertainty quantification of graph convolution neural network models of evolving processes
- Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
- scientific article; zbMATH DE number 7307488
- Spurious valleys in one-hidden-layer neural network optimization landscapes
- On the benefit of width for neural networks: disappearance of basins
- Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
- Neural ODEs as the deep limit of ResNets with constant weights
- Utility/privacy trade-off as regularized optimal transport
- Exploiting layerwise convexity of rectifier networks with sign constrained weights
- Gradient descent provably escapes saddle points in the training of shallow ReLU networks
- scientific article; zbMATH DE number 7626727
- Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
- Align, then memorise: the dynamics of learning with feedback alignment*
- scientific article; zbMATH DE number 7625201
- Simultaneous neural network approximation for smooth functions
- Symmetry &amp; critical points for a model shallow neural network
- Recent Theoretical Advances in Non-Convex Optimization
- On the landscape of synchronization networks: a perspective from nonconvex optimization
- Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
- scientific article; zbMATH DE number 7370585
- Applied harmonic analysis and data processing. Abstracts from the workshop held March 25–31, 2018
- Optimization for deep learning: an overview
- Principal Component Analysis by Optimization of Symmetric Functions has no Spurious Local Optima
- The curse of overparametrization in adversarial training: precise analysis of robust generalization for random features regression