On the benefit of width for neural networks: disappearance of basins
Publication: 5097010
DOI: 10.1137/21M1394205
zbMATH Open: 1493.68331
arXiv: 1812.11039
OpenAlex: W4289334798
MaRDI QID: Q5097010
FDO: Q5097010
Authors: Dawei Li, Tian Ding, Ruoyu Sun
Publication date: 19 August 2022
Published in: SIAM Journal on Optimization
Abstract: Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove? To understand the benefit of width, it is important to identify the difference between wide and narrow networks. In this work, we prove that, going from narrow to wide networks, there is a phase transition from having sub-optimal basins to having no sub-optimal basins. Specifically, we prove two results. On the positive side, for any continuous activation function, the loss surface of a class of wide networks has no sub-optimal basins, where a "basin" is defined as a set-wise strict local minimum. On the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not global. Together, these two results establish the phase transition from narrow to wide networks.
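The key term "basin" can be made precise. The following is a hedged paraphrase of the set-wise strict local minimum notion (our notation $L$, $B$, $\theta$; not quoted from the paper): a compact connected set $B$ of parameters is a basin of the loss $L$ if $L$ takes a constant value on $B$ and is strictly larger at every nearby point outside $B$.

```latex
% Sketch of one standard formalization (our notation, not quoted from the paper):
% B is compact and connected in parameter space, L is the training loss.
\[
  \exists\, c \in \mathbb{R},\ \varepsilon > 0:\quad
  L(\theta) = c \ \ \forall\, \theta \in B,
  \qquad
  L(\theta') > c \ \ \forall\, \theta' \notin B \ \text{with}\ \operatorname{dist}(\theta', B) < \varepsilon .
\]
```

A single strict local minimizer $\theta^*$ is the special case $B = \{\theta^*\}$, and "no sub-optimal basins" means that every such $B$ attains the globally minimal loss value.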
Full work available at URL: https://arxiv.org/abs/1812.11039
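For intuition only, here is a minimal numerical sketch (our own toy experiment, not the paper's construction or proof): it trains one-hidden-layer tanh networks of width 1 and width 50 on the same small regression task from several random initializations and prints the final training losses. Wide runs typically all reach a near-minimal loss, while narrow runs can end at visibly different, larger values; this merely illustrates the flavor of the result, not the basin/no-basin dichotomy itself.

```python
# Toy illustration (not the paper's construction): compare final training
# losses of narrow vs. wide one-hidden-layer tanh networks trained by plain
# full-batch gradient descent from several random initializations.
import numpy as np

rng = np.random.default_rng(0)

# Small regression dataset: n samples, d input features, smooth target.
n, d = 20, 2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1])

def train(width, seed, steps=20000, lr=0.05):
    """Gradient descent on the squared loss of f(x) = a^T tanh(W x)."""
    r = np.random.default_rng(seed)
    W = r.normal(size=(width, d))
    a = r.normal(size=width)
    for _ in range(steps):
        H = np.tanh(X @ W.T)                       # (n, width) hidden activations
        err = H @ a - y                            # (n,) residuals
        grad_a = H.T @ err / n                     # gradient w.r.t. output weights
        grad_W = ((err[:, None] * (1.0 - H**2) * a).T @ X) / n  # gradient w.r.t. W
        a -= lr * grad_a
        W -= lr * grad_W
    H = np.tanh(X @ W.T)
    return 0.5 * np.mean((H @ a - y) ** 2)

for width in (1, 50):
    losses = [train(width, s) for s in range(10)]
    print(f"width={width:2d}  final losses:",
          " ".join(f"{v:.4f}" for v in losses))
```

The network has no bias terms and uses a fixed step size to keep the sketch short; those choices are ours and could be varied without changing the qualitative picture.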
Recommendations
- Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
- Effect of depth and width on local minima in deep learning
- Optimization Landscape of Neural Networks
- The global optimization geometry of shallow linear neural networks
- Shaping the learning landscape in neural networks around wide flat minima
Cites Work
- Approximation by entire functions
- A mean field view of the landscape of two-layer neural networks
- Reconciling modern machine-learning practice and the classical bias–variance trade-off
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Gradient descent optimizes over-parameterized deep ReLU networks
- Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization
- Symmetry & critical points for a model shallow neural network
- The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve
- Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
- Global Minima of Overparameterized Neural Networks
- Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
- Spurious Valleys in Two-layer Neural Network Optimization Landscapes
- Mean Field Analysis of Deep Neural Networks
Cited In (4)
- Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
- Wide neural networks of any depth evolve as linear models under gradient descent
- Certifying the Absence of Spurious Local Minima at Infinity
- On the omnipresence of spurious local minima in certain neural network training problems