Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
Publication: 4615339
DOI: 10.1109/TIT.2018.2854560
zbMath: 1428.68255
arXiv: 1707.04926
OpenAlex: W2963417959
Wikidata: Q129563058
Scholia: Q129563058
MaRDI QID: Q4615339
Jason D. Lee, Adel Javanmard, Mahdi Soltanolkotabi
Publication date: 28 January 2019
Published in: IEEE Transactions on Information Theory
Full work available at URL: https://arxiv.org/abs/1707.04926
Artificial neural networks and deep learning (68T07)
Nonconvex programming, global optimization (90C26)
Learning and adaptive systems in artificial intelligence (68T05)
Related Items (32)
On PDE Characterization of Smooth Hierarchical Functions Computed by Neural Networks
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Align, then memorise: the dynamics of learning with feedback alignment*
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Align, then memorise: the dynamics of learning with feedback alignment*
Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
Exploiting layerwise convexity of rectifier networks with sign constrained weights
On the Benefit of Width for Neural Networks: Disappearance of Basins
Simultaneous neural network approximation for smooth functions
Non-differentiable saddle points and sub-optimal local minima exist for deep ReLU networks
Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution
Unnamed Item
Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval
First-order methods almost always avoid strict saddle points
Utility/privacy trade-off as regularized optimal transport
Optimization for deep learning: an overview
Recent Theoretical Advances in Non-Convex Optimization
Applied harmonic analysis and data processing. Abstracts from the workshop held March 25--31, 2018
Unnamed Item
Analysis of a two-layer neural network via displacement convexity
Non-convergence of stochastic gradient descent in the training of deep neural networks
Principal Component Analysis by Optimization of Symmetric Functions has no Spurious Local Optima
On the Landscape of Synchronization Networks: A Perspective from Nonconvex Optimization
Symmetry & critical points for a model shallow neural network
Neural ODEs as the deep limit of ResNets with constant weights
Unnamed Item
Stable recovery of entangled weights: towards robust identification of deep neural networks from minimal samples
The interpolation phase transition in neural networks: memorization and generalization under lazy training
Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
Solving phase retrieval with random initial guess is nearly as good as by spectral initialization