Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks

From MaRDI portal
Publication:4615339

DOI: 10.1109/TIT.2018.2854560
zbMath: 1428.68255
arXiv: 1707.04926
OpenAlex: W2963417959
MaRDI QID: Q4615339

Jason D. Lee, Adel Javanmard, Mahdi Soltanolkotabi

Publication date: 28 January 2019

Published in: IEEE Transactions on Information Theory

Full work available at URL: https://arxiv.org/abs/1707.04926

Related Items (32)

On PDE Characterization of Smooth Hierarchical Functions Computed by Neural Networks
Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Align, then memorise: the dynamics of learning with feedback alignment*
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Align, then memorise: the dynamics of learning with feedback alignment*
Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions
Exploiting layerwise convexity of rectifier networks with sign constrained weights
On the Benefit of Width for Neural Networks: Disappearance of Basins
Simultaneous neural network approximation for smooth functions
Non-differentiable saddle points and sub-optimal local minima exist for deep ReLU networks
Implicit regularization in nonconvex statistical estimation: gradient descent converges linearly for phase retrieval, matrix completion, and blind deconvolution
Unnamed Item
Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval
First-order methods almost always avoid strict saddle points
Utility/privacy trade-off as regularized optimal transport
Optimization for deep learning: an overview
Recent Theoretical Advances in Non-Convex Optimization
Applied harmonic analysis and data processing. Abstracts from the workshop held March 25--31, 2018
Unnamed Item
Analysis of a two-layer neural network via displacement convexity
Non-convergence of stochastic gradient descent in the training of deep neural networks
Principal Component Analysis by Optimization of Symmetric Functions has no Spurious Local Optima
On the Landscape of Synchronization Networks: A Perspective from Nonconvex Optimization
Symmetry & critical points for a model shallow neural network
Neural ODEs as the deep limit of ResNets with constant weights
Unnamed Item
Stable recovery of entangled weights: towards robust identification of deep neural networks from minimal samples
The interpolation phase transition in neural networks: memorization and generalization under lazy training
Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations
Extending the Step-Size Restriction for Gradient Descent to Avoid Strict Saddle Points
Solving phase retrieval with random initial guess is nearly as good as by spectral initialization

This page was built for publication: Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks