Gradient descent optimizes over-parameterized deep ReLU networks

Publication:2183586

DOI: 10.1007/s10994-019-05839-6
zbMATH: 1494.68245
OpenAlex: W2981407587
Wikidata: Q126992055 (Scholia: Q126992055)
MaRDI QID: Q2183586

Dongruo Zhou, Yuan Cao, Difan Zou, Quanquan Gu

Publication date: 27 May 2020

Published in: Machine Learning

Full work available at URL: https://doi.org/10.1007/s10994-019-05839-6
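For orientation, the following is a minimal sketch of the setting the title refers to, not the paper's actual construction or proof: full-batch gradient descent on a one-hidden-layer ReLU network whose width far exceeds the number of training points, driving the squared training loss toward zero. All specifics below (the width m, step size, random data, initialization scale, and training only the first layer) are illustrative assumptions, not values taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Tiny random regression problem: n samples of dimension d, hidden width m >> n.
n, d, m = 20, 5, 2000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.standard_normal(n)

# He-style first-layer initialization; output layer fixed to random signs
# scaled by 1/sqrt(m), a common simplification in over-parameterization analyses.
W = rng.standard_normal((m, d)) * np.sqrt(2.0 / d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

lr = 2.0  # illustrative step size
for step in range(2001):
    H = np.maximum(X @ W.T, 0.0)          # ReLU activations, shape (n, m)
    resid = H @ a - y                     # residuals of the squared loss
    loss = 0.5 * np.mean(resid ** 2)
    if step % 400 == 0:
        print(f"step {step:4d}  loss {loss:.6f}")
    # (Sub-)gradient of the loss with respect to the first-layer weights W.
    grad_W = ((resid[:, None] * (H > 0.0)) * a).T @ X / n
    W -= lr * grad_W

Running this prints a training loss that decreases toward zero while the weights stay close to their initialization, which is the over-parameterized regime whose deep-network analogue the paper analyzes.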




Related Items (43)

Memory Capacity of Neural Networks with Threshold and Rectified Linear Unit Activations
Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks
Deep learning: a statistical viewpoint
Surprises in high-dimensional ridgeless least squares interpolation
Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity
Particle dual averaging: optimization of mean field neural network with global convergence rate analysis*
A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
Benign overfitting in linear regression
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
Full error analysis for the training of deep neural networks
Gradient descent optimizes over-parameterized deep ReLU networks
On the Benefit of Width for Neural Networks: Disappearance of Basins
Towards interpreting deep neural networks via layer behavior understanding
Deep learning in random neural fields: numerical experiments via neural tangent kernel
Non-differentiable saddle points and sub-optimal local minima exist for deep ReLU networks
Black holes and the loss landscape in machine learning
A rigorous framework for the mean field limit of multilayer neural networks
Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
Convergence rates for shallow neural networks learned by gradient descent
On stochastic roundoff errors in gradient descent with low-precision computation
A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
FedHD: communication-efficient federated learning from hybrid data
Growing axons: greedy learning of neural networks with application to function approximation
Normalization effects on deep neural networks
Greedy training algorithms for neural networks and applications to PDEs
Optimization for deep learning: an overview
Plateau Phenomenon in Gradient Descent Training of RELU Networks: Explanation, Quantification, and Avoidance
Unnamed Item
Non-convergence of stochastic gradient descent in the training of deep neural networks
Linearized two-layers neural networks in high dimension
Gradient convergence of deep learning-based numerical methods for BSDEs
Every Local Minimum Value Is the Global Minimum Value of Induced Model in Nonconvex Machine Learning
On the Effect of the Activation Function on the Distribution of Hidden Nodes in a Deep Network
Normalization effects on shallow neural networks and related asymptotic expansions
Wide neural networks of any depth evolve as linear models under gradient descent*
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher–student setup*
Unnamed Item
Stabilize deep ResNet with a sharp scaling factor \(\tau\)
The interpolation phase transition in neural networks: memorization and generalization under lazy training
Provably training overparameterized neural network classifiers with non-convex constraints
Suboptimal Local Minima Exist for Wide Neural Networks with Smooth Activations




This page was built for publication: Gradient descent optimizes over-parameterized deep ReLU networks