Non-convergence of stochastic gradient descent in the training of deep neural networks
Publication: 2034567
DOI: 10.1016/j.jco.2020.101540
zbMath: 1494.65044
arXiv: 2006.07075
OpenAlex: W3107512664
MaRDI QID: Q2034567
Patrick Cheridito, Florian Rossmannek, Arnulf Jentzen
Publication date: 22 June 2021
Published in: Journal of Complexity
Full work available at URL: https://arxiv.org/abs/2006.07075
Keywords: machine learning; empirical risk minimization; non-convergence; deep neural networks; stochastic gradient descent
Artificial neural networks and deep learning (68T07) Numerical optimization and variational techniques (65K10)
Related Items (6)
- Stationary Density Estimation of Itô Diffusions Using Deep Learning
- Solving high-dimensional Hamilton-Jacobi-Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
- Deep multimodal autoencoder for crack criticality assessment
- Constructive deep ReLU neural network approximation
Cites Work
- Gradient descent optimizes over-parameterized deep ReLU networks
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Optimization Methods for Large-Scale Machine Learning