A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Publication:6364310

DOI: 10.1007/S00033-022-01716-W; arXiv: 2104.00277; Wikidata: Q113906263; Scholia: Q113906263; MaRDI QID: Q6364310; FDO: Q6364310


Authors: Arnulf Jentzen, Adrian Riekert


Publication date: 1 April 2021

Abstract: In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small and the input data used in the SGD process to train the artificial neural networks is assumed to be independent and identically distributed.
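The setting described in the abstract can be illustrated with a small numerical sketch. It is not taken from the article: the uniform input distribution on [0, 1]^d, the Gaussian initialization, the squared-error loss, the choice of derivative 0 for ReLU at the origin, and every name and default value (`train`, `sgd_step`, `lr=0.01`, `H=8`, and so on) are assumptions made only for illustration. The sketch trains a network with d inputs, H hidden ReLU neurons, and one output on i.i.d. inputs against a constant target, and compares a Monte Carlo estimate of the risk before and after training.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def init_params(d, H, rng):
    # Hypothetical Gaussian initialization; the article's assumptions may differ.
    return {
        "W": [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(H)],
        "b": [rng.gauss(0.0, 0.5) for _ in range(H)],
        "v": [rng.gauss(0.0, 0.5) for _ in range(H)],
        "c": 0.0,
    }


def forward(p, x):
    # Pre-activations of the H hidden neurons, then the scalar network output.
    pre = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
           for w, b in zip(p["W"], p["b"])]
    out = p["c"] + sum(v * relu(z) for v, z in zip(p["v"], pre))
    return out, pre


def sgd_step(p, x, target, lr):
    # One SGD step on the squared error (out - target)^2 for a single sample.
    out, pre = forward(p, x)
    err = 2.0 * (out - target)
    for j, z in enumerate(pre):
        active = 1.0 if z > 0.0 else 0.0  # assumed ReLU derivative, 0 at z = 0
        grad_v = err * relu(z)
        grad_pre = err * p["v"][j] * active
        p["v"][j] -= lr * grad_v
        p["b"][j] -= lr * grad_pre
        for i in range(len(x)):
            p["W"][j][i] -= lr * grad_pre * x[i]
    p["c"] -= lr * err


def estimate_risk(p, target, d, rng, n=2000):
    # Monte Carlo estimate of the risk E[(N(X) - target)^2].
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        out, _ = forward(p, x)
        total += (out - target) ** 2
    return total / n


def train(d=2, H=8, target=1.5, lr=0.01, steps=5000, seed=0):
    rng = random.Random(seed)
    p = init_params(d, H, rng)
    before = estimate_risk(p, target, d, rng)
    for _ in range(steps):
        x = [rng.random() for _ in range(d)]  # i.i.d. input sample
        sgd_step(p, x, target, lr)
    after = estimate_risk(p, target, d, rng)
    return before, after


if __name__ == "__main__":
    before, after = train()
    print("estimated risk before training:", before)
    print("estimated risk after training:", after)
```

A single run like this only shows the estimated risk decreasing for one seed and one step size; it does not substitute for the convergence proof, which covers the SGD process under the article's own assumptions.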













