Strong error analysis for stochastic gradient descent optimization algorithms

From MaRDI portal

DOI: 10.1093/IMANUM/DRZ055
zbMATH Open: 1460.65071
arXiv: 1801.09324
OpenAlex: W2786773456
Wikidata: Q126576349
Scholia: Q126576349
MaRDI QID: Q4964091
FDO: Q4964091


Authors: Arnulf Jentzen, Benno Kuckuck, Ariel Neufeld, Philippe von Wurstemberger


Publication date: 24 February 2021

Published in: IMA Journal of Numerical Analysis

Abstract: Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p \in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $\frac{1}{2} - \varepsilon$ to the global minimum of the objective function of the considered stochastic approximation problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures, and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large $p \in (0,\infty)$ strong $L^p$-convergence rates. This article also contains an extensive review of results on SGD optimization algorithms in the scientific literature.
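To make the setting of the abstract concrete, below is a minimal Python sketch (not taken from the paper) of SGD on a toy stochastic approximation problem, together with a Monte Carlo estimate of the strong $L^p$ error $\mathbb{E}[|\Theta_N - \vartheta|^p]^{1/p}$ at the unique global minimizer $\vartheta$. The quadratic objective, the step sizes $\gamma_n = c/(n+1)$, and the choice $p = 4$ are illustrative assumptions; under them the rescaled quantity $N^{1/2}$ times the error should roughly stabilize as $N$ grows, consistent with a strong convergence order close to $\frac{1}{2}$.

```python
import numpy as np

# Toy stochastic approximation problem (an illustrative assumption, not from the paper):
# minimize f(theta) = (1/2) * E[(theta - X)^2] with X ~ N(mu, sigma^2),
# whose unique global minimizer is theta* = mu.
rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0

def sgd_run(n_steps, theta0=0.0, c=1.0):
    """One SGD trajectory with step sizes gamma_n = c / (n + 1)."""
    theta = theta0
    for n in range(n_steps):
        x = rng.normal(mu, sigma)       # stochastic sample X_{n+1}
        grad = theta - x                # unbiased estimate of f'(theta)
        theta -= (c / (n + 1)) * grad   # SGD update step
    return theta

def strong_lp_error(n_steps, p=4.0, n_runs=2000):
    """Monte Carlo estimate of E[|Theta_N - theta*|^p]^(1/p) over independent runs."""
    errs = np.array([abs(sgd_run(n_steps) - mu) for _ in range(n_runs)])
    return (errs ** p).mean() ** (1.0 / p)

for N in (100, 400, 1600):
    e = strong_lp_error(N)
    print(f"N={N:5d}  L^4 error ~ {e:.4f}  sqrt(N) * error ~ {np.sqrt(N) * e:.3f}")
```

Note on this particular toy setup: with $c = 1$ and $\Theta_0 = 0$, the iterate $\Theta_N$ is exactly the sample mean of $X_1, \dots, X_N$, so the observed $N^{-1/2}$ decay of the error matches the classical Monte Carlo rate and illustrates why the order $\frac{1}{2} - \varepsilon$ in the abstract is essentially sharp.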


Full work available at URL: https://arxiv.org/abs/1801.09324




Cited in: 15 documents





