Strong error analysis for stochastic gradient descent optimization algorithms
From MaRDI portal
Publication: Q4964091
Abstract: Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small ε ∈ (0, ∞) and every arbitrarily large p ∈ (0, ∞) that the considered SGD optimization algorithm converges in the strong L^p-sense with order 1/2 − ε to the global minimum of the objective function of the considered stochastic approximation problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures, and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large p ∈ (0, ∞) strong L^p-convergence rates. This article also contains an extensive review of results on SGD optimization algorithms in the scientific literature.
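The kind of strong L^p error discussed in the abstract can be illustrated numerically. The following is a minimal sketch, not taken from the article: it runs SGD with polynomially decaying learning rates γ_n = c/n^ρ (a standard choice in such analyses) on the toy stochastic approximation problem min_x E[(x − Z)²/2] with Z ~ N(θ, 1), whose global minimum is θ, and then estimates the strong L² error E[|X_N − θ|²]^(1/2) by Monte Carlo over independent runs. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def sgd(theta=2.0, steps=5000, c=1.0, rho=1.0, seed=0):
    """One SGD run on min_x E[(x - Z)^2 / 2], Z ~ N(theta, 1).

    Uses decaying learning rates gamma_n = c / n**rho; the true
    minimizer of the objective is theta. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    x = 0.0
    for n in range(1, steps + 1):
        z = theta + rng.standard_normal()  # one noisy sample of Z
        grad = x - z                       # unbiased stochastic gradient
        x -= (c / n**rho) * grad
    return x

# Monte Carlo estimate of the strong L^2 error E[|X_N - theta|^2]^(1/2)
# over independent runs (here p = 2; larger p works analogously).
theta = 2.0
errs = np.array([abs(sgd(theta=theta, seed=s) - theta) for s in range(100)])
lp_error = float(np.mean(errs**2) ** 0.5)
```

For this quadratic toy problem with c = ρ = 1 the iterates reduce to the running sample mean of the observed Z values, so the L² error decays at the optimal rate N^(−1/2), consistent with the order 1/2 − ε rates established (under far weaker assumptions) in the article.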
Recommendations
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis
- New Convergence Aspects of Stochastic Gradient Algorithms
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- Convergence of stochastic gradient descent in deep neural network
Cited in (17)
- Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations
- A Convergence Study of SGD-Type Methods for Stochastic Optimization
- Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
- Performance analysis of stochastic gradient algorithms under weak conditions
- Stability and optimization error of stochastic gradient descent for pairwise learning
- Analysis of biased stochastic gradient descent using sequential semidefinite programs
- Backward error analysis and the qualitative behaviour of stochastic optimization algorithms: application to stochastic coordinate descent
- Concentration inequalities for additive functionals: a martingale approach
- Convergence of stochastic gradient descent in deep neural network
- Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
- Sublinear convergence of a tamed stochastic gradient descent method in Hilbert space
- Analysis of stochastic gradient descent in continuous time
- Full error analysis for the training of deep neural networks
- Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions