Bridging the gap between constant step size stochastic gradient descent and Markov chains (Q2196224)
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Bridging the gap between constant step size stochastic gradient descent and Markov chains | scientific article |
Statements
Bridging the gap between constant step size stochastic gradient descent and Markov chains (English)
28 August 2020
The paper studies the minimization of a strongly convex objective function, given access to unbiased estimates of its gradient, by \textit{stochastic gradient descent} (SGD), also known as the Robbins-Monro algorithm [\textit{H. Robbins} and \textit{S. Monro}, Ann. Math. Stat. 22, 400--407 (1951; Zbl 0054.05901)], run with a constant step size. While a detailed analysis of this setting was previously available only for quadratic functions, the authors provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that makes precise the dependence on the initial conditions, the effect of the noise and of the step size, as well as the lack of convergence to the optimum in the general (nonquadratic) case. The analysis brings tools from Markov chain theory into the study of stochastic gradient methods: for a fixed step size the SGD iterates form a homogeneous Markov chain, and the averaged iterates approach the mean of its stationary distribution, which in general differs from the global optimum by a term of the order of the step size. It is then observed that Richardson-Romberg extrapolation [\textit{G. Pagès}, Monte Carlo Methods Appl. 13, No. 1, 37--70 (2007; Zbl 1119.65004); \textit{N. Frikha} and \textit{L. Huang}, Stochastic Processes Appl. 125, No. 11, 4066--4101 (2015; Zbl 1336.60137)], which combines the averaged iterates obtained with two different step sizes, can be used to get closer to the global optimum, and empirical improvements of this extrapolation scheme are demonstrated. The methodological problem addressed here arises in a variety of practical tasks in large-scale machine learning, optimization and stochastic approximation.
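To make the extrapolation step concrete, the following minimal Python sketch (not the authors' code) runs constant step size SGD with Polyak-Ruppert averaging at two step sizes \(\gamma\) and \(2\gamma\) and combines the resulting averages as \(2\bar\theta_\gamma - \bar\theta_{2\gamma}\), the Richardson-Romberg extrapolation discussed above. The one-dimensional nonquadratic test objective, the additive Gaussian noise model, the step size \(\gamma = 0.1\) and the iteration counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative strongly convex, non-quadratic objective (assumed for this sketch):
# f(theta) = theta**2 / 2 + log(1 + exp(theta))
def grad(theta):
    return theta + 1.0 / (1.0 + np.exp(-theta))

def stochastic_grad(theta):
    # unbiased gradient estimate: exact gradient plus zero-mean Gaussian noise
    return grad(theta) + rng.standard_normal()

# Reference minimizer theta*: solve grad(theta*) = 0 by a few Newton steps.
theta_star = 0.0
for _ in range(50):
    sig = 1.0 / (1.0 + np.exp(-theta_star))
    hess = 1.0 + sig * (1.0 - sig)
    theta_star -= grad(theta_star) / hess

def averaged_sgd(gamma, n_iter=500_000):
    """Constant step size SGD with Polyak-Ruppert averaging of the iterates."""
    theta, avg = 0.0, 0.0
    for k in range(n_iter):
        theta -= gamma * stochastic_grad(theta)   # constant step size update
        avg += (theta - avg) / (k + 1)            # running average of the iterates
    return avg

gamma = 0.1
avg_g  = averaged_sgd(gamma)        # averaged iterate with step size gamma
avg_2g = averaged_sgd(2 * gamma)    # averaged iterate with step size 2*gamma
rr     = 2 * avg_g - avg_2g         # Richardson-Romberg extrapolation

print("error with step gamma    :", abs(avg_g  - theta_star))
print("error with step 2*gamma  :", abs(avg_2g - theta_star))
print("error after extrapolation:", abs(rr     - theta_star))
```

On such a nonquadratic objective one typically observes that the averaged iterate at step size \(2\gamma\) carries roughly twice the systematic bias of the one at step size \(\gamma\), and that the extrapolated combination lands closer to \(\theta^*\); this is the behaviour the paper quantifies through its asymptotic expansion.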
stochastic gradient descent
Markov chains