Bridging the gap between constant step size stochastic gradient descent and Markov chains (Q2196224)

From MaRDI portal

scientific article

Language: English
Label: Bridging the gap between constant step size stochastic gradient descent and Markov chains
Description: scientific article

    Statements

    Bridging the gap between constant step size stochastic gradient descent and Markov chains (English)
    28 August 2020
    The paper deals with the minimization of a strongly convex objective function, given access to unbiased estimates of its gradient, by \textit{stochastic gradient descent} (SGD) with constant step size, also known as the Robbins-Monro algorithm [\textit{H. Robbins} and \textit{S. Monro}, Ann. Math. Stat. 22, 400--407 (1951; Zbl 0054.05901)]. Since a detailed analysis was previously available only for quadratic functions, the authors provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the dependence on the initial conditions, the effect of noise and the step size, as well as the lack of convergence in the general (nonquadratic) case. The analysis brings tools from Markov chain theory into the study of stochastic gradient methods. It is observed that Richardson-Romberg extrapolation [\textit{G. Pagès}, Monte Carlo Methods Appl. 13, No. 1, 37--70 (2007; Zbl 1119.65004); \textit{N. Frikha} and \textit{L. Huang}, Stochastic Processes Appl. 125, No. 11, 4066--4101 (2015; Zbl 1336.60137)] may be used to get closer to the global optimum, and empirical improvements from the new extrapolation scheme are shown. This methodological problem is of interest in various practical tasks arising in large-scale machine learning, optimization and stochastic approximation.
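    The idea reviewed above can be illustrated with a minimal sketch: run constant-step-size SGD twice, with step sizes \(\gamma\) and \(\gamma/2\), average the iterates (Polyak-Ruppert averaging), and combine the two averages by Richardson-Romberg extrapolation to cancel the leading bias term in the step size. The objective function, noise level, step size and iteration count below are illustrative assumptions, not taken from the paper.

```python
import random
import math

def grad(x):
    # Gradient of the strongly convex, nonquadratic f(x) = x**2 / 2 + log(1 + exp(x))
    return x + 1.0 / (1.0 + math.exp(-x))

def averaged_sgd(step, n_iters, noise_std=0.1, x0=5.0, seed=0):
    """Constant-step-size SGD; returns the Polyak-Ruppert average of the iterates."""
    rng = random.Random(seed)
    x, avg = x0, 0.0
    for k in range(n_iters):
        g = grad(x) + rng.gauss(0.0, noise_std)  # unbiased gradient estimate
        x -= step * g
        avg += (x - avg) / (k + 1)  # running average of the iterates
    return avg

gamma = 0.1
xbar_gamma = averaged_sgd(gamma, 100_000)            # biased limit, offset O(gamma)
xbar_half = averaged_sgd(gamma / 2, 100_000, seed=1)  # offset O(gamma / 2)
# Richardson-Romberg extrapolation cancels the leading O(gamma) bias term:
x_rr = 2.0 * xbar_half - xbar_gamma
```

    With a constant step size the averaged iterates concentrate around a point at distance \(O(\gamma)\) from the minimizer rather than at the minimizer itself; the extrapolated combination `2.0 * xbar_half - xbar_gamma` reduces this offset to \(O(\gamma^2)\), which is the effect observed empirically in the paper.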
    stochastic gradient descent
    Markov chains

    Identifiers
