Nesterov's method with decreasing learning rate leads to accelerated stochastic gradient descent
From MaRDI portal
Publication: 6323985
arXiv: 1908.07861
MaRDI QID: Q6323985
FDO: Q6323985
Publication date: 21 August 2019
Abstract: We present a coupled system of ODEs which, when discretized with a constant time step/learning rate, recovers Nesterov's accelerated gradient descent algorithm. The same ODEs, when discretized with a decreasing learning rate, lead to novel stochastic gradient descent (SGD) algorithms, one for the convex case and a second for the strongly convex case. In the strongly convex case, we obtain an algorithm superficially similar to momentum SGD, but with additional terms. In the convex case, we obtain an algorithm with a learning rate of a novel order. Extending the Lyapunov function approach from the full-gradient case to the stochastic case, we prove that the algorithms converge at the optimal rate for the last iterate of SGD, with rate constants that improve on those previously available.
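For reference, the classical two-sequence form of Nesterov's accelerated gradient descent, which the abstract says is recovered by a constant-step discretization of the coupled ODEs, looks roughly as follows. This is a minimal sketch of the standard algorithm only; the paper's specific ODE system, its decreasing-learning-rate discretizations, and the resulting SGD variants are not reproduced here, and all function and parameter names are illustrative.

```python
import numpy as np

def nesterov_agd(grad, x0, lr, n_steps):
    """Classical Nesterov accelerated gradient descent with a constant step size.

    Sketch for orientation only; the paper derives this scheme as a
    constant-step discretization of a coupled ODE system, which is not
    implemented here.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(n_steps):
        x_next = y - lr * grad(y)                       # gradient step at the extrapolated point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum / extrapolation step
        x, t = x_next, t_next
    return x

# Illustrative use on the convex quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
if __name__ == "__main__":
    grad = lambda x: x
    x_final = nesterov_agd(grad, x0=np.ones(5), lr=0.1, n_steps=200)
    print(np.linalg.norm(x_final))  # close to 0 at the minimizer
```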