Stochastic gradient descent with Polyak's learning rate

Publication: 1983178

DOI: 10.1007/S10915-021-01628-3
zbMATH Open: 1477.90105
arXiv: 1903.08688
OpenAlex: W3196800830
MaRDI QID: Q1983178
FDO: Q1983178

Mariana Prazeres, Adam Oberman

Publication date: 15 September 2021

Published in: Journal of Scientific Computing

Abstract: Stochastic gradient descent (SGD) for strongly convex functions converges at the rate O(1/k). However, achieving good results in practice requires tuning the parameters (for example the learning rate) of the algorithm. In this paper we propose a generalization of the Polyak step size, used for subgradient methods, to stochastic gradient descent. We prove non-asymptotic convergence at the rate O(1/k) with a rate constant which can be better than the corresponding rate constant for optimally scheduled SGD. We demonstrate that the method is effective in practice on convex optimization problems and on training deep neural networks, and we compare to the theoretical rate.
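
The abstract describes a stochastic analogue of the Polyak step size, where the learning rate is computed from the current function value instead of a hand-tuned schedule. Below is a minimal sketch of that idea (not necessarily the authors' exact algorithm) on a least-squares problem: for a sampled term the step size is taken as (f_i(x) - f_i*) / ||grad f_i(x)||^2, and the per-sample optimal values f_i* are assumed known (zero here, since the synthetic system is consistent). The function name sgd_polyak and all parameter values are illustrative.

```python
import numpy as np

def sgd_polyak(A, b, n_iters=1000, seed=0, eps=1e-12):
    """Minimize (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 with a stochastic
    Polyak-type step size: eta_k = (f_i(x_k) - f_i*) / ||grad f_i(x_k)||^2.
    Here each f_i* is assumed to be 0 (interpolation / consistent system)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)            # sample one data point uniformly
        r = A[i] @ x - b[i]            # residual a_i^T x - b_i
        f_i = 0.5 * r**2               # per-sample loss, with f_i* = 0
        g = r * A[i]                   # per-sample gradient
        eta = f_i / (g @ g + eps)      # Polyak-type step size (eps guards 0/0)
        x = x - eta * g
    return x

# Usage: a small consistent least-squares system, so every f_i* = 0.
A = np.random.default_rng(1).normal(size=(50, 5))
x_true = np.arange(5, dtype=float)
b = A @ x_true
x_hat = sgd_polyak(A, b)
print(np.linalg.norm(x_hat - x_true))  # should be close to 0
```

The point of the sketch is that no learning-rate schedule is tuned by hand: the step size adapts automatically from the gap between the current per-sample loss and its optimal value.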


Full work available at URL: https://arxiv.org/abs/1903.08688







Cited In (8)







This page was built for publication: Stochastic gradient descent with Polyak's learning rate
