Stochastic gradient descent with Polyak's learning rate
Publication:1983178
DOI: 10.1007/S10915-021-01628-3 · zbMATH Open: 1477.90105 · arXiv: 1903.08688 · OpenAlex: W3196800830 · MaRDI QID: Q1983178 · FDO: Q1983178
Authors: Mariana Prazeres, Adam Oberman
Publication date: 15 September 2021
Published in: Journal of Scientific Computing
Abstract: Stochastic gradient descent (SGD) for strongly convex functions converges at the rate $\mathcal{O}(1/k)$. However, achieving good results in practice requires tuning the parameters (for example, the learning rate) of the algorithm. In this paper we propose a generalization of the Polyak step size, used for subgradient methods, to stochastic gradient descent. We prove non-asymptotic convergence at the rate $\mathcal{O}(1/k)$ with a rate constant which can be better than the corresponding rate constant for optimally scheduled SGD. We demonstrate that the method is effective in practice, both on convex optimization problems and on training deep neural networks, and we compare to the theoretical rate.
Full work available at URL: https://arxiv.org/abs/1903.08688
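The abstract describes a Polyak-type learning rate for SGD. Below is a minimal illustrative sketch of one such stochastic Polyak step, assuming the optimal value f* is known (e.g. zero for a consistent least-squares problem); the names sgd_polyak, fi, and grad_fi are hypothetical, and the paper's exact rule and scaling may differ.

```python
import numpy as np

def sgd_polyak(fi, grad_fi, x0, n_samples, f_star=0.0, n_iter=1000, eps=1e-12, seed=0):
    """Sketch of SGD with a Polyak-type learning rate (not the paper's exact rule).

    At each step a data index i is sampled and the step size is
        eta_k = (f_i(x_k) - f_star) / (||grad f_i(x_k)||^2 + eps),
    a stochastic analogue of Polyak's subgradient step size.  It assumes the
    optimal value f_star is known (e.g. 0 for interpolating models).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        i = rng.integers(n_samples)                  # sample one data point
        g = grad_fi(x, i)                            # stochastic gradient
        eta = (fi(x, i) - f_star) / (g @ g + eps)    # Polyak-type step size
        x = x - max(eta, 0.0) * g                    # guard against a negative step
    return x

# Usage example: least squares with f_i(x) = 0.5 * (a_i @ x - b_i)**2 and f* = 0
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
x_true = rng.normal(size=5)
b = A @ x_true                                       # consistent system, so f* = 0
fi = lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2
grad_fi = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = sgd_polyak(fi, grad_fi, np.zeros(5), n_samples=200, n_iter=5000)
print(np.linalg.norm(x_hat - x_true))                # small for this consistent problem
```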
Recommendations
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- Minimizing finite sums with the stochastic average gradient
- New Convergence Aspects of Stochastic Gradient Algorithms
- Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
Cites Work
- Adaptive subgradient methods for online learning and stochastic optimization
- Introductory lectures on convex optimization. A basic course.
- First-order methods in optimization
- A Stochastic Approximation Method
- Some methods of speeding up the convergence of iteration methods
- Variable target value subgradient method
- Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
- Information-Based Complexity, Feedback and Dynamics in Convex Programming
- Optimization methods for large-scale machine learning
- Laplacian smoothing stochastic gradient Markov chain Monte Carlo
- Introduction to continuous optimization
Cited In (32)
- An adaptive Polyak heavy-ball method
- Greedy randomized sampling nonlinear Kaczmarz methods
- A novel stepsize for gradient descent method
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- A fast non-monotone line search for stochastic gradient descent
- The accelerated tensor Kaczmarz algorithm with adaptive parameters for solving tensor systems
- A stochastic gradient method with variance control and variable learning rate for deep learning
- Non-ergodic linear convergence property of the delayed gradient descent under the strongly convexity and the Polyak-Łojasiewicz condition
- Learning rate adaptation in stochastic gradient descent.
- Adaptive moment estimation for universal portfolio selection strategy
- Stochastic algorithms with geometric step decay converge linearly on sharp functions
- Convergence of stochastic gradient descent in deep neural network
- Optimal rates for multi-pass stochastic gradient methods
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- An adaptive gradient method with energy and momentum
- On the linear convergence of the stochastic gradient method with constant step-size
- Stability and optimization error of stochastic gradient descent for pairwise learning
- Stochastic gradient descent: where optimization meets machine learning
- Optimized convergence of stochastic gradient descent by weighted averaging
- Why random reshuffling beats stochastic gradient descent
- Bolstering stochastic gradient descent with model building
- Making the last iterate of SGD information theoretically optimal
- Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers
- Bridging the gap between constant step size stochastic gradient descent and Markov chains
- Polyak minorant method for convex optimization
- Title not available
- The stochastic delta rule: faster and more accurate deep learning through adaptive weight noise
- Scheduled restart momentum for accelerated stochastic gradient descent
- Accelerating deep neural network training with inconsistent stochastic gradient descent
- Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
- Stopping criteria for, and strong convergence of, stochastic gradient descent on Bottou-Curtis-Nocedal functions
- Laplacian smoothing gradient descent