Stochastic gradient descent with Polyak's learning rate

Publication:1983178




Abstract: Stochastic gradient descent (SGD) for strongly convex functions converges at the rate $\mathcal{O}(1/k)$. However, achieving good results in practice requires tuning the parameters of the algorithm (for example, the learning rate). In this paper we propose a generalization of the Polyak step size, used for subgradient methods, to stochastic gradient descent. We prove non-asymptotic convergence at the rate $\mathcal{O}(1/k)$, with a rate constant which can be better than the corresponding rate constant for optimally scheduled SGD. We demonstrate that the method is effective in practice, both on convex optimization problems and on training deep neural networks, and compare the observed behaviour to the theoretical rate.
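
The classical Polyak step size for (sub)gradient methods sets the learning rate to the current suboptimality divided by the squared gradient norm, $\eta_k = (f(x_k) - f^*)/\|g_k\|^2$. The following is a minimal Python sketch of one plausible stochastic variant of that rule, applied per sample to a toy least-squares problem; the function names, the cap max_lr, and the choice f_star = 0 are illustrative assumptions and not the paper's exact algorithm.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true   # consistent system: every per-sample loss can reach zero

def loss_i(x, i):
    # per-sample least-squares loss f_i(x) = 0.5 * (a_i . x - b_i)^2
    return 0.5 * (A[i] @ x - b[i]) ** 2

def grad_i(x, i):
    # gradient of f_i at x
    return (A[i] @ x - b[i]) * A[i]

def sgd_polyak(x0, epochs=20, f_star=0.0, max_lr=10.0):
    # Each step uses a Polyak-type learning rate
    # (f_i(x) - f_star) / ||grad f_i(x)||^2, capped at max_lr (assumed safeguard).
    x = x0.copy()
    n = A.shape[0]
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = grad_i(x, i)
            g2 = g @ g
            if g2 == 0.0:
                continue  # this sample is already fit exactly
            lr = min((loss_i(x, i) - f_star) / g2, max_lr)
            x = x - lr * g
    return x

x_hat = sgd_polyak(np.zeros(5))
print("distance to minimizer:", np.linalg.norm(x_hat - x_true))

Capping the step at max_lr guards against very small stochastic gradients producing huge steps; taking f_star = 0 for every sample is valid here only because the linear system is consistent, so each per-sample loss attains zero at the common minimizer.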




Cited in (32)






