Stochastic gradient descent with Polyak's learning rate
From MaRDI portal
Publication: Q1983178
DOI: 10.1007/s10915-021-01628-3
zbMATH Open: 1477.90105
arXiv: 1903.08688
OpenAlex: W3196800830
MaRDI QID: Q1983178
FDO: Q1983178
Mariana Prazeres, Adam Oberman
Publication date: 15 September 2021
Published in: Journal of Scientific Computing
Abstract: Stochastic gradient descent (SGD) for strongly convex functions converges at the rate O(1/k). However, achieving good results in practice requires tuning the parameters (for example the learning rate) of the algorithm. In this paper we propose a generalization of the Polyak step size, used for subgradient methods, to stochastic gradient descent. We prove non-asymptotic convergence at the rate O(1/k), with a rate constant which can be better than the corresponding rate constant for optimally scheduled SGD. We demonstrate that the method is effective in practice, both on convex optimization problems and on training deep neural networks, and compare it to the theoretical rate.
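The classical Polyak step size for subgradient methods is eta_k = (f(x_k) - f*) / ||g_k||^2; a stochastic variant applies the same rule to the sampled component. The sketch below illustrates this idea on a least-squares problem where each component minimum is zero. It is a minimal illustrative approximation under these assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Minimal sketch of SGD with a stochastic Polyak-type step size on
# f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2. For a consistent linear
# system each component minimum f_i^* is 0, so the sampled Polyak step
# eta_k = (f_i(x_k) - f_i^*) / ||grad f_i(x_k)||^2 needs no tuning.
# Illustrative only; not the authors' exact method.

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                 # consistent system, so f_i^* = 0

x = np.zeros(d)
for k in range(2000):
    i = rng.integers(n)        # sample one component uniformly
    r = A[i] @ x - b[i]        # residual of component i
    f_i = 0.5 * r ** 2         # component loss (its minimum is 0)
    g = r * A[i]               # gradient of component i
    gnorm2 = g @ g
    if gnorm2 > 1e-12:         # skip degenerate steps
        eta = f_i / gnorm2     # Polyak step for the sampled component
        x -= eta * g

print(np.linalg.norm(x - x_true))
```

On this quadratic the sampled Polyak step reduces to eta = 1 / (2 ||a_i||^2), a half-length randomized Kaczmarz step, so the iterates converge to x_true without any learning-rate schedule.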
Full work available at URL: https://arxiv.org/abs/1903.08688
Cites Work
- Title not available
- Introductory lectures on convex optimization. A basic course.
- First-Order Methods in Optimization
- A Stochastic Approximation Method
- Some methods of speeding up the convergence of iteration methods
- Variable target value subgradient method
- Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
- Information-Based Complexity, Feedback and Dynamics in Convex Programming
- Optimization Methods for Large-Scale Machine Learning
- Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo
- Introduction to Continuous Optimization
Cited In (8)
- An adaptive Polyak heavy-ball method
- Greedy randomized sampling nonlinear Kaczmarz methods
- A novel stepsize for gradient descent method
- The accelerated tensor Kaczmarz algorithm with adaptive parameters for solving tensor systems
- Learning rate adaptation in stochastic gradient descent.
- Adaptive moment estimation for universal portfolio selection strategy
- Polyak minorant method for convex optimization
- Title not available