Stochastic gradient descent with Polyak's learning rate
Publication:1983178
DOI: 10.1007/S10915-021-01628-3 · zbMATH Open: 1477.90105 · arXiv: 1903.08688 · OpenAlex: W3196800830 · MaRDI QID: Q1983178 · FDO: Q1983178
Authors: Mariana Prazeres, Adam Oberman
Publication date: 15 September 2021
Published in: Journal of Scientific Computing
Abstract: Stochastic gradient descent (SGD) for strongly convex functions converges at the rate $\mathcal{O}(1/k)$. However, achieving good results in practice requires tuning the parameters (for example, the learning rate) of the algorithm. In this paper we propose a generalization of the Polyak step size, used for subgradient methods, to stochastic gradient descent. We prove non-asymptotic convergence at the rate $\mathcal{O}(1/k)$ with a rate constant which can be better than the corresponding rate constant for optimally scheduled SGD. We demonstrate that the method is effective in practice, both on convex optimization problems and on training deep neural networks, and we compare to the theoretical rate.
Full work available at URL: https://arxiv.org/abs/1903.08688
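The abstract describes a Polyak-type learning rate for SGD. Below is a minimal illustrative sketch of one such stochastic Polyak step, assuming the optimal value f* is known (e.g. zero for a consistent least-squares problem); the names sgd_polyak, fi, and grad_fi are hypothetical, and the paper's exact rule and scaling may differ.

```python
import numpy as np

def sgd_polyak(fi, grad_fi, x0, n_samples, f_star=0.0, n_iter=1000, eps=1e-12, seed=0):
    """Sketch of SGD with a Polyak-type learning rate (not the paper's exact rule).

    At each step a data index i is sampled and the step size is
        eta_k = (f_i(x_k) - f_star) / (||grad f_i(x_k)||^2 + eps),
    a stochastic analogue of Polyak's subgradient step size.  It assumes the
    optimal value f_star is known (e.g. 0 for interpolating models).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        i = rng.integers(n_samples)                  # sample one data point
        g = grad_fi(x, i)                            # stochastic gradient
        eta = (fi(x, i) - f_star) / (g @ g + eps)    # Polyak-type step size
        x = x - max(eta, 0.0) * g                    # guard against a negative step
    return x

# Usage example: least squares with f_i(x) = 0.5 * (a_i @ x - b_i)**2 and f* = 0
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
x_true = rng.normal(size=5)
b = A @ x_true                                       # consistent system, so f* = 0
fi = lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2
grad_fi = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = sgd_polyak(fi, grad_fi, np.zeros(5), n_samples=200, n_iter=5000)
print(np.linalg.norm(x_hat - x_true))                # small for this consistent problem
```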
Recommendations
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- Minimizing finite sums with the stochastic average gradient
- New Convergence Aspects of Stochastic Gradient Algorithms
- Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
Cites Work
- Adaptive subgradient methods for online learning and stochastic optimization
- Introductory lectures on convex optimization. A basic course.
- First-order methods in optimization
- A Stochastic Approximation Method
- Some methods of speeding up the convergence of iteration methods
- Variable target value subgradient method
- Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
- Information-Based Complexity, Feedback and Dynamics in Convex Programming
- Optimization methods for large-scale machine learning
- Laplacian smoothing stochastic gradient Markov chain Monte Carlo
- Introduction to continuous optimization
Cited In (32)
- An adaptive Polyak heavy-ball method
- Greedy randomized sampling nonlinear Kaczmarz methods
- A novel stepsize for gradient descent method
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- A fast non-monotone line search for stochastic gradient descent
- The accelerated tensor Kaczmarz algorithm with adaptive parameters for solving tensor systems
- A stochastic gradient method with variance control and variable learning rate for deep learning
- Non-ergodic linear convergence property of the delayed gradient descent under the strongly convexity and the Polyak-Łojasiewicz condition
- Learning rate adaptation in stochastic gradient descent.
- Adaptive moment estimation for universal portfolio selection strategy
- Stochastic algorithms with geometric step decay converge linearly on sharp functions
- Convergence of stochastic gradient descent in deep neural network
- Optimal rates for multi-pass stochastic gradient methods
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- An adaptive gradient method with energy and momentum
- On the linear convergence of the stochastic gradient method with constant step-size
- Stability and optimization error of stochastic gradient descent for pairwise learning
- Stochastic gradient descent: where optimization meets machine learning
- Optimized convergence of stochastic gradient descent by weighted averaging
- Why random reshuffling beats stochastic gradient descent
- Bolstering stochastic gradient descent with model building
- Making the last iterate of SGD information theoretically optimal
- Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers
- Bridging the gap between constant step size stochastic gradient descent and Markov chains
- Polyak minorant method for convex optimization
- Title not available
- The stochastic delta rule: faster and more accurate deep learning through adaptive weight noise
- Scheduled restart momentum for accelerated stochastic gradient descent
- Accelerating deep neural network training with inconsistent stochastic gradient descent
- Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
- Stopping criteria for, and strong convergence of, stochastic gradient descent on Bottou-Curtis-Nocedal functions
- Laplacian smoothing gradient descent