On stochastic roundoff errors in gradient descent with low-precision computation
From MaRDI portal
Publication: Q6150643
Abstract: When implementing the gradient descent method in low precision, employing stochastic rounding schemes helps prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.
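The unbiased scheme described in the abstract can be illustrated with a minimal sketch: round a value to a grid of representable numbers, choosing the upper neighbor with probability equal to the fractional distance so that the expected result equals the input. This is an illustrative toy on a uniform fixed-point grid, not the paper's proposed biased schemes or an actual 8-bit floating-point implementation; the function name and `step` parameter are assumptions for the example.

```python
import math
import random

def stochastic_round(x: float, step: float) -> float:
    """Round x to a multiple of `step` with unbiased stochastic rounding.

    Rounds up with probability equal to the fractional distance to the
    next grid point, so E[stochastic_round(x)] == x (zero rounding bias).
    Small updates survive with probability proportional to their size
    instead of always rounding to zero, which is what prevents the
    stagnation discussed in the abstract.
    """
    q = x / step
    lo = math.floor(q)
    frac = q - lo  # fractional part in [0, 1)
    if random.random() < frac:
        lo += 1
    return lo * step

# A tiny gradient-descent-style update: with round-to-nearest, an update
# smaller than half a grid step would be lost; stochastically it survives
# a fraction of the time, matching its magnitude on average.
random.seed(0)
w = 1.0
update = 0.05          # much smaller than the grid step below
samples = [stochastic_round(w - update, step=0.25) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```

On average the rounded iterate stays close to the exact value `0.95`, even though each individual rounding lands on the coarse grid `{0.75, 1.0}`.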
Cites work
- Accuracy and Stability of Numerical Algorithms
- An introduction to the theory of functional equations and inequalities. Cauchy's equation and Jensen's inequality. Edited by Attila Gilányi
- Applied logistic regression
- Effects of round-to-nearest and stochastic rounding in the numerical solution of the heat equation in low precision
- Gradient Convergence in Gradient Methods with Errors
- Gradient descent optimizes over-parameterized deep ReLU networks
- Multinomial logistic regression algorithm
- Probability and conditional expectation. Fundamentals for the empirical sciences
- Properties of the sign gradient descent algorithms
- Simulating Low Precision Floating-Point Arithmetic
- Stochastic rounding and its probabilistic backward error analysis
- Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations