Stochastic gradient descent: where optimization meets machine learning (Q6200207)
From MaRDI portal
scientific article; zbMATH DE number 7822588
Language | Label | Description | Also known as
---|---|---|---
English | Stochastic gradient descent: where optimization meets machine learning | scientific article; zbMATH DE number 7822588 |
Statements
Stochastic gradient descent: where optimization meets machine learning (English)
0 references
22 March 2024
0 references
Summary: Stochastic gradient descent (SGD) is the de facto optimization algorithm for training neural networks in modern machine learning, thanks to its unique scalability to problem sizes where the size of the data points, the number of data points, and the number of free parameters to optimize are all on the scale of billions. On the one hand, many of the mathematical foundations for stochastic gradient descent were developed decades before the advent of modern deep learning, from stochastic approximation to the randomized Kaczmarz algorithm for solving linear systems. On the other hand, the omnipresence of stochastic gradient descent in modern machine learning, and the resulting importance of optimizing its performance in practical settings, have motivated new algorithmic designs and mathematical breakthroughs. In this note, we recall some history and state-of-the-art convergence theory for SGD that is most useful in the modern applications where it is deployed. We discuss recent breakthroughs in adaptive gradient variants of stochastic gradient descent, which go a long way toward addressing one of SGD's weakest points: its sensitivity to, and reliance on, hyperparameters, most notably the choice of step-sizes. For the entire collection see [Zbl 07816361].
0 references
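The summary's contrast between plain SGD with a fixed step-size and an adaptive gradient variant can be illustrated with a minimal sketch. This is not code from the article; it is an illustrative toy on a consistent least-squares problem, where the per-coordinate adaptive step (in the style of AdaGrad) removes the need to hand-tune a single global step-size. All names and parameter values below are assumptions chosen for the demonstration.

```python
import numpy as np

# Toy consistent least-squares problem: minimize (1/2n) * ||A x - b||^2,
# written as an average of per-sample losses f_i(x) = 0.5 * (a_i @ x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true  # consistent system, so the minimizer is x_true

def sgd(steps=5000, eta=0.01):
    """Plain SGD: one uniformly sampled data point per step, fixed step-size."""
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]  # stochastic gradient of f_i
        x -= eta * g                  # fixed step-size eta must be tuned
    return x

def adagrad(steps=5000, eta=0.5, eps=1e-8):
    """AdaGrad-style variant: per-coordinate steps scaled by accumulated
    squared gradients, reducing sensitivity to the choice of eta."""
    x = np.zeros(d)
    G = np.zeros(d)                   # running sum of squared gradients
    for _ in range(steps):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]
        G += g * g
        x -= eta / np.sqrt(G + eps) * g
    return x

err_sgd = np.linalg.norm(sgd() - x_true)
err_ada = np.linalg.norm(adagrad() - x_true)
print(err_sgd, err_ada)
```

With uniform sampling and appropriately scaled rows, the SGD update above is closely related to the randomized Kaczmarz iteration for the linear system `A x = b` that the summary mentions as a historical precursor.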
adaptive gradient method
0 references
smoothness
0 references
stochastic approximation
0 references
convergence
0 references
step-size choice
0 references