Stochastic gradient descent: where optimization meets machine learning (Q6200207)

scientific article; zbMATH DE number 7822588

    Statements

    Title: Stochastic gradient descent: where optimization meets machine learning (English)
    Publication date: 22 March 2024
    Summary: Stochastic gradient descent (SGD) is the de facto optimization algorithm for training neural networks in modern machine learning, thanks to its unique scalability to problem sizes where the dimension of the data points, the number of data points, and the number of free parameters to optimize are all on the scale of billions. On the one hand, many of the mathematical foundations of stochastic gradient descent were developed decades before the advent of modern deep learning, from stochastic approximation to the randomized Kaczmarz algorithm for solving linear systems. On the other hand, the omnipresence of stochastic gradient descent in modern machine learning, and the resulting importance of optimizing its performance in practical settings, have motivated new algorithmic designs and mathematical breakthroughs. In this note, we recall some of this history along with the state-of-the-art convergence theory for SGD that is most relevant to its modern applications. We discuss recent breakthroughs in adaptive gradient variants of SGD, which go a long way towards addressing one of its weakest points: its sensitivity to, and reliance on, hyperparameters, most notably the choice of step-size. For the entire collection see [Zbl 07816361].
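    For orientation, a minimal sketch of the updates the summary refers to; the notation is standard and not taken from the article. For a finite-sum objective f(x) = (1/n) \sum_{i=1}^n f_i(x), SGD iterates

        x_{k+1} = x_k - \eta_k \nabla f_{i_k}(x_k),   with i_k drawn uniformly from {1, ..., n},

    and the randomized Kaczmarz method for a consistent linear system Ax = b (rows a_i, sampled with probability proportional to ||a_i||^2) can be viewed as the special case

        x_{k+1} = x_k + ((b_{i_k} - <a_{i_k}, x_k>) / ||a_{i_k}||^2) a_{i_k}.

    The adaptive gradient variants mentioned in the summary replace the hand-tuned schedule \eta_k with a step-size driven by the observed gradients. Below is a minimal Python sketch assuming one such rule, an AdaGrad-norm style step-size \eta / \sqrt{b_0^2 + \sum_{j \le k} ||g_j||^2}; the function and parameter names are illustrative, not from the article.

        import numpy as np

        def sgd_adagrad_norm(grad, x0, n, eta=1.0, b0=0.1, steps=2000, seed=0):
            # SGD with an AdaGrad-norm step-size: eta / sqrt(b0^2 + running sum
            # of squared stochastic-gradient norms), so no decay schedule needs
            # hand-tuning. grad(x, i) returns the gradient of f_i at x.
            rng = np.random.default_rng(seed)
            x = np.asarray(x0, dtype=float).copy()
            acc = b0 ** 2
            for _ in range(steps):
                i = rng.integers(n)            # sample one data point uniformly
                g = grad(x, i)                 # stochastic gradient
                acc += float(g @ g)            # accumulate squared gradient norm
                x -= (eta / np.sqrt(acc)) * g  # adaptive step-size
            return x

        # Illustrative use on least squares, where f_i(x) = 0.5 * (A[i] @ x - b[i])**2:
        rng = np.random.default_rng(1)
        A = rng.normal(size=(200, 10))
        b = A @ np.ones(10)
        x_hat = sgd_adagrad_norm(lambda x, i: (A[i] @ x - b[i]) * A[i], np.zeros(10), n=200)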
    Keywords: adaptive gradient method; smoothness; stochastic approximation; convergence; step-size choice
