A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Publication:5136291
Abstract: This work provides a simplified proof of the statistical minimax optimality of (iterate-averaged) stochastic gradient descent (SGD) for the special case of least squares. The result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite-rate optimality characterization captures constant factors and addresses model misspecification.
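To make the setting concrete, the following is a minimal sketch (not taken from the paper) of constant-step-size SGD with Polyak-Ruppert iterate averaging on a synthetic least-squares problem. The dimensions, step size, noise level, and Gaussian data model are illustrative assumptions, not choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic well-specified least-squares problem: y = <x, w_star> + noise.
# d, n_steps, step, and the noise scale are illustrative assumptions.
d, n_steps, step = 10, 50_000, 0.05
w_star = rng.normal(size=d)

w = np.zeros(d)      # current SGD iterate
w_bar = np.zeros(d)  # running average of iterates (Polyak-Ruppert)

for t in range(1, n_steps + 1):
    x = rng.normal(size=d)               # fresh sample each step (one-pass SGD)
    y = x @ w_star + 0.1 * rng.normal()  # additive observation noise
    grad = (x @ w - y) * x               # stochastic gradient of 0.5 * (x.w - y)^2
    w -= step * grad                     # constant step size: w_t forms a Markov chain
    w_bar += (w - w_bar) / t             # incremental update of the iterate average

print("last-iterate error:", np.linalg.norm(w - w_star))
print("averaged error    :", np.linalg.norm(w_bar - w_star))
```

With a constant step size the last iterate fluctuates around the optimum rather than converging, which is why the abstract studies its stationary covariance as a Markov chain; the averaged iterate smooths out these fluctuations and is the quantity shown to be minimax optimal.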
Recommendations
- Bridging the gap between constant step size stochastic gradient descent and Markov chains
- Finite-Time Analysis of Markov Gradient Descent
- On optimal probabilities in stochastic coordinate descent methods
- The minimax stochastic optimization model of the Gauss-Markov nonlinear model's coefficients under the quadratic loss function
- The method of generalized stochastic gradient for solving minimax problems with constrained variables
- Optimal survey schemes for stochastic gradient descent with applications to \(M\)-estimation
- Minimizing finite sums with the stochastic average gradient
- Stochastic Minimization with Constant Step-Size: Asymptotic Laws
Cites work
- Acceleration of Stochastic Approximation by Averaging
- Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
- Nonparametric stochastic approximation with large step-sizes
- Stochastic approximation methods for constrained and unconstrained systems
Cited in (10)
- Making the last iterate of SGD information theoretically optimal
- Bridging the gap between constant step size stochastic gradient descent and Markov chains
- Stability and optimization error of stochastic gradient descent for pairwise learning
- On the regularization effect of stochastic gradient descent applied to least-squares
- On the regularizing property of stochastic gradient descent
- A Markovian Incremental Stochastic Subgradient Algorithm
- Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification
- High-dimensional limit of one-pass SGD on least squares
- Optimal rates for multi-pass stochastic gradient methods
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates