Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
Publication: 4637017
Abstract: We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean, finite-variance random error. We present the first algorithm that jointly achieves the optimal prediction error rates for least-squares regression, both in terms of forgetting of the initial conditions, in O(1/n^2), and in terms of dependence on the noise and the dimension d of the problem, in O(d/n). Our new algorithm is based on averaged accelerated regularized gradient descent and may also be analyzed under finer assumptions on the initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the "optimal" terms above are large. To characterize the tightness of these new bounds, we consider an application to non-parametric regression and use the known lower bounds on the statistical performance (without computational limits), which match the bounds obtained from a single pass over the data and thus show the optimality of our algorithm in a wide range of trade-offs between bias and variance.
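The abstract describes averaging the iterates of an accelerated, regularized stochastic gradient recursion over a single pass of the data. The following Python sketch illustrates that general structure on synthetic data; the function name, the step size, the Nesterov-style momentum schedule, and the ridge parameter are illustrative assumptions, not the constants analyzed in the paper.

```python
import numpy as np

def averaged_accelerated_sgd(X, y, reg=1e-3, step=None, seed=0):
    """Sketch of averaged, accelerated, regularized SGD for least squares.

    Illustrative only: the momentum/step-size schedule below is a generic
    Nesterov-style choice, not the exact scheme studied in the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if step is None:
        # Rough step size on the scale of 1 / (average squared feature norm)
        step = 1.0 / (np.mean(np.sum(X ** 2, axis=1)) + reg)
    theta = np.zeros(d)       # current iterate
    theta_prev = np.zeros(d)
    avg = np.zeros(d)         # running Polyak-Ruppert average of the iterates
    for t, i in enumerate(rng.permutation(n), start=1):
        # Nesterov-style extrapolation point
        momentum = (t - 1) / (t + 2)
        nu = theta + momentum * (theta - theta_prev)
        # Stochastic gradient of the regularized quadratic at nu (one sample)
        grad = (X[i] @ nu - y[i]) * X[i] + reg * nu
        theta_prev, theta = theta, nu - step * grad
        avg += (theta - avg) / t  # update running average
    return avg

# Usage on synthetic data (single pass over n observations)
rng = np.random.default_rng(1)
n, d = 2000, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)
w_hat = averaged_accelerated_sgd(X, y)
print("in-sample prediction error ~", np.mean((X @ (w_hat - w_star)) ** 2))
```

The returned vector is the average of the iterates rather than the last iterate, which is what allows the bias and variance terms discussed in the abstract to be controlled simultaneously.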
Recommendations
- Convergence rates of least squares regression estimators with heavy-tailed errors
- scientific article; zbMATH DE number 3992650
- The convergence rate of learning algorithms for least square regression with sample dependent hypothesis spaces
- Optimal strong convergence rates in nonparametric regression
- scientific article; zbMATH DE number 4109874
- On convergence rates of convex regression in multiple dimensions
- Strong convergence rate of the least median absolute estimator in linear regression models
- On the convergence of pseudo-linear regression algorithms
- scientific article; zbMATH DE number 3856232
Cites work
- scientific article; zbMATH DE number 3850830
- scientific article; zbMATH DE number 4164577
- scientific article; zbMATH DE number 47310
- scientific article; zbMATH DE number 976356
- scientific article; zbMATH DE number 1972910
- scientific article; zbMATH DE number 936298
- A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
- A Stochastic Approximation Method
- Acceleration of Stochastic Approximation by Averaging
- Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
- An alternative point of view on Lepski's method
- An optimal method for stochastic composite optimization
- Best choices for regularization parameters in learning theory: on the bias-variance problem.
- Concentration inequalities and model selection. École d'Été de Probabilités de Saint-Flour XXXIII -- 2003.
- Dual averaging methods for regularized stochastic learning and online optimization
- First-order methods of smooth convex optimization with inexact oracle
- Introduction to nonparametric estimation
- Introductory lectures on convex optimization. A basic course.
- Model selection for regularized least-squares algorithm in learning theory
- Nonparametric stochastic approximation with large step-sizes
- On early stopping in gradient descent learning
- On the Averaged Stochastic Approximation for Linear Regression
- Online Learning as Stochastic Approximation of Regularization Paths: Optimality and Almost-Sure Convergence
- Online gradient descent learning algorithms
- Optimal distributed online prediction using mini-batches
- Optimal rates for multi-pass stochastic gradient methods
- Optimal rates for the regularized least-squares algorithm
- Performance of empirical risk minimization in linear aggregation
- Random design analysis of ridge regression
- Robust Stochastic Approximation Approach to Stochastic Programming
- Smooth Optimization with Approximate Gradient
- Some methods of speeding up the convergence of iteration methods
- Support Vector Machines
- The lower tail of random quadratic forms with applications to ordinary least squares
- Theory of Reproducing Kernels
Cited in (18)
- Some limit properties of Markov chains induced by recursive stochastic algorithms
- Adaptivity of stochastic gradient methods for nonconvex optimization
- On stochastic accelerated gradient with convergence rate of regression learning
- On the rates of convergence of parallelized averaged stochastic gradient algorithms
- scientific article; zbMATH DE number 7307490
- On variance reduction for stochastic smooth convex optimization with multiplicative noise
- Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
- Generalization properties of doubly stochastic learning algorithms
- Dual space preconditioning for gradient descent
- Finite impulse response models: a non-asymptotic analysis of the least squares estimator
- From inexact optimization to learning via gradient concentration
- scientific article; zbMATH DE number 7415083
- On the convergence of pseudo-linear regression algorithms
- On the adaptivity of stochastic gradient-based optimization
- Bridging the gap between constant step size stochastic gradient descent and Markov chains
- Nonparametric stochastic approximation with large step-sizes
- Memory-sample tradeoffs for linear regression with small error
- Dimension independent excess risk by stochastic gradient descent