Statistical inference for model parameters in stochastic gradient descent (Q2176618)
From MaRDI portal
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Statistical inference for model parameters in stochastic gradient descent | scientific article |
Statements
Statistical inference for model parameters in stochastic gradient descent (English)
0 references
5 May 2020
0 references
Let \(x^*\in\mathbb{R}^d\) be the true parameter of a statistical model. In common models, \(x^*\) is the minimizer of a convex objective function \(F\), i.e. \[ x^*=\operatorname{argmin}_x F(x),\quad F(x):=\mathbb{E} f(x,\zeta), \] where \(\zeta\) is a random sample from a probability distribution \(\Pi\) and \(f(x,\zeta)\) is the loss function. A popular optimization method for minimizing \(F\) is stochastic gradient descent (SGD). Let \(x_0\) denote any given starting point. SGD is an iterative algorithm whose \(i\)th iterate takes the form \[ x_i=x_{i-1}-\eta_i\nabla f(x_{i-1}, \zeta_i), \] where the step sizes \(\eta_i\) form a decreasing nonrandom sequence, \(\zeta_i\) is the \(i\)th sample randomly drawn from \(\Pi\), and \(\nabla f(x, \zeta_i)\) denotes the gradient of \(f(x, \zeta_i)\) with respect to \(x\). The algorithm outputs either the last iterate \(x_n\) or the average iterate \(\bar{x}_n=n^{-1}\sum_{i=1}^n x_i\). In the paper under review, \(f(\cdot, \zeta)\) is strictly convex and satisfies certain smoothness conditions. First, for fixed \(d\), two consistent estimators of the asymptotic covariance of \(\bar{x}_n\) are proposed: (a) a plug-in estimator and (b) a batch-means estimator; the latter uses only the iterates from SGD. Both estimators allow one to construct asymptotic confidence intervals and hypothesis tests for \(x^*\). Second, in high-dimensional sparse linear regression, where \(d\) can be much larger than the sample size \(n\), the authors use a variant of the SGD algorithm to minimize \(F(x)+\lambda \|x\|_1\), \(\lambda > 0\), and construct a debiased estimator of each regression coefficient that is asymptotically normal. This yields a one-pass algorithm that computes both the sparse regression coefficients and their confidence intervals and is applicable to online data.
0 references
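The following sketch (not taken from the paper; the data model, step-size schedule, and batch choices are illustrative assumptions) shows the general recipe the review describes for the fixed-\(d\) case: run averaged SGD, form a batch-means estimate of the asymptotic covariance of \(\bar{x}_n\), and read off coordinate-wise confidence intervals for \(x^*\). The paper's batch-means construction uses batches of increasing length; equal-length batches are used here only to keep the sketch short.

```python
# Minimal sketch, assuming a linear-regression loss f(x, (z, y)) = 0.5*(y - <x, z>)^2.
# Not the authors' implementation: step sizes, number of batches, and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d, n = 5, 100_000
x_star = rng.normal(size=d)          # true parameter x*

def sample():
    # One observation from the assumed model y = <x*, z> + noise
    z = rng.normal(size=d)
    return z, z @ x_star + rng.normal()

# Averaged SGD with step size eta_i = eta0 * i^{-alpha}, alpha in (1/2, 1)
eta0, alpha = 0.5, 0.55
x = np.zeros(d)
x_bar = np.zeros(d)

# Batch-means bookkeeping: split the n iterates into M batches
# (equal-length here; the paper uses an increasing-length scheme).
M = int(n ** 0.25)
edges = np.linspace(0, n, M + 1, dtype=int)
batch_sums = np.zeros((M, d))

for i in range(1, n + 1):
    z, y = sample()
    grad = -(y - z @ x) * z                  # gradient of the squared loss at x
    x = x - eta0 * i ** (-alpha) * grad      # SGD update
    x_bar += (x - x_bar) / i                 # running average \bar{x}_i
    b = min(max(np.searchsorted(edges, i, side="left") - 1, 0), M - 1)
    batch_sums[b] += x

batch_lengths = np.diff(edges)
batch_means = batch_sums / batch_lengths[:, None]

# Batch-means estimate of the asymptotic covariance of sqrt(n)*(\bar{x}_n - x*):
# average the length-weighted outer products of the centered batch means.
centered = batch_means - x_bar
cov_hat = (centered.T * batch_lengths) @ centered / M

# 95% coordinate-wise confidence intervals for x*
se = np.sqrt(np.diag(cov_hat) / n)
lower, upper = x_bar - 1.96 * se, x_bar + 1.96 * se
for j in range(d):
    print(f"x*[{j}] = {x_star[j]:+.3f}   CI = [{lower[j]:+.3f}, {upper[j]:+.3f}]")
```

The plug-in alternative mentioned in the review would instead estimate the Hessian \(A=\nabla^2 F(x^*)\) and the gradient covariance \(S=\mathbb{E}[\nabla f(x^*,\zeta)\nabla f(x^*,\zeta)^\top]\) along the SGD path and combine them via the sandwich form \(A^{-1}SA^{-1}\); the batch-means route avoids forming these matrices and needs only the iterates themselves.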
stochastic gradient descent
0 references
asymptotic variance
0 references
batch-means estimator
0 references
high-dimensional inference
0 references
time-inhomogeneous Markov chain
0 references