Statistical inference for model parameters in stochastic gradient descent (Q2176618)
From MaRDI portal
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Statistical inference for model parameters in stochastic gradient descent | scientific article |
Statements
Statistical inference for model parameters in stochastic gradient descent (English)
0 references
5 May 2020
0 references
Let \(x^*\in\mathbb{R}^d\) be the true parameter of a statistical model. In common models, \(x^*\) is the minimizer of a convex objective function \(F\), i.e. \[ x^*=\operatorname{argmin}_x F(x),\quad F(x):=\mathbb{E} f(x,\zeta), \] where \(\zeta\) is a random sample from a probability distribution \(\Pi\) and \(f(x,\zeta)\) is the loss function. A popular optimization method for minimizing \(F\) is stochastic gradient descent (SGD). Let \(x_0\) denote any given starting point. SGD is an iterative algorithm whose \(i\)th iterate takes the form \[ x_i=x_{i-1}-\eta_i\nabla f(x_{i-1}, \zeta_i), \] where the step sizes \(\eta_i\) form a decreasing nonrandom sequence, \(\zeta_i\) is the \(i\)th sample randomly drawn from \(\Pi\), and \(\nabla f(x, \zeta_i)\) denotes the gradient of \(f(x, \zeta_i)\) with respect to \(x\). The algorithm outputs either the last iterate \(x_n\) or the average iterate \(\bar{x}_n=n^{-1}\sum_{i=1}^n x_i\). In the paper under review, \(f(\cdot, \zeta)\) is strictly convex and satisfies certain smoothness conditions. First, for fixed \(d\), two consistent estimators of the asymptotic covariance of \(\bar{x}_n\) are proposed: (a) a plug-in estimator and (b) a batch-means estimator; the latter uses only the iterates from SGD. Both estimators allow one to construct asymptotic confidence intervals and hypothesis tests for \(x^*\). Second, in high-dimensional sparse linear regression, where \(d\) can be much larger than the sample size \(n\), the authors use a variant of the SGD algorithm to minimize \(F(x)+\lambda \|x\|_1\), \(\lambda > 0\), and construct a debiased estimator of each regression coefficient that is asymptotically normal. This yields a one-pass algorithm that computes both the sparse regression coefficients and their confidence intervals and is applicable to online data.
0 references
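The following sketch (not taken from the paper; the data model, step-size schedule, and batch choices are illustrative assumptions) shows the general recipe the review describes for the fixed-\(d\) case: run averaged SGD, form a batch-means estimate of the asymptotic covariance of \(\bar{x}_n\), and read off coordinate-wise confidence intervals for \(x^*\). The paper's batch-means construction uses batches of increasing length; equal-length batches are used here only to keep the sketch short.

```python
# Minimal sketch, assuming a linear-regression loss f(x, (z, y)) = 0.5*(y - <x, z>)^2.
# Not the authors' implementation: step sizes, number of batches, and data are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d, n = 5, 100_000
x_star = rng.normal(size=d)          # true parameter x*

def sample():
    # One observation from the assumed model y = <x*, z> + noise
    z = rng.normal(size=d)
    return z, z @ x_star + rng.normal()

# Averaged SGD with step size eta_i = eta0 * i^{-alpha}, alpha in (1/2, 1)
eta0, alpha = 0.5, 0.55
x = np.zeros(d)
x_bar = np.zeros(d)

# Batch-means bookkeeping: split the n iterates into M batches
# (equal-length here; the paper uses an increasing-length scheme).
M = int(n ** 0.25)
edges = np.linspace(0, n, M + 1, dtype=int)
batch_sums = np.zeros((M, d))

for i in range(1, n + 1):
    z, y = sample()
    grad = -(y - z @ x) * z                  # gradient of the squared loss at x
    x = x - eta0 * i ** (-alpha) * grad      # SGD update
    x_bar += (x - x_bar) / i                 # running average \bar{x}_i
    b = min(max(np.searchsorted(edges, i, side="left") - 1, 0), M - 1)
    batch_sums[b] += x

batch_lengths = np.diff(edges)
batch_means = batch_sums / batch_lengths[:, None]

# Batch-means estimate of the asymptotic covariance of sqrt(n)*(\bar{x}_n - x*):
# average the length-weighted outer products of the centered batch means.
centered = batch_means - x_bar
cov_hat = (centered.T * batch_lengths) @ centered / M

# 95% coordinate-wise confidence intervals for x*
se = np.sqrt(np.diag(cov_hat) / n)
lower, upper = x_bar - 1.96 * se, x_bar + 1.96 * se
for j in range(d):
    print(f"x*[{j}] = {x_star[j]:+.3f}   CI = [{lower[j]:+.3f}, {upper[j]:+.3f}]")
```

The plug-in alternative mentioned in the review would instead estimate the Hessian \(A=\nabla^2 F(x^*)\) and the gradient covariance \(S=\mathbb{E}[\nabla f(x^*,\zeta)\nabla f(x^*,\zeta)^\top]\) along the SGD path and combine them via the sandwich form \(A^{-1}SA^{-1}\); the batch-means route avoids forming these matrices and needs only the iterates themselves.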
stochastic gradient descent
0 references
asymptotic variance
0 references
batch-means estimator
0 references
high-dimensional inference
0 references
time-inhomogeneous Markov chain
0 references