Statistical inference for model parameters in stochastic gradient descent (Q2176618)

From MaRDI portal
scientific article

    Statements

    Statistical inference for model parameters in stochastic gradient descent (English)
    5 May 2020
    Let \(x^*\in\mathbb{R}^d\) be the true parameter of a statistical model. In common models, \(x^*\) is the minimizer of a convex objective function \(F\), i.e., \[ x^*=\operatorname{argmin}_{x\in\mathbb{R}^d}F(x),\quad F(x):=\mathbb{E} f(x,\zeta), \] where \(\zeta\) is a random sample from a probability distribution \(\Pi\) and \( f(x,\zeta)\) is the loss function. A popular optimization method for minimizing \(F\) is stochastic gradient descent (SGD). Let \(x_0\) denote any given starting point. SGD is an iterative algorithm whose \(i\)th iterate takes the form \[ x_i=x_{i-1}-\eta_i\nabla f(x_{i-1}, \zeta_i). \] The step sizes \(\eta_i\) form a decreasing deterministic sequence, \(\zeta_i\) is the \(i\)th sample randomly drawn from \(\Pi\), and \(\nabla f(x, \zeta_i)\) denotes the gradient of \( f(x, \zeta_i)\) with respect to \(x\). The algorithm outputs either the last iterate \(x_n\) or the averaged iterate \(\bar{x}_n=n^{-1}\sum_{i=1}^n x_i\).

    In the paper under review, \(f(\cdot, \zeta)\) is strictly convex and satisfies certain smoothness conditions. First, for fixed \(d\), two consistent estimators of the asymptotic covariance of \(\bar{x}_n\) are proposed: (a) a plug-in estimator and (b) a batch-means estimator; the latter uses only the iterates from SGD. Both estimators allow one to construct asymptotic confidence intervals for \(x^*\) and hypothesis tests. Second, in high-dimensional sparse linear regression, where \(d\) can be much larger than the sample size \(n\), the authors use a variant of the SGD algorithm to minimize \(F(x)+\lambda \|x\|_1\), \(\lambda > 0\), and construct a debiased estimator of each regression coefficient that is asymptotically normal. This yields a one-pass algorithm that computes both the sparse regression coefficients and confidence intervals, and is therefore applicable to online data.
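    As an illustration of the batch-means construction, the following is a minimal Python sketch for averaged SGD on a linear model with squared loss. The step schedule \(\eta_i=\eta_0 i^{-\alpha}\) with \(\alpha\in(1/2,1)\) matches the Polyak-Ruppert averaging regime, but the equal-size batches are a simplification of this sketch (the paper's batch-means estimator uses batch sizes that grow with \(n\)); the function names asgd_linear_regression and batch_means_ci are illustrative, not from the paper.

import numpy as np

def asgd_linear_regression(A, b, eta0=0.5, alpha=0.505):
    """Averaged SGD iterates for least squares, f(x,(a,b)) = (a'x - b)^2 / 2.

    Step sizes eta_i = eta0 * i**(-alpha) with alpha in (1/2, 1), as in
    the Polyak-Ruppert averaging regime.  Returns all n iterates.
    """
    n, d = A.shape
    x = np.zeros(d)
    iterates = np.empty((n, d))
    for i in range(n):
        grad = (A[i] @ x - b[i]) * A[i]           # stochastic gradient at sample i+1
        x = x - eta0 * (i + 1) ** (-alpha) * grad
        iterates[i] = x
    return iterates

def batch_means_ci(iterates, n_batches=20):
    """Batch-means estimate of the asymptotic covariance of the averaged
    iterate, plus 95% per-coordinate confidence intervals for x*.

    Equal-size batches are a simplification; the paper's estimator uses
    batch sizes growing with n to offset the decaying step sizes.
    """
    n, d = iterates.shape
    xbar = iterates.mean(axis=0)                  # Polyak-Ruppert average
    Sigma = np.zeros((d, d))
    for B in np.array_split(iterates, n_batches):
        diff = B.mean(axis=0) - xbar
        Sigma += len(B) * np.outer(diff, diff)    # n_k (xbar_k - xbar)(xbar_k - xbar)'
    Sigma /= n_batches
    half = 1.96 * np.sqrt(np.diag(Sigma) / n)     # normal-quantile half-width
    return xbar, np.column_stack((xbar - half, xbar + half))

# Toy run: d = 5 linear model with Gaussian noise, one pass over the data.
rng = np.random.default_rng(0)
x_star = rng.normal(size=5)
A = rng.normal(size=(100_000, 5))
b = A @ x_star + rng.normal(size=100_000)
xbar, ci = batch_means_ci(asgd_linear_regression(A, b))
print(np.column_stack((x_star, xbar, ci)))        # truth, estimate, CI endpoints

    As the review emphasizes, the batch-means construction touches nothing but the SGD iterates themselves; a plug-in alternative would typically require estimating model-specific quantities such as the Hessian of \(F\) and the covariance of the stochastic gradients.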
    stochastic gradient descent
    asymptotic variance
    batch-means estimator
    high-dimensional inference
    time-inhomogeneous Markov chain
