On the diffusion approximation of nonconvex stochastic gradient descent
Abstract: We study the Stochastic Gradient Descent (SGD) method for nonconvex optimization problems from the point of view of approximating diffusion processes. Using the weak form of the master equation for the probability evolution, we prove rigorously that a diffusion process approximates the SGD algorithm weakly. In the small-step-size regime and in the presence of omnidirectional noise, this weak approximating diffusion suggests the following dynamics for an SGD iteration started from a local minimizer (resp.~saddle point): it escapes in a number of iterations that depends exponentially (resp.~almost linearly) on the inverse step size. The results are obtained using the theory of random perturbations of dynamical systems (large-deviations theory for local minimizers and exit theory for unstable stationary points). In addition, we discuss the effect of batch size in training deep neural networks, and we find that a small batch size helps SGD escape unstable stationary points and sharp minimizers. Our theory indicates that one should increase the batch size at a later stage so that SGD becomes trapped in flat minimizers, which generalize better.
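To make the weak approximation concrete, here is a minimal numerical sketch, not the paper's construction: it assumes a one-dimensional quadratic toy loss f(x) = x^2/2 with Gaussian gradient noise, and the parameters eta, sigma, and the horizon K are illustrative choices. It compares the marginal statistics of the SGD iterates x_{k+1} = x_k - eta * g(x_k) with those of the approximating SDE dX_t = -f'(X_t) dt + sqrt(eta) * sigma dW_t, simulated by Euler--Maruyama over the same effective time T = K * eta.

```python
import numpy as np

# Minimal illustrative sketch (assumptions: quadratic loss f(x) = x^2/2,
# stochastic gradient g(x) = x + sigma * xi with xi ~ N(0, 1), and
# arbitrarily chosen eta, sigma, K -- none of these come from the paper).
rng = np.random.default_rng(0)
eta, sigma, K, n_paths = 0.01, 1.0, 1000, 100_000

# SGD paths: x_{k+1} = x_k - eta * (f'(x_k) + sigma * xi_k), started at x = 2.
x = np.full(n_paths, 2.0)
for _ in range(K):
    x -= eta * (x + sigma * rng.standard_normal(n_paths))

# Approximating diffusion dX_t = -X_t dt + sqrt(eta) * sigma dW_t,
# simulated by Euler--Maruyama with a finer step dt = eta / 10
# over the same effective horizon T = K * eta.
dt, n_steps = eta / 10.0, 10 * K
y = np.full(n_paths, 2.0)
for _ in range(n_steps):
    y += -y * dt + np.sqrt(eta) * sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

# Weak approximation: the marginal moments should nearly agree; both are
# close to the stationary variance eta * sigma^2 / 2 = 0.005 for this toy.
print(f"SGD       mean {x.mean():+.4f}  var {x.var():.4f}")
print(f"diffusion mean {y.mean():+.4f}  var {y.var():.4f}")
```

On this convex toy the two marginal distributions nearly coincide; the paper's escape-time asymptotics concern nonconvex landscapes, where the same diffusion picture yields exit times from local minimizers and saddle points.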
Recommendations
- A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
- Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation
- On large batch training and sharp minima: a Fokker-Planck perspective
- Convergence of constant step stochastic gradient descent for non-smooth non-convex functions
- Analysis of stochastic gradient descent in continuous time
Cited in (22)
- Numerical analysis for inchworm Monte Carlo method: Sign problem and error growth
- Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation
- A convergence analysis of the perturbed compositional gradient flow: averaging principle and normal deviations
- Momentum-based accelerated mirror descent stochastic approximation for robust topology optimization under stochastic loads
- On large batch training and sharp minima: a Fokker-Planck perspective
- Stochastic modified flows for Riemannian stochastic gradient descent
- A Continuous-Time Analysis of Distributed Stochastic Gradient
- Random batch methods (RBM) for interacting particle systems
- Stochastic differential equation approximations of generative adversarial network training and its long-run behavior
- A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization
- On the fast convergence of random perturbations of the gradient flow
- Stochastic modified equations and dynamics of stochastic gradient algorithms. I: Mathematical foundations
- A probability approximation framework: Markov process approach
- The effective noise of stochastic gradient descent
- Applications of Fokker-Planck equations in machine learning algorithms
- A random batch method for efficient ensemble forecasts of multiscale turbulent systems
- Second-Order Guarantees of Stochastic Gradient Descent in Nonconvex Optimization
- On uniform-in-time diffusion approximation for stochastic gradient descent
- Approximation to stochastic variance reduced gradient Langevin dynamics by stochastic delay differential equations
- Analysis of stochastic gradient descent in continuous time
- A Random-Batch Monte Carlo Method for Many-Body Systems with Singular Kernels
- One-dimensional system arising in stochastic gradient descent