Taming Neural Networks with TUSLA: Nonconvex Learning via Adaptive Stochastic Gradient Langevin Algorithms
Abstract: Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called the tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The TUSLA algorithm is rooted in the taming technology for diffusion processes with superlinear coefficients developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the new algorithm in comparison to vanilla SGLD within the framework of ANNs.
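The key idea behind taming is to divide the (possibly superlinearly growing) stochastic gradient by a step-size-dependent factor so that individual updates stay bounded, while still injecting the Gaussian noise of a Langevin scheme. The following Python sketch illustrates one plausible form of such an update; the taming factor 1 + sqrt(lambda) * ||theta||^(2r), the parameter names, and the toy loss in the usage example are illustrative assumptions and not the exact scheme analysed in the paper.

```python
import numpy as np

def tamed_langevin_step(theta, stoch_grad, lam=1e-3, beta=1e8, r=1, rng=None):
    """One update of a tamed stochastic gradient Langevin scheme (sketch).

    theta      : current parameter vector (np.ndarray)
    stoch_grad : stochastic gradient H(theta, X) evaluated on a mini-batch
    lam        : step size lambda
    beta       : inverse temperature
    r          : growth exponent entering the (assumed) taming denominator
    """
    rng = np.random.default_rng() if rng is None else rng
    # Taming: divide the possibly superlinearly growing gradient by a factor
    # that grows with ||theta||, so a single Euler step cannot explode.
    tamed_grad = stoch_grad / (1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** (2 * r))
    # Langevin noise with per-coordinate variance 2 * lam / beta.
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * tamed_grad + noise

# Toy usage: the loss (||theta||^2 - 1)^2 has a cubically growing gradient,
# so an untamed Euler step from a large initial point can diverge.
theta = np.array([5.0, -4.0])
rng = np.random.default_rng(0)
for _ in range(20_000):
    grad = 4.0 * (theta @ theta - 1.0) * theta  # superlinearly growing gradient
    theta = tamed_langevin_step(theta, grad, lam=1e-3, beta=1e8, r=1, rng=rng)
# theta ends up near the unit circle, where the toy loss is minimized.
```

In this sketch the taming denominator shrinks the effective step exactly when the gradient is large, which is the mechanism that lets such schemes remain stable where vanilla SGLD (a plain Euler discretization) can blow up.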
Recommendations
- Conjugate-gradient-based Adam for nonconvex stochastic optimization and its application to deep learning
- Stochastic generalized gradient methods for training nonconvex nonsmooth neural networks
- Adaptive methods using element-wise \(p\)th power of stochastic gradient for nonconvex optimization in deep neural networks
- Stochastic gradient Langevin dynamics with adaptive drifts
- Stochastic perturbation of subgradient algorithm for nonconvex deep neural networks
- Uniformly convex neural networks and non-stationary iterated network Tikhonov (iNETT) method
- Adaptivity of stochastic gradient methods for nonconvex optimization
- The convergence of stochastic gradient algorithms applied to learning in neural networks
- Optimal nonparametric inference via deep neural network
Cites work
- A note on tamed Euler approximations
- Convergence and dynamical behavior of the ADAM algorithm for nonconvex stochastic optimization
- Couplings and quantitative contraction rates for Langevin dynamics
- Euler approximations with varying coefficients: the case of superlinearly growing diffusion coefficients
- High-dimensional Bayesian inference via the unadjusted Langevin algorithm
- Higher order Langevin Monte Carlo algorithm
- Laplace's method revisited: Weak convergence of probability measures
- Nonasymptotic convergence analysis for the unadjusted Langevin algorithm
- Nonasymptotic estimates for stochastic gradient Langevin dynamics under local conditions in nonconvex optimization
- On Stochastic Gradient Langevin Dynamics with Dependent Data Streams: The Fully Nonconvex Case
- On stochastic gradient Langevin dynamics with dependent data streams in the logconcave case
- Quantitative Harris-type theorems for diffusions and McKean-Vlasov processes
- Strong and weak divergence in finite time of Euler's method for stochastic differential equations with non-globally Lipschitz continuous coefficients
- Strong convergence of an explicit numerical method for SDEs with nonglobally Lipschitz continuous coefficients
- The tamed unadjusted Langevin algorithm
- Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities
- User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient
Cited in (6)
- An inertial Newton algorithm for deep learning
- A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks
- Statistical Finite Elements via Langevin Dynamics
- Kinetic Langevin MCMC sampling without gradient Lipschitz continuity -- the strongly convex case
- Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function
- Non-asymptotic convergence bounds for modified tamed unadjusted Langevin algorithm in non-convex setting