Natural Langevin dynamics for neural networks
Publication: 1689178
DOI: 10.1007/978-3-319-68445-1_53
zbMATH Open: 1428.82047
arXiv: 1712.01076
OpenAlex: W2962890870
MaRDI QID: Q1689178
FDO: Q1689178
Document type: scientific article (zbMATH DE number 6860839)
Authors: Gaétan Marceau-Caron, Yann Ollivier
Publication date: 12 January 2018
Abstract: One way to avoid overfitting in machine learning is to use model parameters distributed according to a Bayesian posterior given the data, rather than the maximum likelihood estimator. Stochastic gradient Langevin dynamics (SGLD) is one algorithm to approximate such Bayesian posteriors for large models and datasets. SGLD is a standard stochastic gradient descent to which is added a controlled amount of noise, specifically scaled so that the parameter converges in law to the posterior distribution [WT11, TTV16]. The posterior predictive distribution can be approximated by an ensemble of samples from the trajectory. Choice of the variance of the noise is known to impact the practical behavior of SGLD: for instance, noise should be smaller for sensitive parameter directions. Theoretically, it has been suggested to use the inverse Fisher information matrix of the model as the variance of the noise, since it is also the variance of the Bayesian posterior [PT13, AKW12, GC11]. But the Fisher matrix is costly to compute for large-dimensional models. Here we use the easily computed Fisher matrix approximations for deep neural networks from [MO16, Oll15]. The resulting natural Langevin dynamics combines the advantages of Amari's natural gradient descent and Fisher-preconditioned Langevin dynamics for large neural networks. Small-scale experiments on MNIST show that Fisher matrix preconditioning brings SGLD close to dropout as a regularizing technique.
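To make the update rule concrete, here is a minimal NumPy sketch of the two dynamics the abstract contrasts. It is not the authors' implementation: the function names, the diagonal Fisher approximation, and the damping constant are illustrative assumptions, and the drift-correction term that arises when the preconditioner itself depends on the parameters is omitted for simplicity.

    import numpy as np

    def sgld_step(theta, grad_log_post, step_size, rng):
        # Vanilla SGLD: theta <- theta + (eps/2) * grad log p(theta | data) + N(0, eps * I).
        noise = rng.normal(size=theta.shape) * np.sqrt(step_size)
        return theta + 0.5 * step_size * grad_log_post(theta) + noise

    def natural_sgld_step(theta, grad_log_post, fisher_diag, step_size, rng, damping=1e-8):
        # Fisher-preconditioned ("natural") SGLD with a diagonal Fisher approximation:
        # the drift is rescaled by F^{-1} and the injected noise has covariance
        # eps * F^{-1}, so sensitive directions (large Fisher entries) get less noise.
        inv_f = 1.0 / (fisher_diag + damping)  # damping avoids division by zero
        noise = rng.normal(size=theta.shape) * np.sqrt(step_size * inv_f)
        return theta + 0.5 * step_size * inv_f * grad_log_post(theta) + noise

    # Toy usage: sample from a standard Gaussian posterior, where grad log p(theta) = -theta.
    rng = np.random.default_rng(0)
    theta = np.zeros(2)
    samples = []
    for _ in range(5000):
        theta = natural_sgld_step(theta, lambda t: -t, fisher_diag=np.ones(2),
                                  step_size=1e-2, rng=rng)
        samples.append(theta)

Rescaling the noise by the same F^{-1} as the drift is what distinguishes this from ordinary preconditioned SGD plus fixed isotropic noise: with a fixed preconditioner it keeps the posterior as the target distribution, and it realizes the abstract's point that noise should be smaller in sensitive parameter directions.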
Full work available at URL: https://arxiv.org/abs/1712.01076
Recommendations
- Laplacian smoothing stochastic gradient Markov chain Monte Carlo
- Exploration of the (non-)asymptotic bias and variance of stochastic gradient Langevin dynamics
- Hybrid deterministic-stochastic gradient Langevin dynamics for Bayesian learning
- Consistency and fluctuations for stochastic gradient Langevin dynamics
Mathematics Subject Classification:
- Stochastic methods (Fokker-Planck, Langevin, etc.) applied to problems in time-dependent statistical mechanics (82C31)
- Neural nets applied to problems in time-dependent statistical mechanics (82C32)
Cited In (5)
- Learning and Inference in Sparse Coding Models With Langevin Dynamics
- Mean-field Langevin dynamics and energy landscape of neural networks
- Optimization methods for large-scale machine learning
- Jean-Louis Koszul and the elementary structures of information geometry
- On Langevin Updating in Multilayer Perceptrons