Stochastic gradient descent with noise of machine learning type. II: Continuous time analysis
Abstract: The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable values for these parameters are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or a more advanced SGD-based algorithm. In a continuous-time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime the optimization algorithm prefers 'flat' minima of the objective function, in a sense that differs from the flat-minimum selection of continuous-time SGD with homogeneous noise.
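To make the model concrete, here is a minimal, self-contained sketch (not taken from the paper) of continuous-time SGD with objective-proportional noise, simulated via an Euler-Maruyama discretization of dX_t = -grad f(X_t) dt + sqrt(eta * f(X_t)) dW_t. The double-well objective f, the noise scale eta, and all parameter values below are illustrative assumptions; the defining feature shown is only that, under the 'machine learning scaling', the noise variance is tied to the objective value and therefore vanishes at global minima, unlike homogeneous noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective with two global minima (f = 0 at both): a "sharp" one at
# x = -1 and a "flat" one at x = +1. Purely illustrative, not the paper's setup.
def f(x):
    return np.minimum(25.0 * (x + 1.0) ** 2, (x - 1.0) ** 2)

def grad_f(x):
    # Piecewise gradient of the pointwise minimum above.
    sharp_side = 25.0 * (x + 1.0) ** 2 < (x - 1.0) ** 2
    return np.where(sharp_side, 50.0 * (x + 1.0), 2.0 * (x - 1.0))

def simulate(n_paths=500, eta=0.2, dt=1e-3, n_steps=50_000):
    """Euler-Maruyama discretization of
        dX_t = -grad f(X_t) dt + sqrt(eta * f(X_t)) dW_t.
    The noise variance is proportional to the current objective value
    (a stand-in for the 'machine learning scaling'), so it vanishes at
    global minima instead of being homogeneous in space.
    """
    x = rng.uniform(-2.0, 2.0, size=n_paths)  # random initial iterates
    for _ in range(n_steps):
        diffusion = np.sqrt(eta * f(x) * dt)  # state-dependent noise level
        x = x - grad_f(x) * dt + diffusion * rng.standard_normal(n_paths)
    return x

x_final = simulate()
print("fraction near sharp minimum:", np.mean(np.abs(x_final + 1.0) < 0.3))
print("fraction near flat minimum: ", np.mean(np.abs(x_final - 1.0) < 0.3))
```

Which basin such degenerate-noise dynamics settles in, and in what sense flat minima are preferred, is precisely the question the paper analyzes; the sketch only exhibits the model class, not the paper's results.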
Recommendations
- Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis
- Analysis of stochastic gradient descent in continuous time
- The effective noise of stochastic gradient descent
- The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima
- A mean field view of the landscape of two-layer neural networks
Cited works
- Scientific article (no title available); zbMATH DE number 5681750
- A Liouville Theorem for Degenerate Elliptic Equations
- A Stochastic Approximation Method
- A comprehensive introduction to sub-Riemannian geometry. From the Hamiltonian viewpoint. With an appendix by Igor Zelenko
- A mean field view of the landscape of two-layer neural networks
- About the Hardy inequality
- Analysis of a two-layer neural network via displacement convexity
- Analysis of stochastic gradient descent in continuous time
- Bounds for the Discrete Part of the Spectrum of a Semi-Bounded Schrödinger Operator
- Elliptic partial differential equations of second order
- Functional analysis, Sobolev spaces and partial differential equations
- Improved Poincaré inequalities
- Mean field analysis of neural networks: a law of large numbers
- Mean-field Langevin dynamics and energy landscape of neural networks
- Measure theory and fine properties of functions
- On the heat diffusion for generic Riemannian and sub-Riemannian structures
- Optimal control of stochastic differential equations via Fokker-Planck equations
- Rectifiable sets, densities and tangent measures
- Regularity theory for general stable operators
- Regularity theory for general stable operators: parabolic equations
- Sharp rates of decay of solutions to the nonlinear fast diffusion equation via functional inequalities
- Stochastic gradient descent in continuous time
- Stochastic gradient descent in continuous time: a central limit theorem
- Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis
- Sub-Laplacian eigenvalue bounds on sub-Riemannian manifolds
- The Variational Formulation of the Fokker-Planck Equation
- Wahrscheinlichkeitstheorie [Probability theory]