Stochastic gradient descent with noise of machine learning type. II: Continuous time analysis
Abstract: The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable values for these parameters are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or a more advanced SGD-based algorithm. In a continuous-time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime the optimization algorithm prefers 'flat' minima of the objective function, in a sense that differs from the flat-minimum selection of continuous-time SGD with homogeneous noise.
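To make the model concrete, here is a minimal, self-contained sketch (not taken from the paper) of continuous-time SGD with objective-proportional noise, simulated via an Euler-Maruyama discretization of dX_t = -grad f(X_t) dt + sqrt(eta * f(X_t)) dW_t. The double-well objective f, the noise scale eta, and all parameter values below are illustrative assumptions; the defining feature shown is only that, under the 'machine learning scaling', the noise variance is tied to the objective value and therefore vanishes at global minima, unlike homogeneous noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective with two global minima (f = 0 at both): a "sharp" one at
# x = -1 and a "flat" one at x = +1. Purely illustrative, not the paper's setup.
def f(x):
    return np.minimum(25.0 * (x + 1.0) ** 2, (x - 1.0) ** 2)

def grad_f(x):
    # Piecewise gradient of the pointwise minimum above.
    sharp_side = 25.0 * (x + 1.0) ** 2 < (x - 1.0) ** 2
    return np.where(sharp_side, 50.0 * (x + 1.0), 2.0 * (x - 1.0))

def simulate(n_paths=500, eta=0.2, dt=1e-3, n_steps=50_000):
    """Euler-Maruyama discretization of
        dX_t = -grad f(X_t) dt + sqrt(eta * f(X_t)) dW_t.
    The noise variance is proportional to the current objective value
    (a stand-in for the 'machine learning scaling'), so it vanishes at
    global minima instead of being homogeneous in space.
    """
    x = rng.uniform(-2.0, 2.0, size=n_paths)  # random initial iterates
    for _ in range(n_steps):
        diffusion = np.sqrt(eta * f(x) * dt)  # state-dependent noise level
        x = x - grad_f(x) * dt + diffusion * rng.standard_normal(n_paths)
    return x

x_final = simulate()
print("fraction near sharp minimum:", np.mean(np.abs(x_final + 1.0) < 0.3))
print("fraction near flat minimum: ", np.mean(np.abs(x_final - 1.0) < 0.3))
```

Which basin such degenerate-noise dynamics settles in, and in what sense flat minima are preferred, is precisely the question the paper analyzes; the sketch only exhibits the model class, not the paper's results.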
Recommendations
- Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis
- Analysis of stochastic gradient descent in continuous time
- The effective noise of stochastic gradient descent
- The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima
- A mean field view of the landscape of two-layer neural networks
Cited works
- Scientific article (no title available); zbMATH DE number 5681750
- A Liouville Theorem for Degenerate Elliptic Equations
- A Stochastic Approximation Method
- A comprehensive introduction to sub-Riemannian geometry. From the Hamiltonian viewpoint. With an appendix by Igor Zelenko
- A mean field view of the landscape of two-layer neural networks
- About the Hardy inequality
- Analysis of a two-layer neural network via displacement convexity
- Analysis of stochastic gradient descent in continuous time
- Bounds for the Discrete Part of the Spectrum of a Semi-Bounded Schrödinger Operator
- Elliptic partial differential equations of second order
- Functional analysis, Sobolev spaces and partial differential equations
- Improved Poincaré inequalities
- Mean field analysis of neural networks: a law of large numbers
- Mean-field Langevin dynamics and energy landscape of neural networks
- Measure theory and fine properties of functions
- On the heat diffusion for generic Riemannian and sub-Riemannian structures
- Optimal control of stochastic differential equations via Fokker-Planck equations
- Rectifiable sets, densities and tangent measures
- Regularity theory for general stable operators
- Regularity theory for general stable operators: parabolic equations
- Sharp rates of decay of solutions to the nonlinear fast diffusion equation via functional inequalities
- Stochastic gradient descent in continuous time
- Stochastic gradient descent in continuous time: a central limit theorem
- Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis
- Sub-Laplacian eigenvalue bounds on sub-Riemannian manifolds
- The Variational Formulation of the Fokker-Planck Equation
- Wahrscheinlichkeitstheorie [Probability theory]