Deep relaxation: partial differential equations for optimizing deep neural networks (Q2319762)

From MaRDI portal

scientific article

Language	Label	Description	Also known as
English	Deep relaxation: partial differential equations for optimizing deep neural networks	scientific article

Statements

scholarly article

Deep relaxation: partial differential equations for optimizing deep neural networks (English)

Pratik Chaudhari

Adam M. Oberman

Guillaume Carlier

Stanley J. Osher

Research in the Mathematical Sciences

publication date

20 August 2019

full work available at URL

https://arxiv.org/abs/1704.04932

The output of a deep neural network is defined as \(y(x;\xi)=\sigma(x^p\sigma(x^{p-1}\dots\sigma(x^1\xi)\dots))\), a nested composition of linear maps and a nonlinearity \(\sigma\), depending on the inputs \(\xi\in \mathbb{R}^d\) and the weights \(x\in \mathbb{R}^n\), which are the parameters of the network. In supervised training, the goal is to minimize a loss function \(f(x)\). The background section gives a short discussion of stochastic gradient descent (SGD) and its continuous-time formulation, together with some references. The third section concentrates on a few results on the PDE interpretation of local entropy, such as the derivation of the viscous Hamilton-Jacobi PDE and the Hopf-Lax formula for the Hamilton-Jacobi equation. The fourth section deals with the derivation of local entropy via homogenization of SDEs. The authors then present results on stochastic control for a variant of local entropy. Since it was proved in an earlier section that the regularized loss function is the solution of a viscous Hamilton-Jacobi equation, one can apply semi-concavity estimates from PDE theory and quantify the amount of smoothing. Some examples illustrating the widening of local minima are presented. The last two sections are devoted to a comparison of the various algorithms presented in the paper. The aim is to show that the considered collection of PDE methods improves results on modern datasets such as MNIST and CIFAR. Throughout the article, the authors draw heavily on their previously published results, compare results, and investigate and refine some of the details.
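
The Hopf-Lax formula mentioned in the review can be illustrated on a one-dimensional toy problem. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the rippled quadratic loss, the grid, the value of \(t\), and the helper name `hopf_lax` are all hypothetical choices made for illustration. It evaluates \(u(x,t)=\min_y \{ f(y) + (x-y)^2/(2t) \}\), the Hopf-Lax solution of the non-viscous Hamilton-Jacobi equation \(u_t = -\tfrac{1}{2}|u_x|^2\) with \(u(x,0)=f(x)\), by brute force over the grid.

```python
import numpy as np

def hopf_lax(f_vals, grid, t):
    """Brute-force Hopf-Lax formula on a grid (illustrative helper).

    Returns u(x_i, t) = min_j [ f(y_j) + (x_i - y_j)^2 / (2 t) ].
    """
    # diff[i, j] = x_i - y_j for every pair of grid points
    diff = grid[:, None] - grid[None, :]
    return np.min(f_vals[None, :] + diff**2 / (2.0 * t), axis=1)

# Hypothetical toy loss: a quadratic bowl with high-frequency ripples.
grid = np.linspace(-3.0, 3.0, 601)
f_vals = 0.5 * grid**2 + 0.3 * np.sin(8.0 * grid)

# Regularized loss at "time" t (an assumed value; larger t smooths more).
u_vals = hopf_lax(f_vals, grid, t=0.5)
```

By construction `u_vals` never exceeds `f_vals` (take \(y=x\) in the minimum), and on the grid it attains the same global minimum value as `f_vals`; only the shape of the landscape around the minima changes.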

Claudia Simionescu-Badea

zbMATH Keywords

deep learning

partial differential equations

stochastic gradient descent

neural networks

optimal control

MaRDI profile type

MaRDI publication profile

Identifiers

DOI

10.1007/s40687-018-0148-y
