Deep relaxation: partial differential equations for optimizing deep neural networks (Q2319762): Difference between revisions

The output of a deep neural network is defined as \(y(x;\xi)=\sigma(x^p\sigma(x^{p-1}\dots\sigma(x^1\xi)\dots))\), a nested composition of linear maps and a nonlinear activation \(\sigma\), depending on inputs \(\xi\in \mathbb{R}^d\) and on weights \(x\in \mathbb{R}^n\), which are the parameters of the network. In supervised training, the goal is to minimize a loss function \(f(x)\). The background section briefly discusses stochastic gradient descent (SGD) and its continuous-time counterpart, and surveys related references. The third section collects results on the PDE interpretation of local entropy, such as the derivation of the viscous Hamilton-Jacobi PDE it satisfies and the Hopf-Lax formula for the Hamilton-Jacobi equation. The fourth section derives local entropy via homogenization of SDEs. Further results concern stochastic control for a variant of local entropy. Since an earlier section proves that the regularized loss function solves a viscous Hamilton-Jacobi equation, semi-concavity estimates from PDE theory can be applied to quantify the amount of smoothing. Examples illustrating the widening of local minima are presented. The last two sections compare the algorithms presented in the paper, with the aim of showing that the PDE-based methods improve results on modern datasets such as MNIST and CIFAR. Throughout the article, the authors draw extensively on their previously published results, compare them, and investigate and refine some of the details.
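
To make the two smoothings of the loss concrete, here is a minimal one-dimensional NumPy sketch; it is an illustration, not the authors' code. It assumes the Cole-Hopf representation of local entropy, \(u(x,t)=-\beta^{-1}\log\big(G_{t/\beta}*e^{-\beta f}\big)(x)\) with \(G_{t/\beta}\) the Gaussian density of variance \(t/\beta\), which solves the viscous Hamilton-Jacobi equation \(u_t=-\tfrac12|u_x|^2+\tfrac{1}{2\beta}u_{xx}\), \(u(\cdot,0)=f\), and the Hopf-Lax formula \(u(x,t)=\min_y\{f(y)+|x-y|^2/(2t)\}\) for \(u_t+\tfrac12|u_x|^2=0\). Integrals and minima are approximated on a uniform grid; the double-well loss, the grid, and the names `hopf_lax`, `local_entropy`, `t` and `beta` are hypothetical choices made for this example.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def hopf_lax(f_vals, grid, t):
    """Hopf-Lax formula u(x, t) = min_y [f(y) + (x - y)^2 / (2t)], with y restricted to grid points."""
    diff = grid[:, None] - grid[None, :]  # diff[i, j] = x_i - y_j
    return np.min(f_vals[None, :] + diff**2 / (2.0 * t), axis=1)


def local_entropy(f_vals, grid, t, beta):
    """Cole-Hopf solution of u_t = -u_x^2 / 2 + u_xx / (2 beta) with u(., 0) = f:

    u(x, t) = -(1/beta) * log( integral of G(x - y) exp(-beta f(y)) dy ),
    G = Gaussian density with variance t / beta; the integral is a Riemann sum on the grid.
    """
    h = grid[1] - grid[0]
    s = t / beta
    diff = grid[:, None] - grid[None, :]
    log_kernel = -diff**2 / (2.0 * s) - 0.5 * np.log(2.0 * np.pi * s)
    log_terms = log_kernel - beta * f_vals[None, :] + np.log(h)
    return -logsumexp(log_terms, axis=1) / beta


def max_second_diff(u, h):
    """Largest discrete second derivative (u[i+1] - 2u[i] + u[i-1]) / h^2."""
    return np.max((u[2:] - 2.0 * u[1:-1] + u[:-2]) / h**2)


if __name__ == "__main__":
    grid = np.linspace(-3.0, 3.0, 1201)  # uniform grid, spacing 0.005
    h = grid[1] - grid[0]
    t, beta = 0.5, 10.0
    f = (grid**2 - 1.0) ** 2  # toy nonconvex double-well "loss"
    u_hl = hopf_lax(f, grid, t)
    u_le = local_entropy(f, grid, t, beta)

    print("max second difference of f:       ", max_second_diff(f, h))
    print("max second difference of Hopf-Lax:", max_second_diff(u_hl, h), "(bound 1/t =", 1.0 / t, ")")

    # Height of the barrier between the central point and the well at x = 1, before and after smoothing.
    i0, i1 = np.argmin(np.abs(grid)), np.argmin(np.abs(grid - 1.0))
    for name, u in [("f", f), ("Hopf-Lax", u_hl), ("local entropy", u_le)]:
        print(f"{name:>13}: barrier u(0) - u(1) = {u[i0] - u[i1]:.4f}")
```

The Hopf-Lax envelope is a minimum of parabolas in \(x\) that all have curvature \(1/t\), so its second differences never exceed \(1/t\), however sharply curved \(f\) is; this one-sided curvature bound is the kind of semi-concavity estimate that rules out minima sharper than curvature \(1/t\). For a quadratic \(f(x)=ax^2/2\) both smoothings have closed forms, \(ax^2/(2(1+at))\) for Hopf-Lax and the same expression plus \(\log(1+at)/(2\beta)\) for local entropy, which gives a convenient check on the grid approximation.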

reviewed by

Claudia Simionescu-Badea

zbMATH Keywords

deep learning

partial differential equations

stochastic gradient descent

neural networks

optimal control

describes a project that uses

Saga

CIFAR

Adam

ImageNet

Entropy-SGD

MaRDI profile type

MaRDI publication profile

cites work

Q5186515

Local entropy as a measure for sampling solutions in constraint satisfaction problems

Q4830373

Optimization Methods for Large-Scale Machine Learning

Semiconcave functions, Hamilton-Jacobi equations, and optimal control

Contractions in the 2-Wasserstein length space and thermalization of granular media

Entropy-SGD: biasing gradient descent into wide valleys

Smoothing methods for nonsmooth, nonconvex minimization

Q5396673

Q5198904

Q4399897

Q4086303

Controlled Markov processes and viscosity solutions

Replica symmetry breaking condition exposed by random matrix calculation of landscape complexity

Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming

Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle

The Variational Formulation of the Fokker-Planck Equation

Mean field games

Inequalities: theory of majorization and its applications

Q4237477

Proximité et dualité dans un espace hilbertien

Q3320132

Convergent Difference Schemes for Degenerate Elliptic and Parabolic Equations: Hamilton-Jacobi Equations and Free Boundary Problems

Stochastic Processes and Applications

Multiscale Methods

The Fokker-Planck equation. Methods of solution and applications

A Stochastic Approximation Method

Monotone Operators and the Proximal Point Algorithm

Learning representations by back-propagating errors

Optimal transport for applied mathematicians. Calculus of variations, PDEs, and modeling

Minimizing finite sums with the stochastic average gradient

Q2934059

Q3560913

Identifiers

zbMATH Open document ID

1427.82032

DOI

10.1007/s40687-018-0148-y

Sitelinks

mardi Publication:2319762

@@ Property / describes a project that uses @@
+Saga
+CIFAR
+Adam
+ImageNet
+Entropy-SGD
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / OpenAlex ID @@
+W2963480765
@@ Property / arXiv ID @@
+1704.04932
@@ Property / cites work @@
+Q5186515
+Local entropy as a measure for sampling solutions in constraint satisfaction problems
+Q4830373
+Optimization Methods for Large-Scale Machine Learning
+Semiconcave functions, Hamilton-Jacobi equations, and optimal control
+Contractions in the 2-Wasserstein length space and thermalization of granular media
+Entropy-SGD: biasing gradient descent into wide valleys
+Smoothing methods for nonsmooth, nonconvex minimization
+Q5396673
+Q5198904
+Q4399897
+Q4086303
+Controlled Markov processes and viscosity solutions
+Replica symmetry breaking condition exposed by random matrix calculation of landscape complexity
+Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
+Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle
+The Variational Formulation of the Fokker-Planck Equation
+Mean field games
+Inequalities: theory of majorization and its applications
+Q4237477
+Proximité et dualité dans un espace hilbertien
+Q3320132
+Convergent Difference Schemes for Degenerate Elliptic and Parabolic Equations: Hamilton-Jacobi Equations and Free Boundary Problems
+Stochastic Processes and Applications
+Multiscale Methods
+The Fokker-Planck equation. Methods of solution and applications
+A Stochastic Approximation Method
+Monotone Operators and the Proximal Point Algorithm
+Learning representations by back-propagating errors
+Optimal transport for applied mathematicians. Calculus of variations, PDEs, and modeling
+Minimizing finite sums with the stochastic average gradient
+Q2934059
+Q3560913
@@ Property / Wikidata QID @@
+Q129603431