Deep relaxation: partial differential equations for optimizing deep neural networks (Q2319762)
Author: Stanley J. Osher
Language | Label | Description | Also known as
---|---|---|---
English | Deep relaxation: partial differential equations for optimizing deep neural networks | scientific article |
Statements
Deep relaxation: partial differential equations for optimizing deep neural networks (English)
Publication date: 20 August 2019
The output of a deep neural network is defined as \(y(x;\xi)=\sigma(x^p\sigma(x^{p-1}\cdots\sigma(x^1\xi)\cdots))\), a nested composition of linear maps \(x^k\) and of the nonlinearity \(\sigma\), depending on the inputs \(\xi\in \mathbb{R}^d\) and on the weights \(x\in \mathbb{R}^n\), which are the parameters of the network. In supervised training, the goal is to minimize a loss function \(f(x)\) over the weights. The background part gives a short discussion of stochastic gradient descent (SGD) and of stochastic gradient descent in continuous time, together with the relevant references. The third section concentrates on results on the PDE interpretation of local entropy, such as the derivation of the viscous Hamilton-Jacobi PDE and the Hopf-Lax formula for the Hamilton-Jacobi equation. The fourth section derives local entropy via homogenization of SDEs and then establishes stochastic control results for a variant of local entropy. Since it was proved in a previous section that the regularized loss function is the solution of a viscous Hamilton-Jacobi equation, semi-concavity estimates from PDE theory can be applied to quantify the amount of smoothing; examples illustrating the widening of local minima are presented. The last two sections are devoted to a comparison of the various algorithms presented in the paper, the aim being to show that this collection of PDE methods improves results on modern datasets such as MNIST and CIFAR. Throughout the article, the authors make intensive use of their previously published results, comparing, investigating, and improving details.
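To make the network definition concrete, here is a minimal Python sketch of the nested composition \(y(x;\xi)\) above; the choice of \(\tanh\) as the nonlinearity and the list-of-matrices representation are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def network_output(weights, xi, sigma=np.tanh):
    """Compute y(x; xi) = sigma(x^p sigma(x^{p-1} ... sigma(x^1 xi) ...)).

    `weights` holds the matrices x^1, ..., x^p in order of application;
    `sigma` is an elementwise nonlinearity (tanh is only a placeholder).
    """
    y = xi
    for W in weights:
        y = sigma(W @ y)
    return y

# Example: a two-layer network mapping R^3 -> R^2.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
print(network_output(weights, rng.standard_normal(3)))
```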
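The PDE interpretation of local entropy that the review alludes to can be sketched as follows; the exact constants and scalings here are assumptions based on standard statements of these results, not formulas quoted from the article. The local entropy is the Gaussian regularization \[f_\gamma(x) = -\frac{1}{\beta}\log\bigl(G_\gamma * e^{-\beta f}\bigr)(x),\] where \(G_\gamma\) is a Gaussian kernel at scale \(\gamma\) and \(\beta\) an inverse temperature. Setting \(u(x,t)=f_t(x)\), the function \(u\) solves the viscous Hamilton-Jacobi equation \[u_t = -\tfrac{1}{2}\lvert\nabla u\rvert^2 + \tfrac{1}{2\beta}\,\Delta u, \qquad u(x,0)=f(x),\] and in the inviscid limit \(\beta\to\infty\) the Hopf-Lax formula gives \[u(x,t) = \min_{y}\Bigl\{\,f(y) + \frac{1}{2t}\lVert x-y\rVert^2\Bigr\},\] the proximal (Moreau envelope) regularization of \(f\). The semi-concavity estimates mentioned in the review quantify how this evolution widens local minima.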
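As for the algorithms compared in the final sections, below is a hedged Python sketch of an Entropy-SGD-style step: an inner Langevin (SGLD) loop estimates the gradient of the local entropy, \(\nabla f_\gamma(x)=(x-\langle y\rangle)/\gamma\), and an outer step descends along it. All names and parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def entropy_sgd_step(x, grad_f, gamma=0.1, beta=10.0, eta=0.1,
                     inner_steps=20, inner_lr=0.01, rng=None):
    """Hedged sketch of one Entropy-SGD-style update (illustrative only).

    `grad_f(y)` returns a (possibly stochastic) gradient of the loss f.
    The inner SGLD loop targets the Gibbs measure proportional to
    exp(-beta * (f(y) + |y - x|^2 / (2 * gamma))); the running mean of
    its iterates estimates <y>, and (x - <y>) / gamma approximates the
    gradient of the local entropy f_gamma at x.
    """
    rng = rng or np.random.default_rng()
    y = x.copy()
    y_mean = x.copy()
    for k in range(inner_steps):
        noise = np.sqrt(2.0 * inner_lr / beta) * rng.standard_normal(x.shape)
        # Euler-Maruyama step of Langevin dynamics on f(y) + |y - x|^2/(2 gamma)
        y = y - inner_lr * (grad_f(y) + (y - x) / gamma) + noise
        y_mean = (k * y_mean + y) / (k + 1)  # running average of the iterates
    # Outer descent step along the estimated local-entropy gradient
    return x - eta * (x - y_mean) / gamma

# Toy usage: a wiggly 1-D loss f(x) = x^2 + sin(5x); the smoothed gradient
# steps past the narrow wells that can trap plain gradient descent.
grad_f = lambda x: 2 * x + 5 * np.cos(5 * x)
x = np.array([2.0])
for _ in range(100):
    x = entropy_sgd_step(x, grad_f)
print(x)
```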
Keywords: deep learning; partial differential equations; stochastic gradient descent; neural networks; optimal control