Deep relaxation: partial differential equations for optimizing deep neural networks (Q2319762)

From MaRDI portal
MaRDI profile type: MaRDI publication profile
OpenAlex ID: W2963480765
arXiv ID: 1704.04932
Wikidata QID: Q129603431

Latest revision as of 23:32, 3 October 2024


    Statements

    Deep relaxation: partial differential equations for optimizing deep neural networks (English)
    20 August 2019
    The output of a deep neural network is defined as \(y(x;\xi)=\sigma(x^p\sigma(x^{p-1}\dots\sigma(x^1\xi)\dots))\), a nested composition of linear maps and nonlinear activations \(\sigma\), depending on the inputs \(\xi\in \mathbb{R}^d\) and the weights \(x\in \mathbb{R}^n\), which are the parameters of the network. In supervised training, the goal is to minimize a certain loss function \(f(x)\). The background part contains a short discussion of stochastic gradient descent (SGD) and of stochastic gradient descent in continuous time, together with a presentation of the relevant references. The third section concentrates on a few results on the PDE interpretation of local entropy, such as the derivation of a viscous Hamilton-Jacobi PDE and the Hopf-Lax formula for the Hamilton-Jacobi equation. The fourth section derives local entropy via homogenization of SDEs, and further presents stochastic-control results for a variant of local entropy. Since the regularized loss function was shown in a previous section to be the solution of a viscous Hamilton-Jacobi equation, semi-concavity estimates from PDE theory can be applied to quantify the amount of smoothing; some examples illustrating the widening of local minima are presented. The last two sections are devoted to a comparison of the various algorithms presented in the paper, with the aim of showing that the considered collection of PDE methods improves results on modern datasets such as MNIST and CIFAR. Throughout, the article builds intensively on the authors' previously published results, comparing results and investigating and improving some of the details.
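The smoothing effect described above can be illustrated numerically: in the vanishing-viscosity limit, the Hopf-Lax formula gives the regularized loss as an infimal convolution \(u(x,t)=\min_y\{f(y)+|x-y|^2/(2t)\}\). A minimal sketch on a toy one-dimensional loss (the function `f` and the grid below are hypothetical illustrations, not taken from the paper):

```python
import math

def f(x):
    # Toy nonconvex loss with several local minima (hypothetical example).
    return math.sin(5.0 * x) * math.exp(-x * x) + 0.1 * x * x

def hopf_lax(f, x, t, grid):
    """Hopf-Lax envelope u(x, t) = min_y f(y) + |x - y|^2 / (2 t),
    evaluated by brute force over a finite grid of candidate minimizers y."""
    return min(f(y) + (x - y) ** 2 / (2.0 * t) for y in grid)

grid = [i / 100.0 for i in range(-300, 301)]  # y in [-3, 3]
u = [hopf_lax(f, x, 0.5, grid) for x in grid]  # smoothed loss on the grid
```

Two properties are easy to check on this sketch: the envelope never exceeds the original loss (take \(y=x\)), and it is nonincreasing in \(t\), so larger \(t\) smooths more aggressively; this widening of minima is exactly the effect the review describes.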
    deep learning
    partial differential equations
    stochastic gradient descent
    neural networks
    optimal control
