Infinite-width limit of deep linear neural networks (Q6587580)

From MaRDI portal





scientific article; zbMATH DE number 7896920

      Statements

      Infinite-width limit of deep linear neural networks (English)
      14 August 2024
      This paper studies an important topic in artificial neural networks: the description of the training dynamics of these networks in the infinite-width limit. In recent years, this line of study has helped to clarify several aspects of deep learning: (1) the importance of the choice of scaling/parametrization when passing to the limit, since several well-behaved but fundamentally different limits can be obtained; (2) the characterization of the long-term behavior of the dynamics, such as global convergence, which helps in understanding the learning abilities of neural networks; and (3) the existence of well-posed limits. Problems of this kind are well understood for two-layer neural networks, but there are many open problems for deeper networks.

      The paper under review studies the infinite-width limit of deep linear neural networks initialized with random parameters. The authors establish several results. (1) They first show that as the number of parameters diverges, the training dynamics converge to those obtained from gradient descent on an infinitely wide deterministic linear neural network. (2) When the weights remain random, they obtain a precise law of the weights along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of parameters. (3) They study the continuous-time limit for infinitely wide linear networks and show that the linear predictors of the network converge at an exponential rate to the minimal \(\ell_2\)-norm minimizer of the risk.

      The paper is well written, with an excellent set of references.
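      The convergence to the minimal \(\ell_2\)-norm minimizer described in point (3) can be illustrated numerically. The sketch below is not the paper's construction: it uses a finite-width two-layer linear network with an assumed small random initialization (the depth, width, and step size are arbitrary choices), runs plain gradient descent on an underdetermined least-squares problem, and compares the learned end-to-end predictor with the minimal-norm solution computed via the pseudoinverse.

```python
import numpy as np

# Underdetermined regression: fewer samples than input dimensions,
# so infinitely many interpolating linear predictors exist.
rng = np.random.default_rng(0)
n, d, m = 5, 10, 50                  # samples, input dim, hidden width (illustrative)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimal l2-norm interpolator of the linear regression problem.
beta_star = np.linalg.pinv(X) @ y

# Two-layer linear network f(x) = w2 @ W1 @ x with small random init.
scale = 0.01
W1 = scale * rng.standard_normal((m, d))
w2 = scale * rng.standard_normal(m)

lr = 0.05
for _ in range(20_000):
    beta = W1.T @ w2                 # end-to-end linear predictor
    r = (X @ beta - y) / n           # scaled residual
    g_beta = X.T @ r                 # gradient of the loss w.r.t. the predictor
    g_W1 = np.outer(w2, g_beta)      # chain rule through each factor
    g_w2 = W1 @ g_beta
    W1 -= lr * g_W1
    w2 -= lr * g_w2

beta = W1.T @ w2
loss_final = 0.5 * np.mean((X @ beta - y) ** 2)
rel_err = np.linalg.norm(beta - beta_star) / np.linalg.norm(beta_star)
```

      With a small enough initialization scale, the trained predictor lands close to `beta_star` even though many other interpolators attain zero loss; this finite-width experiment only illustrates the implicit bias, whereas the paper's exponential-rate result concerns the infinite-width continuous-time dynamics.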
      deep networks
      deep learning
      infinite-width

      Identifiers
