Infinite-width limit of deep linear neural networks (Q6587580)
scientific article; zbMATH DE number 7896920
Statements
14 August 2024
This paper studies an important topic in artificial neural networks: the description of the training dynamics of these networks in the infinite-width limit. In recent years, this line of study has helped researchers understand several aspects of deep learning: (1) the importance of the choice of scaling/parametrization when passing to the limit, since several well-behaved but fundamentally different limits can be obtained; (2) the long-term behavior of the dynamics, such as global convergence, which sheds light on the learning abilities of neural networks; and (3) the existence of well-posed limits. Problems of this kind are well understood for two-layer neural networks, but many open problems remain for deeper networks.

The paper under review studies the infinite-width limit of deep linear neural networks initialized with random parameters. The authors establish several results. (1) They first show that, as the number of parameters diverges, the training dynamics converge to the dynamics obtained from gradient descent on an infinitely wide deterministic linear neural network. (2) When the weights remain random, they derive a precise law of the weights along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of parameters. (3) They study the continuous-time limit obtained for infinitely wide linear networks and show that the linear predictor of the network converges at an exponential rate to the minimal-\(\ell_2\)-norm minimizer of the risk.

The paper is well written, with an excellent set of references.
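The finite-width setting the review describes can be illustrated with a minimal gradient-descent sketch. This is not the authors' code: the depth, width, learning rate, step count, and initialization scale below are all hypothetical choices for the demo. It trains a deep linear network \(f(x) = x W_1 \cdots W_L\) on a noiseless (realizable) linear regression task and checks that the loss of the end-to-end predictor decreases.

```python
import numpy as np

# Illustrative sketch (hypothetical setup, not the paper's experiments):
# gradient descent on a deep *linear* network fitted to noiseless linear
# regression targets, so the minimal risk is zero.

rng = np.random.default_rng(0)

d_in, d_out, width, depth = 5, 1, 64, 3
n = 20

X = rng.standard_normal((n, d_in))
w_true = rng.standard_normal((d_in, d_out))
Y = X @ w_true                              # realizable targets

# 1/sqrt(fan_in) initialization keeps the end-to-end product O(1) in the width.
dims = [d_in] + [width] * (depth - 1) + [d_out]
Ws = [rng.standard_normal((dims[i], dims[i + 1])) / np.sqrt(dims[i])
      for i in range(depth)]

def chain(mats, start_dim):
    # Product of a list of matrices, seeded with an identity of the given size.
    P = np.eye(start_dim)
    for W in mats:
        P = P @ W
    return P

def loss(Ws):
    # 0.5 * mean squared error of the end-to-end linear predictor.
    return 0.5 * np.mean((X @ chain(Ws, d_in) - Y) ** 2)

init_loss = loss(Ws)
lr = 0.05
for _ in range(2000):
    P = chain(Ws, d_in)
    grad_P = X.T @ (X @ P - Y) / n          # dL/dP for the mean squared loss
    # Chain rule through the matrix product:
    # dL/dW_i = (W_1..W_{i-1})^T  dL/dP  (W_{i+1}..W_L)^T
    Ws = [W - lr * chain(Ws[:i], d_in).T @ grad_P @ chain(Ws[i + 1:], dims[i + 1]).T
          for i, W in enumerate(Ws)]

final_loss = loss(Ws)
print(f"loss: {init_loss:.3f} -> {final_loss:.2e}")
```

Note that the list comprehension in the update reads the old `Ws` throughout, so all layers are updated simultaneously, as gradient descent requires. The infinite-width statements in the paper concern the limit `width -> infinity` of such dynamics, which this finite sketch does not reproduce.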
Keywords: deep networks; deep learning; infinite-width