Infinite-width limit of deep linear neural networks (Q6587580)

From MaRDI portal





scientific article; zbMATH DE number 7896920

      Statements

      Infinite-width limit of deep linear neural networks (English)
      14 August 2024
      This paper studies an important topic in artificial neural networks: the description of the training dynamics of these networks in the infinite-width limit. In recent years, this line of study has helped to clarify several aspects of deep learning: (1) the importance of the choice of scaling/parametrization when passing to the limit, since several well-behaved but fundamentally different limits can be obtained; (2) the characterization of the long-term behavior of the dynamics, such as global convergence, which helps in understanding the learning abilities of neural networks; and (3) the existence of well-posed limits. Problems of this kind are well understood for two-layer neural networks, but there are many open problems for deeper networks.

      The paper under review studies the infinite-width limit of deep linear neural networks initialized with random parameters. The authors establish several results. (1) They first show that as the number of parameters diverges, the training dynamics converge to those obtained from gradient descent on an infinitely wide deterministic linear neural network. (2) When the weights remain random, they obtain a precise law of the weights along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of parameters. (3) They study the continuous-time limit for infinitely wide linear networks and show that the linear predictors of the network converge at an exponential rate to the minimal \(\ell_2\)-norm minimizer of the risk.

      The paper is well written, with an excellent set of references.
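      The convergence to the minimal \(\ell_2\)-norm minimizer described in point (3) can be illustrated numerically. The sketch below is not the paper's construction: it uses a finite-width two-layer linear network with an assumed small random initialization (the depth, width, and step size are arbitrary choices), runs plain gradient descent on an underdetermined least-squares problem, and compares the learned end-to-end predictor with the minimal-norm solution computed via the pseudoinverse.

```python
import numpy as np

# Underdetermined regression: fewer samples than input dimensions,
# so infinitely many interpolating linear predictors exist.
rng = np.random.default_rng(0)
n, d, m = 5, 10, 50                  # samples, input dim, hidden width (illustrative)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Minimal l2-norm interpolator of the linear regression problem.
beta_star = np.linalg.pinv(X) @ y

# Two-layer linear network f(x) = w2 @ W1 @ x with small random init.
scale = 0.01
W1 = scale * rng.standard_normal((m, d))
w2 = scale * rng.standard_normal(m)

lr = 0.05
for _ in range(20_000):
    beta = W1.T @ w2                 # end-to-end linear predictor
    r = (X @ beta - y) / n           # scaled residual
    g_beta = X.T @ r                 # gradient of the loss w.r.t. the predictor
    g_W1 = np.outer(w2, g_beta)      # chain rule through each factor
    g_w2 = W1 @ g_beta
    W1 -= lr * g_W1
    w2 -= lr * g_w2

beta = W1.T @ w2
loss_final = 0.5 * np.mean((X @ beta - y) ** 2)
rel_err = np.linalg.norm(beta - beta_star) / np.linalg.norm(beta_star)
```

      With a small enough initialization scale, the trained predictor lands close to `beta_star` even though many other interpolators attain zero loss; this finite-width experiment only illustrates the implicit bias, whereas the paper's exponential-rate result concerns the infinite-width continuous-time dynamics.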
      deep networks
      deep learning
      infinite-width

      Identifiers
