Random neural networks in the infinite width limit as Gaussian processes (Q6138923)

From MaRDI portal
Property / DOI: 10.1214/23-AAP1933

Language: English
Label: Random neural networks in the infinite width limit as Gaussian processes
Description: scientific article; zbMATH DE number 7789647

    Statements

    Random neural networks in the infinite width limit as Gaussian processes (English)
    16 January 2024
    Neural networks, originally introduced in the 1940s and 1950s, have become powerful tools for a wide variety of problems in many subjects, for example image processing, machine learning, neuroscience, signal processing, manifold learning, language processing and probability; see the paper under review for many references in this regard. This fascinating paper studies an important probabilistic question for a class of networks called fully connected neural networks. They are defined as follows: Fix a positive integer \(L\) as well as \(L+2\) positive integers \(n_0,\dots,n_{L+1}\) and a function \(\sigma: \mathbb R\to \mathbb R\). A fully connected depth-\(L\) neural network with input dimension \(n_0\), output dimension \(n_{L+1}\), hidden layer widths \(n_1,\dots,n_L\) and nonlinearity \(\sigma\) is any function \(x_{\alpha}\in \mathbb R^{n_0}\mapsto z_{\alpha}^{(L+1)}\in \mathbb R^{n_{L+1}}\) of the form \[ z_{\alpha}^{(l)}=\left\{ \begin{array}{ll} W^{(1)}x_{\alpha}+b^{(1)}, & l=1,\\ W^{(l)}\sigma(z_{\alpha}^{(l-1)})+b^{(l)}, & l=2,\dots,L+1, \end{array} \right. \] where the \(W^{(l)}\in \mathbb R^{n_l\times n_{l-1}}\) are matrices, the \(b^{(l)}\in \mathbb R^{n_l}\) are vectors, and \(\sigma\) applied to a vector is shorthand for \(\sigma\) applied to each component. The parameters \(L, n_0, \dots, n_{L+1}\) are called the network architecture, and \(z_{\alpha}^{(l)}\in \mathbb R^{n_l}\) is called the vector of pre-activations at layer \(l\) corresponding to the input \(x_{\alpha}\). A fully connected network with a fixed architecture and a given nonlinearity \(\sigma\) is therefore a finite- but typically high-dimensional family of functions, parameterized by the network weights (the entries of the weight matrices \(W^{(l)}\)) and biases (the components of the bias vectors \(b^{(l)}\)). The article considers the map \(x_{\alpha}\mapsto z_{\alpha}^{(L+1)}\) when the network's weights and biases are chosen independently at random and the hidden layer widths \(n_1,\dots,n_L\) are sent to infinity while the input dimension \(n_0\), the output dimension \(n_{L+1}\) and the network depth \(L\) are held fixed. In this infinite width limit, akin to the large matrix limit in random matrix theory, neural networks with random weights and biases converge to Gaussian processes. The main result of the paper under review is that this convergence holds for general nonlinearities \(\sigma\) and general distributions of the network weights. The paper is well written, with an excellent set of references.
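    The convergence can be probed numerically. The following is a minimal sketch in Python/NumPy, not code from the paper: the function name random_network_output, the tanh nonlinearity, the Gaussian weight distribution with variance \(1/n_{l-1}\) and the kurtosis check are all illustrative assumptions. It samples the recursion above with independent random weights and biases and observes that a fixed output coordinate becomes increasingly Gaussian as the hidden layer widths grow.

    import numpy as np

    def random_network_output(x, widths, sigma=np.tanh, rng=None):
        # Forward pass through the recursion z^(1) = W^(1) x + b^(1) and
        # z^(l) = W^(l) sigma(z^(l-1)) + b^(l) for l = 2, ..., L+1, with
        # i.i.d. Gaussian weights of variance 1/n_{l-1} and unit-variance biases
        # (an illustrative choice; the paper allows far more general distributions).
        if rng is None:
            rng = np.random.default_rng()
        z = x
        for l in range(1, len(widths)):
            n_in, n_out = widths[l - 1], widths[l]
            W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
            b = rng.normal(0.0, 1.0, size=n_out)
            z = W @ (z if l == 1 else sigma(z)) + b
        return z

    # A fixed output coordinate, resampled over many independent draws of the
    # weights and biases, should look increasingly Gaussian as the widths grow;
    # the excess kurtosis (zero for a Gaussian) gives a crude check.
    x = np.ones(3)
    for n in (5, 50, 500):
        s = np.array([random_network_output(x, [3, n, n, 1])[0] for _ in range(2000)])
        kurt = ((s - s.mean()) ** 4).mean() / s.var() ** 2 - 3
        print(f"hidden width {n:4d}: mean {s.mean():+.3f}, excess kurtosis {kurt:+.3f}")

    Note that only one-dimensional marginals are checked here; the theorem concerns the full process \(x_{\alpha}\mapsto z_{\alpha}^{(L+1)}\) over all inputs and output coordinates.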
    neural networks
    Gaussian processes
    limit theorems
