Random neural networks in the infinite width limit as Gaussian processes (Q6138923)
From MaRDI portal
scientific article; zbMATH DE number 7789647
Language | Label | Description | Also known as |
---|---|---|---|
English | Random neural networks in the infinite width limit as Gaussian processes | scientific article; zbMATH DE number 7789647 | |
Statements
Random neural networks in the infinite width limit as Gaussian processes (English)
16 January 2024
Neural networks, originally introduced in the 1940s and 1950s, have become powerful tools for a wide variety of mathematical problems across many subjects, including image processing, machine learning, neuroscience, signal processing, manifold learning, language processing, and probability; see the paper under review for many references in this regard. This fascinating paper studies an important probabilistic question for a class of networks called fully connected neural networks.

They are defined as follows. Fix a positive integer \(L\) as well as \(L+2\) positive integers \(n_0,\dots,n_{L+1}\) and a function \(\sigma: \mathbb R\to \mathbb R\). A fully connected depth-\(L\) neural network with input dimension \(n_0\), output dimension \(n_{L+1}\), hidden layer widths \(n_1,\dots,n_{L}\) and nonlinearity \(\sigma\) is any function \(x_{\alpha}\in \mathbb R^{n_0}\mapsto z_{\alpha}^{(L+1)}\in \mathbb R^{n_{L+1}}\) of the form \[ z_{\alpha}^{(l)}=\left\{ \begin{array}{ll} W^{(1)}x_{\alpha}+b^{(1)}, & l=1,\\ W^{(l)}\sigma(z_{\alpha}^{(l-1)})+b^{(l)},& l=2,\dots,L+1, \end{array} \right. \] where \(W^{(l)}\in \mathbb R^{n_l\times n_{l-1}}\) are matrices, \(b^{(l)}\in \mathbb R^{n_l}\) are vectors, and \(\sigma\) applied to a vector is shorthand for \(\sigma\) applied to each component. The parameters \(L, n_0, \dots, n_{L+1}\) are called the network architecture, and \(z_{\alpha}^{(l)}\in \mathbb R^{n_l}\) is called the vector of pre-activations at layer \(l\) corresponding to the input \(x_{\alpha}\). A fully connected network with a fixed architecture and a given nonlinearity \(\sigma\) is therefore a finite-dimensional, though typically high-dimensional, family of functions, parameterized by the network weights (the entries of the weight matrices \(W^{(l)}\)) and biases (the components of the bias vectors \(b^{(l)}\)).

This article considers the mapping \(x_{\alpha}\mapsto z_{\alpha}^{(L+1)}\) when the network's weights and biases are chosen independently at random and the hidden layer widths \(n_1,\dots,n_L\) are sent to infinity while the input dimension \(n_0\), the output dimension \(n_{L+1}\), and the network depth \(L\) are held fixed. In this infinite width limit, akin to the large matrix limit in random matrix theory, neural networks with random weights and biases converge to Gaussian processes. The main result of the paper under review is that this convergence holds for general nonlinearities \(\sigma\) and general distributions of the network weights. The paper is well written, with an excellent set of references.
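To make the recursion concrete, the following is a minimal NumPy sketch (not taken from the paper) of a random fully connected network together with a crude empirical check of the Gaussian behaviour of its output at a fixed input as the hidden widths grow. The Gaussian weights with variance \(1/n_{l-1}\), the \(\tanh\) nonlinearity, and the particular widths are illustrative assumptions; the paper's theorem covers general nonlinearities and general weight distributions.

```python
import numpy as np

def random_fc_network(x, widths, sigma=np.tanh, rng=None):
    """Sample one fully connected network with random weights and biases and
    return the output-layer pre-activations z^(L+1) for the input x.

    Implements the recursion from the review:
        z^(1) = W^(1) x + b^(1),
        z^(l) = W^(l) sigma(z^(l-1)) + b^(l),   l = 2, ..., L+1.

    Weights are drawn i.i.d. N(0, 1/n_{l-1}) and biases N(0, 1); this Gaussian
    choice and scaling are illustrative assumptions, not the general weight
    distributions treated in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = None
    for l in range(1, len(widths)):
        n_in, n_out = widths[l - 1], widths[l]
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        b = rng.normal(0.0, 1.0, size=n_out)
        h = x if l == 1 else sigma(z)
        z = W @ h + b
    return z

# Probe the infinite-width limit: for wide hidden layers, the scalar output
# z^(L+1)(x) over many independent draws of the parameters should look
# approximately Gaussian.
rng = np.random.default_rng(0)
x = rng.normal(size=3)                      # fixed input, n_0 = 3
widths = [3, 2000, 2000, 1]                 # depth L = 2, wide hidden layers
samples = np.array([random_fc_network(x, widths, rng=rng)[0]
                    for _ in range(2000)])
print("sample mean:", samples.mean())
print("sample std: ", samples.std())
print("excess kurtosis (≈ 0 for a Gaussian):",
      ((samples - samples.mean()) ** 4).mean() / samples.var() ** 2 - 3)
```

With hidden widths of a few thousand, the sampled outputs should exhibit excess kurtosis close to zero, consistent with the Gaussian process limit described above.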
neural networks
Gaussian processes
limit theorems