Random neural networks in the infinite width limit as Gaussian processes (Q6138923)

From MaRDI portal
Property / DOI: 10.1214/23-AAP1933

Language: English
Label: Random neural networks in the infinite width limit as Gaussian processes
Description: scientific article; zbMATH DE number 7789647

    Statements

    Random neural networks in the infinite width limit as Gaussian processes (English)
    16 January 2024
    Neural networks, originally introduced in the 1940s and 1950s, have become powerful tools for a wide variety of problems in many subjects, for example image processing, machine learning, neuroscience, signal processing, manifold learning, language processing and probability; see the paper under review for many references in this regard. This fascinating paper studies an important probabilistic question for a class of networks called fully connected neural networks. They are defined as follows: Fix a positive integer \(L\) as well as \(L+2\) positive integers \(n_0,\dots,n_{L+1}\) and a function \(\sigma: \mathbb R\to \mathbb R\). A fully connected depth-\(L\) neural network with input dimension \(n_0\), output dimension \(n_{L+1}\), hidden layer widths \(n_1,\dots,n_L\) and nonlinearity \(\sigma\) is any function \(x_{\alpha}\in \mathbb R^{n_0}\mapsto z_{\alpha}^{(L+1)}\in \mathbb R^{n_{L+1}}\) of the form \[ z_{\alpha}^{(l)}=\left\{ \begin{array}{ll} W^{(1)}x_{\alpha}+b^{(1)}, & l=1,\\ W^{(l)}\sigma(z_{\alpha}^{(l-1)})+b^{(l)}, & l=2,\dots,L+1, \end{array} \right. \] where the \(W^{(l)}\in \mathbb R^{n_l\times n_{l-1}}\) are matrices, the \(b^{(l)}\in \mathbb R^{n_l}\) are vectors, and \(\sigma\) applied to a vector is shorthand for \(\sigma\) applied to each component. The parameters \(L, n_0, \dots, n_{L+1}\) are called the network architecture, and \(z_{\alpha}^{(l)}\in \mathbb R^{n_l}\) is called the vector of pre-activations at layer \(l\) corresponding to the input \(x_{\alpha}\). A fully connected network with a fixed architecture and a given nonlinearity \(\sigma\) is therefore a finite- but typically high-dimensional family of functions, parameterized by the network weights (the entries of the weight matrices \(W^{(l)}\)) and biases (the components of the bias vectors \(b^{(l)}\)). The article considers the map \(x_{\alpha}\mapsto z_{\alpha}^{(L+1)}\) when the network's weights and biases are chosen independently at random and the hidden layer widths \(n_1,\dots,n_L\) are sent to infinity while the input dimension \(n_0\), the output dimension \(n_{L+1}\) and the network depth \(L\) are held fixed. In this infinite width limit, akin to the large matrix limit in random matrix theory, neural networks with random weights and biases converge to Gaussian processes. The main result of the paper under review is that this convergence holds for general nonlinearities \(\sigma\) and general distributions of the network weights. The paper is well written, with an excellent set of references.
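    The convergence can be probed numerically. The following is a minimal sketch in Python/NumPy, not code from the paper: the function name random_network_output, the tanh nonlinearity, the Gaussian weight distribution with variance \(1/n_{l-1}\) and the kurtosis check are all illustrative assumptions. It samples the recursion above with independent random weights and biases and observes that a fixed output coordinate becomes increasingly Gaussian as the hidden layer widths grow.

    import numpy as np

    def random_network_output(x, widths, sigma=np.tanh, rng=None):
        # Forward pass through the recursion z^(1) = W^(1) x + b^(1) and
        # z^(l) = W^(l) sigma(z^(l-1)) + b^(l) for l = 2, ..., L+1, with
        # i.i.d. Gaussian weights of variance 1/n_{l-1} and unit-variance biases
        # (an illustrative choice; the paper allows far more general distributions).
        if rng is None:
            rng = np.random.default_rng()
        z = x
        for l in range(1, len(widths)):
            n_in, n_out = widths[l - 1], widths[l]
            W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
            b = rng.normal(0.0, 1.0, size=n_out)
            z = W @ (z if l == 1 else sigma(z)) + b
        return z

    # A fixed output coordinate, resampled over many independent draws of the
    # weights and biases, should look increasingly Gaussian as the widths grow;
    # the excess kurtosis (zero for a Gaussian) gives a crude check.
    x = np.ones(3)
    for n in (5, 50, 500):
        s = np.array([random_network_output(x, [3, n, n, 1])[0] for _ in range(2000)])
        kurt = ((s - s.mean()) ** 4).mean() / s.var() ** 2 - 3
        print(f"hidden width {n:4d}: mean {s.mean():+.3f}, excess kurtosis {kurt:+.3f}")

    Note that only one-dimensional marginals are checked here; the theorem concerns the full process \(x_{\alpha}\mapsto z_{\alpha}^{(L+1)}\) over all inputs and output coordinates.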
    neural networks
    Gaussian processes
    limit theorems
