LU decomposition and Toeplitz decomposition of a neural network (Q6185692)
scientific article; zbMATH DE number 7796949
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | LU decomposition and Toeplitz decomposition of a neural network | scientific article; zbMATH DE number 7796949 | |
Statements
LU decomposition and Toeplitz decomposition of a neural network (English)
30 January 2024
In providing a justification for the importance of feed-forward neural networks, it is now known that several universal approximation theorems exist, proved for the following types of networks: (i) shallow wide networks, i.e., neural networks of fixed depth and arbitrary width; (ii) deep narrow networks, i.e., neural networks of fixed width and arbitrary depth. These theorems show that, under different settings and in various well-defined frameworks, such networks can approximate various classes of functions to arbitrary accuracy under various measures of accuracy. This is an extremely important idea in many areas, for example in approximation theory. As examples of such results, Theorems 1.1 and 1.2 in the paper under review are density results due to Pinkus and to Kidger and Lyons, respectively. Typically, in all these results, the neural networks are fully connected.

In this interesting paper, the authors show that for weight matrices with special structure, for example upper and lower triangular, Toeplitz or Hankel, the same type of universal approximation results hold for both shallow wide and deep narrow networks. Numerically, the authors show that, when kept at the same depth and width, a neural network with these structured weight matrices suffers almost no loss in expressive power, yet requires only a fraction of the parameters, at a cost of \(O(n\log n)\) operations as opposed to the usual \(O(n^2)\).

To give a small amount of insight, we note that any matrix \(A\) has an LU decomposition up to a permutation, that is, up to a permutation \(A\) can be written as the product of a lower triangular and an upper triangular matrix. Similarly, any matrix \(A\) has a Toeplitz decomposition, i.e., it can be written as a product of Toeplitz matrices. The authors essentially prove that any continuous function \(f:\mathbb R^n\to \mathbb R^m\) can be approximated to arbitrary accuracy by a neural network that maps an \(x\in \mathbb R^n\) to \(L_1\sigma_1U_1\sigma_2L_2\sigma_3U_2\cdots L_r\sigma_{2r-1}U_rx \in \mathbb R^m\), where the weight matrices alternate between lower and upper triangular matrices, \(\sigma_i(x):=\sigma(x-b_i)\) for some bias vector \(b_i\), and \(\sigma\) is any uniformly continuous nonpolynomial activation function. The authors establish the same kind of results for Toeplitz and Hankel matrices. A consequence of the Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far existed only in arbitrary-width versions. Since the results apply in particular to the case when \(f\) is itself a general neural network, one may regard these results as LU and Toeplitz decompositions of a neural network. Thus one may reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation, as shown above.
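To make the parameter accounting above concrete, here is a minimal NumPy sketch (not code from the paper; the names `toeplitz_from_params` and `lu_network` are illustrative) of a Toeplitz weight matrix built from its \(2n-1\) defining entries and of the alternating triangular form \(L_1\sigma_1U_1\cdots L_r\sigma_{2r-1}U_rx\), with \(\sigma=\mathrm{ReLU}\) and square \(n\times n\) factors for simplicity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def toeplitz_from_params(c):
    """Build an n x n Toeplitz matrix T with T[i, j] = c[i - j + n - 1]
    from its 2n - 1 defining entries (first column and first row)."""
    c = np.asarray(c)
    n = (len(c) + 1) // 2
    idx = np.arange(n)
    return c[(idx[:, None] - idx[None, :]) + (n - 1)]

rng = np.random.default_rng(0)
n, r = 8, 3  # width and number of lower/upper triangular factor pairs

# Alternating triangular weight matrices and bias vectors b_1, ..., b_{2r-1}.
Ls = [np.tril(rng.standard_normal((n, n))) for _ in range(r)]
Us = [np.triu(rng.standard_normal((n, n))) for _ in range(r)]
bs = [rng.standard_normal(n) for _ in range(2 * r - 1)]

def lu_network(x):
    """Evaluate x -> L_1 s_1(U_1 s_2(L_2 s_3(U_2 ... L_r s_{2r-1}(U_r x)...)))."""
    for k in reversed(range(r)):                  # innermost factor pair first
        x = relu(Us[k] @ x - bs[2 * k])           # sigma_{2k+1} applied after U_{k+1}
        if k > 0:
            x = relu(Ls[k] @ x - bs[2 * k - 1])   # sigma_{2k} applied after L_{k+1}
        else:
            x = Ls[0] @ x                         # outermost factor L_1, no activation
    return x

x = rng.standard_normal(n)
print(lu_network(x))
print(toeplitz_from_params(rng.standard_normal(2 * n - 1)))

# Parameters per n x n weight matrix:
#   dense n^2, triangular n(n+1)/2, Toeplitz 2n - 1.
print(n * n, n * (n + 1) // 2, 2 * n - 1)
```

For width \(n=8\) the final line prints 64 parameters per dense matrix versus 36 per triangular factor and 15 per Toeplitz factor; the point of the paper is that the structured networks nevertheless retain universal approximation.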
neural networks
Toeplitz matrices
Hankel matrices
triangular matrices
convolutional neural networks
universal approximation