On the number of regions of piecewise linear neural networks (Q6145187)

scientific article; zbMATH DE number 7797181

    Statements

    On the number of regions of piecewise linear neural networks (English)
    30 January 2024
    In the last decade, neural networks have enjoyed enormous success across many scientific disciplines. In particular, deep parametric models trained with neural networks find applications ranging from computer vision, signal processing, and language processing to neuroscience and various forms of learning. Many popular deep models belong to the family of feedforward neural networks (NNs), whose input-output mapping takes the form \(x\mapsto \sigma_L\circ f_{\theta_L}\circ \sigma_{L-1}\circ f_{\theta_{L-1}}\circ \cdots \circ \sigma_2\circ f_{\theta_2}\circ \sigma_1\circ f_{\theta_1}(x)\). Here, \(L\) is the number of layers of the NN (referred to as its depth), \(f_{\theta_k}:\mathbb R^{d_k}\to \mathbb R^{d_{k+1}}\) is an affine function parameterized by \(\theta_k\), and \(\sigma_k\) is a non-affine activation function. Many feedforward NNs generate continuous and piecewise-linear (CPWL) mappings and partition the input domain into so-called linear regions, on which the mapping is affine. The number of these regions serves as a metric for the expressiveness of CPWL NNs. In practice, however, the exact number of regions is difficult to determine, and bounds have been proposed for specific architectures, including ReLU and Maxout NNs.

    In this work, the authors generalize these bounds to NNs with arbitrary, possibly multivariate, CPWL activation functions. They provide upper and lower bounds on the maximal number of linear regions of a CPWL NN in terms of its depth, its width, and the number of linear regions of its activation functions. The results rely on the combinatorial structure of convex partitions and confirm the distinctive role of depth, which on its own can increase the number of regions exponentially. The authors then introduce a stochastic framework to estimate the average number of linear regions produced by a CPWL NN. Under various reasonable assumptions, the expected density of linear regions along any one-dimensional path is bounded by the product of depth, width, and a measure of activation complexity (up to a scaling factor). In this average-case analysis, the three sources of expressiveness thus play equivalent roles, and the exponential growth with depth disappears.
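    The distinctive, worst-case role of depth can be made concrete with a classical construction (a standard illustration, not taken from the paper under review): the hat function \(g(x)=2\max(x,0)-4\max(x-\tfrac12,0)\), realizable by a single ReLU layer of width 2, satisfies \(g(x)=2x\) on \([0,\tfrac12]\) and \(g(x)=2-2x\) on \([\tfrac12,1]\). Its \(L\)-fold composition \(g^{\circ L}=g\circ\cdots\circ g\) is a sawtooth with \(2^L\) linear regions on \([0,1]\), so a depth-\(L\), width-2 ReLU network already exhibits exponential growth of the number of linear regions with depth.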
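    To illustrate the notion of linear regions along a one-dimensional path, the following minimal sketch (purely illustrative, not the authors' code; all function names are hypothetical) counts the regions that a random ReLU network of the above form crosses on a segment, by tracking changes of the ReLU activation pattern, which is constant on each linear region:

        import numpy as np

        rng = np.random.default_rng(0)

        def random_relu_net(widths):
            # Random affine parameters theta_k = (W_k, b_k) for each layer.
            return [(rng.standard_normal((m, n)), rng.standard_normal(m))
                    for n, m in zip(widths[:-1], widths[1:])]

        def activation_pattern(layers, x):
            # Sign pattern of all pre-activations; every layer is followed
            # by a ReLU, matching the composition formula above.
            pattern = []
            for W, b in layers:
                z = W @ x + b
                pattern.append(tuple(z > 0))
                x = np.maximum(z, 0)
            return tuple(pattern)

        def count_regions_1d(layers, x0, x1, samples=10_000):
            # Empirical (lower-bound) count of the linear regions crossed on
            # the segment [x0, x1]: each change of the activation pattern
            # between consecutive samples marks a region boundary.
            ts = np.linspace(0.0, 1.0, samples)
            patterns = [activation_pattern(layers, (1 - t) * x0 + t * x1)
                        for t in ts]
            return 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))

        layers = random_relu_net([2, 8, 8, 1])  # depth L = 3, input dim 2
        x0, x1 = np.array([-2.0, -2.0]), np.array([2.0, 2.0])
        print(count_regions_1d(layers, x0, x1))

    Averaging such counts over random parameters and paths gives an empirical counterpart to the expected region density studied by the authors.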
    deep learning
    expressivity
    activation functions
    continuous and piecewise-linear functions
    splines
    convex partitions
