Depth separations in neural networks: what is actually being separated? (Q2117335)

Language: English
Label: Depth separations in neural networks: what is actually being separated?
Description: scientific article

    Statements

    Depth separations in neural networks: what is actually being separated? (English)
    21 March 2022
    The authors consider approximation properties of depth-2 networks \[ N_2(\mathbf{x})=\sum_{i=1}^w u_i\,\sigma(\mathbf{w}_i^{\mathsf{T}}\mathbf{x}+b_i), \] where \( \mathbf{x},\mathbf{w}_i\in\mathbb{R}^d \) and \( w \) denotes the width. The main results are given in the three subsections of Section 2. Subsection 2.1 contains a formal result implying that radial functions can be approximated to any constant accuracy \( \epsilon \) by depth-2 networks of width \( \mathrm{poly}(d) \). This result is proved for networks employing any activation function \( \sigma \) satisfying a mild assumption, which implies that the activation can be used to approximate univariate functions well; the assumption is satisfied by all standard activations, such as the ReLU and sigmoidal functions. In Subsection 2.2 the authors show how Lipschitz radial functions can be approximated by depth-2 ReLU networks of width \( \mathrm{poly}(1/\epsilon) \). The results of Subsection 2.3 complement these positive approximation results with negative results. Section 3 contains the proofs.
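    For concreteness, the following is a minimal NumPy sketch of the depth-2 architecture \( N_2 \) above with a ReLU activation. The random-feature least-squares fit of the radial target \( \|\mathbf{x}\| \) is purely illustrative (random inner weights, convex fit of the outer weights) and is not the approximation scheme constructed in the paper.

        import numpy as np

        def relu(z):
            return np.maximum(z, 0.0)

        def depth2_net(x, W, b, u, sigma=relu):
            # N_2(x) = sum_{i=1}^w u_i * sigma(w_i^T x + b_i)
            # W: (w, d) inner weights, b: (w,) biases, u: (w,) outer weights
            return sigma(x @ W.T + b) @ u

        # Toy illustration (hypothetical, not the authors' construction):
        # approximate the Lipschitz radial target f(x) = ||x|| using random
        # inner weights and least-squares outer weights.
        rng = np.random.default_rng(0)
        d, width, n = 5, 512, 2000
        W = rng.normal(size=(width, d))
        b = rng.normal(size=width)
        X = rng.normal(size=(n, d))
        y = np.linalg.norm(X, axis=1)                 # radial target ||x||
        H = relu(X @ W.T + b)                         # hidden-layer activations
        u, *_ = np.linalg.lstsq(H, y, rcond=None)     # convex fit of u only
        rmse = np.sqrt(np.mean((depth2_net(X, W, b, u) - y) ** 2))
        print(f"train RMSE: {rmse:.4f}")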
    Keywords: deep learning; neural network; approximation theory; depth separation
