Neural network approximation: three hidden layers are enough (Q6054944)

From MaRDI portal
scientific article; zbMATH DE number 7743433

    Statements

    Neural network approximation: three hidden layers are enough (English)
    28 September 2023
    It is well known that sufficiently deep neural networks have great power in approximating high-dimensional complex functions. Surprisingly, the paper constructs a neural network with only three hidden layers that nevertheless possesses super approximation power. The network uses a different activation function in each of the three hidden layers. Specifically, the network approximating a function \(f\) is given by \[ \phi(\mathbf{x})=2\omega_f(2\sqrt{d})\sum_{j=1}^N2^{-j}\sigma_3\biggl(a_j\cdot \sigma_2\bigl(1+\sum_{i=1}^d2^{(i-1)N}\sigma_1(2^{N-1}x_i)\bigr)\biggr)+f(\mathbf{0})-\omega_f(2\sqrt{d}),\ \mathbf{x}=(x_1,x_2,\ldots,x_d)\in\mathbf{R}^d, \] where \(N\) denotes the width of the network, \(\omega_f(\cdot)\) is the modulus of continuity of \(f\), \(a_j\in[0,\frac12)\) for \(1\le j\le N\), and \[ \sigma_1(x):=\lfloor x\rfloor,\ \sigma_2(x):=2^x, \ \sigma_3(x):=\mathcal{T}(x-\lfloor x\rfloor-\frac12),\ \ x\in\mathbf{R}, \] with \[ \mathcal{T}(x):=\left\{ \begin{array}{ll} 1,&x\ge 0,\\ 0,&x<0. \end{array} \right. \] It is proved in the paper that for every continuous function \(f\) on \([0,1]^d\) there exist \(a_1,a_2,\ldots,a_N\in[0,\frac12)\) such that \[ |f(\mathbf{x})-\phi(\mathbf{x})|\le 2\omega_f(2\sqrt{d})2^{-N}+\omega_f(2\sqrt{d}2^{-N}),\ \ \mathbf{x}\in[0,1]^d. \] Consequently, when \(f\) is Hölder continuous of order \(\alpha\in(0,1]\) with Hölder constant \(\lambda\), there exist \(a_1,a_2,\ldots,a_N\in[0,\frac12)\) such that \[ |f(\mathbf{x})-\phi(\mathbf{x})|\le 3\lambda(2\sqrt{d})^\alpha2^{-\alpha N},\ \ \mathbf{x}\in[0,1]^d, \] which shows that the proposed three-hidden-layer neural network approximates a Hölder continuous function at an exponential rate as the width increases. The results reveal an interesting and important property of the expressive power of deep neural networks. Applications of the results to machine learning are also discussed in the paper.
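    The displayed construction lends itself to a short numerical sketch. The following is a minimal illustration (not the authors' code), assuming the coefficients \(a_1,\ldots,a_N\in[0,\frac12)\), the constant \(\omega_f(2\sqrt{d})\), and the value \(f(\mathbf{0})\) are already given; the paper only proves that suitable \(a_j\) exist for each \(f\), and the names phi, sigma1, sigma2, sigma3, omega, f0 below are purely illustrative.

```python
import numpy as np

def sigma1(x):
    # first hidden layer activation: the floor function
    return np.floor(x)

def sigma2(x):
    # second hidden layer activation: the base-2 exponential 2^x
    return 2.0 ** x

def sigma3(x):
    # third hidden layer activation: T(x - floor(x) - 1/2), where T equals 1
    # for nonnegative arguments and 0 otherwise
    return np.where(x - np.floor(x) - 0.5 >= 0.0, 1.0, 0.0)

def phi(x, a, omega, f0, N):
    """Evaluate the three-hidden-layer network at a point x in [0,1]^d.

    x     -- input point, array-like of shape (d,)
    a     -- coefficients a_1, ..., a_N in [0, 1/2) (assumed to be given)
    omega -- the constant omega_f(2*sqrt(d))
    f0    -- the value f(0)
    N     -- the width parameter of the network
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    d = x.shape[0]
    i = np.arange(1, d + 1)
    # hidden layers 1 and 2: encode the whole input into a single scalar
    inner = 1.0 + np.sum(2.0 ** ((i - 1) * N) * sigma1(2.0 ** (N - 1) * x))
    s = sigma2(inner)
    # hidden layer 3 and output layer: extract N "bits" and recombine them
    j = np.arange(1, N + 1)
    return 2.0 * omega * np.sum(2.0 ** (-j) * sigma3(a * s)) + f0 - omega
```

    Note that \(\sigma_2\) is applied to an already exponentially large integer encoding of the input, so a floating-point evaluation like the sketch above is only meaningful for very small \(d\) and \(N\); the construction demonstrates the expressive power of the architecture rather than a practical algorithm.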
    exponential convergence
    curse of dimensionality
    deep neural network
    floor-exponential-step activation function
    continuous function

    Identifiers