Neural network approximation: three hidden layers are enough (Q6054944)

From MaRDI portal
scientific article; zbMATH DE number 7743433

    Statements

    Neural network approximation: three hidden layers are enough (English)
    28 September 2023
    It is well known that sufficiently deep neural networks have great power in approximating high-dimensional complex functions. Surprisingly, the paper constructs a neural network with only three hidden layers that nevertheless possesses super approximation power. The network uses a different activation function in each of the three hidden layers. Specifically, the network approximating a function \(f\) is given by \[ \phi(\mathbf{x})=2\omega_f(2\sqrt{d})\sum_{j=1}^N2^{-j}\sigma_3\biggl(a_j\cdot \sigma_2\bigl(1+\sum_{i=1}^d2^{(i-1)N}\sigma_1(2^{N-1}x_i)\bigr)\biggr)+f(\mathbf{0})-\omega_f(2\sqrt{d}),\ \mathbf{x}=(x_1,x_2,\ldots,x_d)\in\mathbf{R}^d, \] where \(N\) denotes the width of the network, \(\omega_f(\cdot)\) is the modulus of continuity of \(f\), \(a_j\in[0,\frac12)\) for \(1\le j\le N\), and \[ \sigma_1(x):=\lfloor x\rfloor,\ \sigma_2(x):=2^x, \ \sigma_3(x):=\mathcal{T}(x-\lfloor x\rfloor-\frac12),\ \ x\in\mathbf{R}, \] with \[ \mathcal{T}(x):=\left\{ \begin{array}{ll} 1,&x\ge 0,\\ 0,&x<0. \end{array} \right. \] It is proved in the paper that for every continuous function \(f\) on \([0,1]^d\) there exist \(a_1,a_2,\ldots,a_N\in[0,\frac12)\) such that \[ |f(\mathbf{x})-\phi(\mathbf{x})|\le 2\omega_f(2\sqrt{d})2^{-N}+\omega_f(2\sqrt{d}2^{-N}),\ \ \mathbf{x}\in[0,1]^d. \] Consequently, when \(f\) is Hölder continuous of order \(\alpha\in(0,1]\) with Hölder constant \(\lambda\), there exist \(a_1,a_2,\ldots,a_N\in[0,\frac12)\) such that \[ |f(\mathbf{x})-\phi(\mathbf{x})|\le 3\lambda(2\sqrt{d})^\alpha2^{-\alpha N},\ \ \mathbf{x}\in[0,1]^d, \] which shows that the proposed three-hidden-layer neural network approximates a Hölder continuous function at an exponential rate as the width increases. The results reveal an interesting and important property of the expressive power of deep neural networks. Applications of the results to machine learning are also discussed in the paper.
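    The displayed construction lends itself to a short numerical sketch. The following is a minimal illustration (not the authors' code), assuming the coefficients \(a_1,\ldots,a_N\in[0,\frac12)\), the constant \(\omega_f(2\sqrt{d})\), and the value \(f(\mathbf{0})\) are already given; the paper only proves that suitable \(a_j\) exist for each \(f\), and the names phi, sigma1, sigma2, sigma3, omega, f0 below are purely illustrative.

```python
import numpy as np

def sigma1(x):
    # first hidden layer activation: the floor function
    return np.floor(x)

def sigma2(x):
    # second hidden layer activation: the base-2 exponential 2^x
    return 2.0 ** x

def sigma3(x):
    # third hidden layer activation: T(x - floor(x) - 1/2), where T equals 1
    # for nonnegative arguments and 0 otherwise
    return np.where(x - np.floor(x) - 0.5 >= 0.0, 1.0, 0.0)

def phi(x, a, omega, f0, N):
    """Evaluate the three-hidden-layer network at a point x in [0,1]^d.

    x     -- input point, array-like of shape (d,)
    a     -- coefficients a_1, ..., a_N in [0, 1/2) (assumed to be given)
    omega -- the constant omega_f(2*sqrt(d))
    f0    -- the value f(0)
    N     -- the width parameter of the network
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a, dtype=float)
    d = x.shape[0]
    i = np.arange(1, d + 1)
    # hidden layers 1 and 2: encode the whole input into a single scalar
    inner = 1.0 + np.sum(2.0 ** ((i - 1) * N) * sigma1(2.0 ** (N - 1) * x))
    s = sigma2(inner)
    # hidden layer 3 and output layer: extract N "bits" and recombine them
    j = np.arange(1, N + 1)
    return 2.0 * omega * np.sum(2.0 ** (-j) * sigma3(a * s)) + f0 - omega
```

    Note that \(\sigma_2\) is applied to an already exponentially large integer encoding of the input, so a floating-point evaluation like the sketch above is only meaningful for very small \(d\) and \(N\); the construction demonstrates the expressive power of the architecture rather than a practical algorithm.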
    exponential convergence
    curse of dimensionality
    deep neural network
    floor-exponential-step activation function
    continuous function

    Identifiers