Approximation spaces of deep neural networks (Q2117336): Difference between revisions

From MaRDI portal

Revision as of 05:56, 5 March 2024

Language: English
Label: Approximation spaces of deep neural networks
Description: scientific article

    Statements

    Approximation spaces of deep neural networks (English)
    21 March 2022
    A deep neural network can be represented formally as a tuple $\Phi= ((T_1,\alpha _1),\dots, (T_L,\alpha_L))$, where the $T_l$ are affine-linear maps, $T_l(x)=A_l x+b_l$, with matrices $A_l$ and vectors $b_l$, the $\alpha_l$ are nonlinearities, and \(L\) denotes the number of layers in the network. The realization of the deep neural network $\Phi$ is defined as the function \[\mathcal{R}(\Phi):=\alpha_L \circ T_L\circ \cdots \circ \alpha_1 \circ T_1,\] which is evaluated by applying the maps layer by layer. The central task of a neural network is in general the approximation of a function \(f\), given a set of training data $(x_i, f(x_i))_{i=1}^{m}$. One chooses a loss function $\mathcal{L}$ and a regulariser $\mathcal{P}$, and the optimization problem is to find a neural network structure $\Phi$ such that $\sum_{i=1}^{m}\mathcal{L}(\mathcal{R}(\Phi)(x_i),f(x_i)) + \lambda\mathcal{P}(\Phi)$ is minimized. The objective is to achieve the best possible approximation of \(f\). The aim of the article is to introduce and investigate approximation spaces associated with neural networks. The results are expected to have an impact on areas such as the theory of expressivity, the statistical analysis of deep learning, and the design of deep neural networks. The second section of the article is devoted to the definition of neural networks and their elementary properties. The third section introduces classical approximation spaces as described in Chapter 7 of the book [\textit{R. A. DeVore} and \textit{G. G. Lorentz}, Constructive approximation. Berlin: Springer-Verlag (1993; Zbl 0797.41016)]. By suitable specialization, these spaces are then used in the context of neural networks as neural network approximation spaces.
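The realization $\mathcal{R}(\Phi)$ and the regularized objective described above can be illustrated with a minimal sketch in plain Python. Everything named here is an assumption made for illustration and does not come from the article: the helper names `affine`, `realize` and `objective`, the squared-error loss standing in for $\mathcal{L}$, the sum of squared matrix entries standing in for $\mathcal{P}$, and the small example network, which computes $|x|$ as $\mathrm{ReLU}(x)+\mathrm{ReLU}(-x)$.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# One layer (T_l, alpha_l); the affine map T_l is stored as the pair (A_l, b_l).
Layer = Tuple[Matrix, Vector, Callable[[float], float]]


def relu(t: float) -> float:
    return max(0.0, t)


def identity(t: float) -> float:
    return t


def affine(A: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Compute T(x) = A x + b."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def realize(phi: Sequence[Layer]) -> Callable[[Sequence[float]], Vector]:
    """Return R(Phi) = alpha_L o T_L o ... o alpha_1 o T_1, applied layer by layer."""
    def r(x: Sequence[float]) -> Vector:
        y = list(x)
        for A, b, alpha in phi:
            # The nonlinearity alpha_l acts componentwise on T_l(y).
            y = [alpha(t) for t in affine(A, b, y)]
        return y
    return r


def objective(phi: Sequence[Layer], data, lam: float) -> float:
    """sum_i L(R(Phi)(x_i), f(x_i)) + lam * P(Phi).

    Assumptions: L is the squared error, P is the sum of squared entries of all A_l.
    """
    r = realize(phi)
    loss = sum(sum((p - t) ** 2 for p, t in zip(r(x), y)) for x, y in data)
    penalty = sum(a * a for A, _, _ in phi for row in A for a in row)
    return loss + lam * penalty


# Hypothetical two-layer network with L = 2 that computes |x|.
phi = [
    ([[1.0], [-1.0]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], identity),
]
# Training data (x_i, f(x_i)) for f(x) = |x|.
data = [([-1.0], [1.0]), ([2.0], [2.0])]

print(realize(phi)([-3.0]))       # [3.0]
print(objective(phi, data, 0.5))  # 2.0: zero data loss plus 0.5 * 4 from the penalty
```

In this sketch the network fits the data exactly, so the objective value comes only from the regularization term; in practice the minimization over $\Phi$ is carried out numerically, which the sketch does not attempt.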
The subsections address, for instance, connectivity versus number of neurons and relations between approximation classes associated with different depth growth functions. The importance of the choice of activation function for the resulting approximation spaces is pointed out. The fourth section is devoted mainly to the approximation spaces of ReLU networks. Embeddings between Besov spaces and neural network approximation spaces, direct estimates, and inverse estimates are discussed at length in the fifth section. Additional details and proofs are given in Appendix A (A1--A11) for Section 2, Appendix B (B1--B4) for Section 3, Appendix C (C1--C4) for Section 4, Appendix D (D1--D5) for Section 5, and Appendix E. The appendices take up almost half of the article. The references comprise 69 titles.
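For orientation, the classical construction mentioned in the review can be sketched as follows. The notation ($X$, $\Sigma_n$, $\alpha$, $q$) is not introduced in the review itself and is assumed here in the standard form used in constructive approximation. Given a quasi-Banach space $X$ and a nested family of subsets $\Sigma_0 \subset \Sigma_1 \subset \cdots \subset X$, the error of best approximation of $f \in X$ is \[E(f,\Sigma_n)_X := \inf_{g\in \Sigma_n} \|f-g\|_X,\] and for $\alpha>0$ and $0<q<\infty$ the approximation class $A^{\alpha}_{q}(X,\Sigma)$ consists of those $f$ for which \[\|f\|_{A^{\alpha}_{q}(X,\Sigma)} := \Big(\sum_{n=1}^{\infty} \big[n^{\alpha} E(f,\Sigma_{n-1})_X\big]^{q}\,\frac{1}{n}\Big)^{1/q} < \infty,\] with the sum replaced by $\sup_{n\geq 1} n^{\alpha} E(f,\Sigma_{n-1})_X$ for $q=\infty$. In the neural network setting one would take $\Sigma_n$ to be a set of realizations $\mathcal{R}(\Phi)$ of networks whose complexity, measured for example by the number of connections or the number of neurons, is at most $n$.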
    deep neural networks
    sparsely connected networks
    approximation spaces
    Besov spaces
    direct estimates
    inverse estimates
    piecewise polynomials
    ReLU activation function
