Approximation spaces of deep neural networks (Q2117336)
From MaRDI portal
Revision as of 05:56, 5 March 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Approximation spaces of deep neural networks | scientific article | |
Statements
Approximation spaces of deep neural networks (English)
21 March 2022
A deep neural network can be represented formally as a tuple $\Phi= ((T_1,\alpha _1),\dots, (T_L,\alpha_L))$, where the $T_\ell$ are affine-linear maps, $T_\ell(x)=A_\ell x+b_\ell$, with matrices $A_\ell$ and vectors $b_\ell$, the $\alpha_\ell$ are nonlinearities, and \(L\) denotes the number of layers of the network. The realization of the deep neural network $\Phi$ is the function \[\mathcal{R}(\Phi):=\alpha_L \circ T_L\circ \cdots \circ \alpha_1 \circ T_1,\] which is evaluated by applying the maps layer by layer. The central task of a neural network is, in general, the approximation of a function \(f\), given a set of training data $(x_i, f(x_i))$, $i=1,\dots,m$. One chooses a loss function $\mathcal{L}$ and a regulariser $\mathcal{P}$, and the optimization problem is to find a neural network $\Phi$ such that $\sum_{i=1}^{m}\mathcal{L}(\mathcal{R}(\Phi)(x_i),f(x_i)) + \lambda\mathcal{P}(\Phi)$ is minimized, where the parameter $\lambda$ weights the regulariser. The objective is to achieve the best possible approximation of \(f\). The aim of the article is to introduce and investigate approximation spaces associated with neural networks. The results are expected to have an impact on domains such as the theory of expressivity, the statistical analysis of deep learning and the design of deep neural networks. The second section of the article is devoted to the definition of neural networks and their elementary properties. The third section introduces classical approximation spaces as they are described in Chapter 7 of the book [\textit{R. A. DeVore} and \textit{G. G. Lorentz}, Constructive approximation. Berlin: Springer-Verlag (1993; Zbl 0797.41016)]. By suitable specialization, these spaces are then used in the context of neural networks as neural network approximation spaces.
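The realization map and the regularized objective described in the review can be illustrated with a minimal pure-Python sketch. It is not taken from the article: the names `realize`, `objective`, `squared_loss`, `l1_penalty` and `abs_net` are hypothetical, and the squared loss and the $\ell^1$ penalty are merely assumed choices for $\mathcal{L}$ and $\mathcal{P}$.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# One layer (T_l, alpha_l): the affine map T_l(x) = A_l x + b_l, stored as
# (A_l, b_l), together with its nonlinearity alpha_l.
Layer = Tuple[Matrix, Vector, Callable[[Vector], Vector]]


def relu(x: Vector) -> Vector:
    """Componentwise ReLU nonlinearity."""
    return [max(0.0, v) for v in x]


def identity(x: Vector) -> Vector:
    """Identity nonlinearity, used here for the output layer."""
    return list(x)


def affine(A: Matrix, b: Vector, x: Vector) -> Vector:
    """Evaluate the affine-linear map x -> A x + b."""
    return [sum(a * v for a, v in zip(row, x)) + b_i for row, b_i in zip(A, b)]


def realize(phi: Sequence[Layer]) -> Callable[[Vector], Vector]:
    """Return R(phi) = alpha_L o T_L o ... o alpha_1 o T_1 as a function."""
    def f(x: Vector) -> Vector:
        for A, b, alpha in phi:  # apply the layers in order 1, ..., L
            x = alpha(affine(A, b, x))
        return x
    return f


def squared_loss(y_pred: Vector, y_true: Vector) -> float:
    """Assumed choice of the loss L: squared Euclidean distance."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true))


def l1_penalty(phi: Sequence[Layer]) -> float:
    """Assumed choice of the regulariser P: l^1 norm of all weights and biases."""
    return sum(
        sum(abs(a) for row in A for a in row) + sum(abs(v) for v in b)
        for A, b, _ in phi
    )


def objective(phi: Sequence[Layer],
              data: Sequence[Tuple[Vector, Vector]],
              lam: float) -> float:
    """Compute sum_i L(R(phi)(x_i), f(x_i)) + lam * P(phi)."""
    f = realize(phi)
    return sum(squared_loss(f(x), y) for x, y in data) + lam * l1_penalty(phi)


# Two-layer ReLU network whose realization is x -> |x| = relu(x) + relu(-x).
abs_net: List[Layer] = [
    ([[1.0], [-1.0]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], identity),
]
```

For instance, `realize(abs_net)([-3.0])` evaluates the two layers in turn, and `objective(abs_net, [([-1.0], [1.0]), ([2.0], [2.0])], 0.5)` adds the data-fitting term to the weighted penalty. The sketch only evaluates the objective; minimizing it over $\Phi$ is a separate optimization step that is not shown.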
The subsections concentrate, for instance, on connectivity versus number of neurons and on relations between approximation classes associated with different depth growth functions. The authors point out how strongly the choice of activation function influences the resulting approximation spaces. The fourth section is devoted mainly to the approximation spaces of ReLU networks. Embeddings between Besov spaces and neural network approximation spaces, together with direct and inverse estimates, are discussed at length in the fifth section. Additional details and proofs are given in Appendix A (A1--A11) for Section 2, Appendix B (B1--B4) for Section 3, Appendix C (C1--C4) for Section 4, Appendix D (D1--D5) for Section 5, and Appendix E. The appendices take up almost half of the article. The reference list contains 69 titles.
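For orientation, the classical approximation classes mentioned in the review are usually defined along the following lines. This is a sketch in generic notation, which is assumed here and may differ from that of the article: $X$ is a quasi-Banach space, $\Sigma=(\Sigma_n)_{n\ge 0}$ is a nested family of subsets of $X$, and $\alpha>0$, $0<q<\infty$. With the error of best approximation \[E(f,\Sigma_n)_X := \inf_{g\in\Sigma_n}\|f-g\|_X,\] one sets \[\|f\|_{A^\alpha_q(X,\Sigma)} := \Big(\sum_{n=1}^{\infty}\big[n^{\alpha}E(f,\Sigma_{n-1})_X\big]^q\,\frac{1}{n}\Big)^{1/q}, \qquad A^\alpha_q(X,\Sigma) := \{f\in X : \|f\|_{A^\alpha_q(X,\Sigma)}<\infty\}.\] In the neural network setting, $\Sigma_n$ would presumably consist of the realizations of networks with at most $n$ connections, or at most $n$ neurons, which is where the comparison of connectivity and number of neurons enters.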
deep neural networks
sparsely connected networks
approximation spaces
Besov spaces
direct estimates
inverse estimates
piecewise polynomials
ReLU activation function