Depth separations in neural networks: what is actually being separated? (Q2117335)

Language: English
Label: Depth separations in neural networks: what is actually being separated?
Description: scientific article

    Statements

    Depth separations in neural networks: what is actually being separated? (English)
    21 March 2022
    The authors consider approximation properties of depth-2 networks \[ N_2(\mathbf{x})=\sum_{i=1}^w u_i\,\sigma(\mathbf{w}_i^{\mathsf{T}}\mathbf{x}+b_i), \] where \( \mathbf{x},\mathbf{w}_i\in\mathbb{R}^d \) and \( w \) denotes the width. The main results are given in the three subsections of Section 2. Subsection 2.1 contains a formal result implying that radial functions can be approximated to any constant accuracy \( \epsilon \) by depth-2 networks of width \( \mathrm{poly}(d) \). This result is proved for networks employing any activation function \( \sigma \) satisfying a mild assumption, which implies that the activation can be used to approximate univariate functions well; the assumption is satisfied by all standard activations, such as the ReLU and sigmoidal functions. In Subsection 2.2 the authors show how Lipschitz radial functions can be approximated by depth-2 ReLU networks of width \( \mathrm{poly}(1/\epsilon) \). The results of Subsection 2.3 complement these positive approximation results with negative results. Section 3 contains the proofs.
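    For concreteness, the following is a minimal NumPy sketch of the depth-2 architecture \( N_2 \) above with a ReLU activation. The random-feature least-squares fit of the radial target \( \|\mathbf{x}\| \) is purely illustrative (random inner weights, convex fit of the outer weights) and is not the approximation scheme constructed in the paper.

        import numpy as np

        def relu(z):
            return np.maximum(z, 0.0)

        def depth2_net(x, W, b, u, sigma=relu):
            # N_2(x) = sum_{i=1}^w u_i * sigma(w_i^T x + b_i)
            # W: (w, d) inner weights, b: (w,) biases, u: (w,) outer weights
            return sigma(x @ W.T + b) @ u

        # Toy illustration (hypothetical, not the authors' construction):
        # approximate the Lipschitz radial target f(x) = ||x|| using random
        # inner weights and least-squares outer weights.
        rng = np.random.default_rng(0)
        d, width, n = 5, 512, 2000
        W = rng.normal(size=(width, d))
        b = rng.normal(size=width)
        X = rng.normal(size=(n, d))
        y = np.linalg.norm(X, axis=1)                 # radial target ||x||
        H = relu(X @ W.T + b)                         # hidden-layer activations
        u, *_ = np.linalg.lstsq(H, y, rcond=None)     # convex fit of u only
        rmse = np.sqrt(np.mean((depth2_net(X, W, b, u) - y) ** 2))
        print(f"train RMSE: {rmse:.4f}")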
    Keywords: deep learning; neural network; approximation theory; depth separation
