Analysis of a two-layer neural network via displacement convexity (Q1996787)

From MaRDI portal
scientific article

    Statements

    Analysis of a two-layer neural network via displacement convexity (English)
    26 February 2021
    This is a contribution to the approximation of functions from given data by means of neural networks. Let $\Omega\subset \mathbb R^d$ be a compact convex set with regular boundary, and assume that the data $\{(y_j,x_j)\}_{j \geq 1}$ are i.i.d. with $x_j\sim\text{Unif}(\Omega)$ and $y_j = f(x_j) +\epsilon_j$, where the function $f:\Omega\to \mathbb R_{\geq 0}$ is assumed to be concave and smooth. The problem is to fit these data by the output of a neural network, $\hat{f}(x;\omega)=\frac{1}{N}\sum_{i=1}^{N}K^{\delta}(x-\omega_i)$, where $K$ is a first-order kernel with compact support and the $\omega_i$ are the parameters (the network weights). The authors introduce the risk function $R_N(\omega)=\mathbb{E}\big\{[f(x)-\hat{f}(x;\omega)]^2\big\}$, and the objective is to minimize it with respect to the parameters $\omega_i$. This is done by the stochastic gradient descent (SGD) method: at each step $k$, the current weights $\omega_i^k$ and a fresh data pair $(x_k,y_k)$ are used to update the parameters to $\omega_i^{k+1}$ according to a specific rule. Under suitable assumptions, it is proved that the SGD dynamics is well approximated by a partial differential equation with initial and boundary conditions, for which existence and uniqueness of a weak solution are established. The main result of the paper is that the SGD method ensures convergence to a model with nearly optimal risk. The fourth section of the paper presents numerical results. Proofs and additional technical details are provided in the supplementary material, see \url{doi:10.1214/20-AOS1945SUPP}.
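    To make the setup concrete, here is a minimal numerical sketch of such a scheme, assuming a one-dimensional domain $\Omega=[-1,1]$, a triangular first-order kernel, a concave target normalized to unit mass, and a plain one-pass SGD update with the $1/N$ factor absorbed into the step size; the kernel, target, step size, and sample sizes are illustrative choices, not those of the paper.
```python
import numpy as np

# A minimal sketch (not the paper's exact construction): fit a concave target
# f on Omega = [-1, 1] with f_hat(x; w) = (1/N) * sum_i K_delta(x - w_i),
# training the centers w_i by one-pass SGD on the squared loss.
rng = np.random.default_rng(0)

def f(x):                        # concave target, chosen to integrate to 1 on [-1, 1]
    return 0.75 * (1.0 - x**2)

delta = 0.3                      # kernel width (illustrative)

def K_delta(u):                  # triangular kernel: compact support, unit mass
    return np.maximum(0.0, 1.0 - np.abs(u) / delta) / delta

def dK_delta(u):                 # derivative of K_delta in u (defined a.e.)
    return np.where(np.abs(u) < delta, -np.sign(u) / delta**2, 0.0)

N = 50                           # number of hidden units ("particles")
w = rng.uniform(-1.0, 1.0, N)    # initial weights, spread over Omega
eta = 0.005                      # step size, with the 1/N gradient factor absorbed

def f_hat(x, w):
    return np.mean(K_delta(x - w))

for k in range(200_000):         # one fresh sample per SGD step
    x_k = rng.uniform(-1.0, 1.0)
    y_k = f(x_k) + 0.05 * rng.standard_normal()
    resid = y_k - f_hat(x_k, w)
    # d/dw_i of (1/2)(y_k - f_hat)^2 is proportional to resid * dK_delta(x_k - w_i)
    w -= eta * resid * dK_delta(x_k - w)

grid = np.linspace(-1.0, 1.0, 9)
print("fitted:", np.round([f_hat(x, w) for x in grid], 2))
print("target:", np.round(f(grid), 2))
```
    Running this, the fitted values on the grid should roughly track the target, illustrating how the kernel centers redistribute so that their smoothed empirical density matches $f$; this is the picture behind the Wasserstein gradient-flow and displacement-convexity analysis in the paper.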
    neural networks
    stochastic gradient descent
    Wasserstein gradient flow
    function regression
    convergence rate
    displacement convexity
