Analysis of a two-layer neural network via displacement convexity (Q1996787)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Analysis of a two-layer neural network via displacement convexity | scientific article | |
Statements
Analysis of a two-layer neural network via displacement convexity (English)
26 February 2021
This is a contribution to the problem of approximating functions from given data by means of neural networks. Let $\Omega\subset \mathbb R^d$ be a compact convex set with regular boundary, and assume that the data $\{(y_j,x_j)\}_{j \geq 1}$ are i.i.d. with $x_j\sim\mathrm{Unif}(\Omega)$ and $y_j = f(x_j) +\epsilon_j$, where the function $f:\Omega\to \mathbb R_{\geq 0}$ is assumed to be concave and smooth. The problem is to fit these data by the output of a two-layer neural network, $\hat{f}(x;\omega)=\frac{1}{N}\sum_{i=1}^{N}K^{\delta}(x-\omega_i)$. Here $K$ is a first-order kernel with compact support, $\delta>0$ is a scale parameter, and $\omega_1,\dots,\omega_N$ are the parameters of the model, namely the network weights. The authors introduce the risk function $R_N(\omega)=\mathbb{E}\big[(f(x)-\hat{f}(x;\omega))^2\big]$, which is to be minimized with respect to the parameters $\omega_i$, and they apply the stochastic gradient descent (SGD) method: at each step $k$, the current weights $\omega_i^k$ and the sample $(y_k,x_k)$ are used to compute the updated weights $\omega_i^{k+1}$ according to a specific update rule. Under suitable assumptions, it is proved that the dynamics of SGD is well approximated by a partial differential equation with initial and boundary conditions, for which existence and uniqueness of a weak solution are established. The main result of the paper is that the SGD method ensures convergence to a model with nearly optimal risk. The fourth section of the paper presents numerical results. Proofs and additional technical details are provided in the supplementary material, see \url{doi:10.1214/20-AOS1945SUPP}.
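The review specifies the model $\hat{f}(x;\omega)=\frac{1}{N}\sum_{i=1}^{N}K^{\delta}(x-\omega_i)$ and the squared-error objective but not the explicit update law. The following Python sketch is only an illustration under stated assumptions, not the authors' algorithm: it takes $K^{\delta}$ to be a compactly supported bump kernel rescaled by $\delta$ (a hypothetical choice), and it performs a plain one-sample gradient step on $(y_k-\hat{f}(x_k;\omega))^2$; the helper names `K_delta`, `grad_K_delta`, `sgd_step` and all numerical constants are invented for illustration.

```python
import numpy as np

def K_delta(z, delta=0.1):
    """Compactly supported bump kernel rescaled to bandwidth delta (illustrative choice, not the paper's kernel)."""
    r2 = np.sum((z / delta) ** 2, axis=-1)
    vals = np.where(r2 < 1.0, np.exp(-1.0 / np.maximum(1.0 - r2, 1e-12)), 0.0)
    return vals / delta ** z.shape[-1]

def grad_K_delta(z, delta=0.1, eps=1e-6):
    """Central-difference gradient of K_delta with respect to its argument z (shape (..., d))."""
    g = np.zeros_like(z)
    for j in range(z.shape[-1]):
        e = np.zeros_like(z)
        e[..., j] = eps
        g[..., j] = (K_delta(z + e, delta) - K_delta(z - e, delta)) / (2.0 * eps)
    return g

def sgd_step(w, x_k, y_k, step, delta=0.1):
    """One stochastic gradient step on the single-sample loss (y_k - f_hat(x_k; w))^2.

    w : array of shape (N, d) holding the current weights omega_i^k; returns omega_i^{k+1}.
    """
    N = w.shape[0]
    f_hat = K_delta(x_k - w, delta).mean()        # f_hat(x_k; w) = (1/N) sum_i K_delta(x_k - w_i)
    residual = y_k - f_hat
    # chain rule: d/dw_i (y - f_hat)^2 = (2/N) * (y - f_hat) * (grad K_delta)(x - w_i)
    grad_loss = (2.0 / N) * residual * grad_K_delta(x_k - w, delta)
    return w - step * grad_loss                   # descent step

# Tiny usage example in d = 2: one pass over a synthetic data stream.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, N = 2, 200
    f = lambda x: 1.0 - 0.5 * np.sum(x ** 2, axis=-1)   # a concave, nonnegative target on [-1, 1]^2
    w = rng.uniform(-1.0, 1.0, size=(N, d))             # initial weights spread over Omega
    for k in range(5000):
        x_k = rng.uniform(-1.0, 1.0, size=d)
        y_k = f(x_k) + 0.01 * rng.normal()
        w = sgd_step(w, x_k, y_k, step=0.5, delta=0.3)
```

Heuristically, each $\omega_i$ plays the role of one neuron, and it is in the regime of many neurons and small step sizes that the SGD dynamics is described by the partial differential equation mentioned in the review.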
neural networks
stochastic gradient descent
Wasserstein gradient flow
function regression
convergence rate
displacement convexity