Best \(k\)-layer neural network approximations

From MaRDI portal
Publication:2117342

DOI: 10.1007/S00365-021-09545-2
zbMATH Open: 1501.41005
arXiv: 1907.01507
OpenAlex: W3115973547
Wikidata: Q114229768 (Scholia: Q114229768)
MaRDI QID: Q2117342
FDO: Q2117342

Yang Qi, Mateusz Michałek, Lek-Heng Lim

Publication date: 21 March 2022

Published in: Constructive Approximation

Abstract: We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set \(s_1, \dots, s_n \in \mathbb{R}^p\) with corresponding responses \(t_1, \dots, t_n \in \mathbb{R}^q\), fitting a \(k\)-layer neural network \(u_\theta : \mathbb{R}^p \to \mathbb{R}^q\) involves estimation of the weights \(\theta \in \mathbb{R}^m\) via an ERM: \[ \inf_{\theta \in \mathbb{R}^m} \; \sum_{i=1}^n \lVert t_i - u_\theta(s_i) \rVert_2^2. \] We show that even for \(k = 2\), this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. A high-level explanation is analogous to that for the nonexistence of best rank-\(r\) approximations of higher-order tensors: the set of parameters is not a closed set. The geometry involved for best \(k\)-layer neural network approximations is, however, more subtle. In addition, we show that for the smooth activations \(\sigma(x) = 1/(1 + e^{-x})\) and \(\sigma(x) = \tanh(x)\), such failure to attain an infimum can happen on a positive-measure subset of responses. For the ReLU activation \(\sigma(x) = \max(0, x)\), we completely classify cases where the ERM for a best two-layer neural network approximation attains its infimum. As an aside, we obtain a precise description of the geometry of the space of two-layer neural networks with \(d\) neurons in the hidden layer: it is the join locus of a line and the \(d\)-secant locus of a cone.
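
For concreteness, the following is a minimal numerical sketch (not from the paper) of the ERM objective above for a two-layer ReLU network \(u_\theta(s) = B\,\max(0, As + b) + c\) with \(d\) hidden neurons. The toy data, dimensions, parameter layout, and helper names (unpack, empirical_risk, num_grad) are illustrative assumptions; the crude gradient descent only drives the objective down and says nothing about whether the infimum is attained, which is the question the paper settles.

```python
# Minimal sketch of the two-layer ReLU ERM objective from the abstract:
#     inf_theta  sum_i || t_i - u_theta(s_i) ||_2^2,
# with u_theta(s) = B @ relu(A @ s + b) + c and d hidden neurons.
# The data, dimensions, and parameter layout below are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: n points s_i in R^p with responses t_i in R^q.
n, p, q, d = 20, 3, 2, 4
S = rng.normal(size=(n, p))
T = rng.normal(size=(n, q))

m = d * p + d + q * d + q  # total number of weights theta in R^m

def unpack(theta):
    """Split the flat parameter vector theta into the layer weights (A, b, B, c)."""
    A = theta[:d * p].reshape(d, p)
    b = theta[d * p:d * p + d]
    B = theta[d * p + d:d * p + d + q * d].reshape(q, d)
    c = theta[d * p + d + q * d:]
    return A, b, B, c

def empirical_risk(theta):
    """Empirical risk: sum_i || t_i - u_theta(s_i) ||_2^2."""
    A, b, B, c = unpack(theta)
    hidden = np.maximum(0.0, S @ A.T + b)  # ReLU activation, shape (n, d)
    preds = hidden @ B.T + c               # network outputs u_theta(s_i), shape (n, q)
    return float(np.sum((T - preds) ** 2))

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient; adequate for this tiny illustration."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Crude gradient descent on the ERM objective.  The paper shows the infimum
# need not be attained, in which case iterates can keep improving without
# converging to any minimizing choice of weights.
theta = rng.normal(size=m)
for step in range(200):
    theta -= 1e-3 * num_grad(empirical_risk, theta)

print("empirical risk after 200 steps:", empirical_risk(theta))
```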


Full work available at URL: https://arxiv.org/abs/1907.01507





