Best \(k\)-layer neural network approximations
Abstract: We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set \(s_1, \dots, s_n \in \mathbb{R}^p\) with corresponding responses \(t_1, \dots, t_n \in \mathbb{R}^q\), fitting a \(k\)-layer neural network \(\nu_\theta : \mathbb{R}^p \to \mathbb{R}^q\) involves estimation of the weights \(\theta \in \mathbb{R}^m\) via an ERM: \[ \inf_{\theta \in \mathbb{R}^m} \; \sum_{i=1}^n \lVert t_i - \nu_\theta(s_i) \rVert_2^2. \] We show that even for \(k = 2\), this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. A high-level explanation is like that for the nonexistence of best rank-\(r\) approximations of higher-order tensors --- the set of parameters is not a closed set --- but the geometry involved for best \(k\)-layer neural network approximations is more subtle. In addition, we show that for smooth activations \(\sigma(x) = 1/(1 + \exp(-x))\) and \(\sigma(x) = \tanh(x)\), such failure to attain an infimum can happen on a positive-measured subset of responses. For the ReLU activation \(\sigma(x) = \max(0, x)\), we completely classify cases where the ERM for a best two-layer neural network approximation attains its infimum. As an aside, we obtain a precise description of the geometry of the space of two-layer neural networks with \(d\) neurons in the hidden layer: it is the join locus of a line and the \(d\)-secant locus of a cone.
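The degeneration behind such non-attainment can be seen numerically. Below is a minimal sketch (not from the paper; the data, network size, and parameter sequence are chosen purely for illustration): for responses \(t_i = \operatorname{sech}^2(s_i)\), the divided-difference sequence \(\nu_{\theta_n}(x) = n(\tanh(x + 1/n) - \tanh(x))\) drives the empirical risk of a two-neuron \(\tanh\) network toward 0 while the weights diverge, so the infimum is approached but, generically, attained by no finite \(\theta\).

```python
import numpy as np

# Illustrative training set: responses are t_i = sech^2(s_i), i.e. the
# derivative of tanh, which lies in the closure of the two-neuron tanh
# networks but is generically not representable by one.
s = np.linspace(-3.0, 3.0, 25)
t = 1.0 / np.cosh(s) ** 2

def nu(theta, x):
    # Two-layer network with two tanh neurons:
    # nu_theta(x) = a1*tanh(w1*x + b1) + a2*tanh(w2*x + b2)
    a1, w1, b1, a2, w2, b2 = theta
    return a1 * np.tanh(w1 * x + b1) + a2 * np.tanh(w2 * x + b2)

def empirical_risk(theta):
    # sum_i || t_i - nu_theta(s_i) ||_2^2  (scalar outputs here)
    return float(np.sum((t - nu(theta, s)) ** 2))

# Divided-difference sequence: nu_{theta_n}(x) = n*(tanh(x + 1/n) - tanh(x))
# converges to tanh'(x) = sech^2(x), so the risk tends to its infimum 0
# while the weights a1 = n, a2 = -n diverge: the infimum is not attained.
for n in (1, 10, 100, 1000, 10000):
    theta_n = (n, 1.0, 1.0 / n, -n, 1.0, 0.0)
    print(f"n = {n:6d}   risk(theta_n) = {empirical_risk(theta_n):.3e}")
```

The printed risks decrease toward 0 as \(n\) grows. This is the same mechanism as in the tensor case cited below: the limit function lies in the closure of the parameterized family but not in the family itself.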
Recommendations
- Critical points for least-squares problems involving certain analytic functions, with applications to sigmoidal nets
- Hardness results for neural network approximation problems
- Over-parametrized deep neural networks minimizing the empirical risk do not generalize well
- Error bounds for approximations with deep ReLU networks
- Optimal approximation of piecewise smooth functions using deep ReLU neural networks
Cites work
- scientific article; zbMATH DE number 5968745
- scientific article; zbMATH DE number 436950
- Approximation by superpositions of a sigmoidal function
- Complex best \(r\)-term approximations almost always exist in finite dimensions
- Condition. The geometry of numerical algorithms
- Machine learning: from theory to applications. Cooperative research at Siemens and MIT
- Multilayer feedforward networks are universal approximators
- Networks and the best approximation property
- Tensor Rank and the Ill-Posedness of the Best Low-Rank Approximation Problem
- Topological properties of the set of functions generated by neural networks of fixed size
- Training neural networks with noisy data as an ill-posed problem
Cited in (3)