Best \(k\)-layer neural network approximations

From MaRDI portal
Publication:2117342

DOI: 10.1007/S00365-021-09545-2
zbMATH Open: 1501.41005
arXiv: 1907.01507
OpenAlex: W3115973547
Wikidata: Q114229768 (Scholia: Q114229768)
MaRDI QID: Q2117342
FDO: Q2117342

Yang Qi, Mateusz Michałek, Lek-Heng Lim

Publication date: 21 March 2022

Published in: Constructive Approximation

Abstract: We show that the empirical risk minimization (ERM) problem for neural networks has no solution in general. Given a training set \(s_1, \dots, s_n \in \mathbb{R}^p\) with corresponding responses \(t_1, \dots, t_n \in \mathbb{R}^q\), fitting a \(k\)-layer neural network \(u_\theta : \mathbb{R}^p \to \mathbb{R}^q\) involves estimation of the weights \(\theta \in \mathbb{R}^m\) via an ERM: \[ \inf_{\theta \in \mathbb{R}^m} \; \sum_{i=1}^n \lVert t_i - u_\theta(s_i) \rVert_2^2. \] We show that even for \(k = 2\), this infimum is not attainable in general for common activations like ReLU, hyperbolic tangent, and sigmoid functions. A high-level explanation is analogous to that for the nonexistence of best rank-\(r\) approximations of higher-order tensors: the set of parameters is not a closed set. The geometry involved for best \(k\)-layer neural network approximations is, however, more subtle. In addition, we show that for the smooth activations \(\sigma(x) = 1/(1 + e^{-x})\) and \(\sigma(x) = \tanh(x)\), such failure to attain an infimum can happen on a positive-measure subset of responses. For the ReLU activation \(\sigma(x) = \max(0, x)\), we completely classify cases where the ERM for a best two-layer neural network approximation attains its infimum. As an aside, we obtain a precise description of the geometry of the space of two-layer neural networks with \(d\) neurons in the hidden layer: it is the join locus of a line and the \(d\)-secant locus of a cone.
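
For concreteness, the following is a minimal numerical sketch (not from the paper) of the ERM objective above for a two-layer ReLU network \(u_\theta(s) = B\,\max(0, As + b) + c\) with \(d\) hidden neurons. The toy data, dimensions, parameter layout, and helper names (unpack, empirical_risk, num_grad) are illustrative assumptions; the crude gradient descent only drives the objective down and says nothing about whether the infimum is attained, which is the question the paper settles.

```python
# Minimal sketch of the two-layer ReLU ERM objective from the abstract:
#     inf_theta  sum_i || t_i - u_theta(s_i) ||_2^2,
# with u_theta(s) = B @ relu(A @ s + b) + c and d hidden neurons.
# The data, dimensions, and parameter layout below are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: n points s_i in R^p with responses t_i in R^q.
n, p, q, d = 20, 3, 2, 4
S = rng.normal(size=(n, p))
T = rng.normal(size=(n, q))

m = d * p + d + q * d + q  # total number of weights theta in R^m

def unpack(theta):
    """Split the flat parameter vector theta into the layer weights (A, b, B, c)."""
    A = theta[:d * p].reshape(d, p)
    b = theta[d * p:d * p + d]
    B = theta[d * p + d:d * p + d + q * d].reshape(q, d)
    c = theta[d * p + d + q * d:]
    return A, b, B, c

def empirical_risk(theta):
    """Empirical risk: sum_i || t_i - u_theta(s_i) ||_2^2."""
    A, b, B, c = unpack(theta)
    hidden = np.maximum(0.0, S @ A.T + b)  # ReLU activation, shape (n, d)
    preds = hidden @ B.T + c               # network outputs u_theta(s_i), shape (n, q)
    return float(np.sum((T - preds) ** 2))

def num_grad(f, x, eps=1e-6):
    """Central-difference gradient; adequate for this tiny illustration."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Crude gradient descent on the ERM objective.  The paper shows the infimum
# need not be attained, in which case iterates can keep improving without
# converging to any minimizing choice of weights.
theta = rng.normal(size=m)
for step in range(200):
    theta -= 1e-3 * num_grad(empirical_risk, theta)

print("empirical risk after 200 steps:", empirical_risk(theta))
```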


Full work available at URL: https://arxiv.org/abs/1907.01507





