Smaller generalization error derived for a deep residual neural network compared with shallow networks

DOI: 10.1093/imanum/drac049 · arXiv: 2010.01887 · OpenAlex: W3153985548 · MaRDI QID: Q6190811 · FDO: Q6190811

Mattias Sandberg, Anders Szepessy, Jonas Kiessling, Aku Kammonen, R. Tempone, Petr Plecháč

Publication date: 6 February 2024

Published in: IMA Journal of Numerical Analysis

Abstract: Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1} = \bar z_\ell + \mathrm{Re}\sum_{k=1}^K \bar b_{\ell k} e^{\mathrm{i}\omega_{\ell k}\bar z_\ell} + \mathrm{Re}\sum_{k=1}^K \bar c_{\ell k} e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|_{L^1(\mathbb{R}^d)}^2/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case that the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
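
To make the layer structure in the abstract concrete, here is a minimal NumPy sketch of the forward pass of such a residual network with random Fourier features layers. The scalar hidden state, the standard normal frequency sampling, the amplitude scaling, and all names (`rff_resnet_forward`, `omegas`, `omegas_prime`) are illustrative assumptions, not the authors' construction; in particular, the paper's point is that an optimal, rather than a fixed Gaussian, frequency distribution yields the smaller generalization error.

```python
import numpy as np

def rff_resnet_forward(x, omegas, omegas_prime, b, c):
    """Sketch of the forward pass of a residual network with L random
    Fourier features layers, following the structure in the abstract:

      z_{l+1} = z_l + Re sum_k b_{lk} e^{i omega_{lk} z_l}
                    + Re sum_k c_{lk} e^{i omega'_{lk} . x}

    x            : input vector in R^d
    omegas       : (L, K) scalar frequencies omega_{lk} acting on z_l
    omegas_prime : (L, K, d) frequencies omega'_{lk} acting on x
    b, c         : (L, K) complex amplitudes b_{lk}, c_{lk}
    """
    L, K = omegas.shape
    z = 0.0  # scalar hidden state z_0 (illustrative choice)
    for l in range(L):
        feat_z = np.exp(1j * omegas[l] * z)          # e^{i omega_{lk} z_l}
        feat_x = np.exp(1j * (omegas_prime[l] @ x))  # e^{i omega'_{lk} . x}
        z = z + np.real(b[l] @ feat_z) + np.real(c[l] @ feat_x)
    return z

# Illustrative sizes: d = 2 inputs, K = 8 features per layer, L = 4
# layers, so the network has the same total number of nodes KL = 32
# as the shallow comparison network in the abstract. Frequencies are
# drawn i.i.d. standard normal here purely for the demo; the paper
# derives an optimal distribution instead.
rng = np.random.default_rng(0)
d, K, L = 2, 8, 4
omegas = rng.standard_normal((L, K))
omegas_prime = rng.standard_normal((L, K, d))
b = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
c = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
print(rff_resnet_forward(np.array([0.3, -1.2]), omegas, omegas_prime, b, c))
```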


Full work available at URL: https://arxiv.org/abs/2010.01887