Smaller generalization error derived for a deep residual neural network compared with shallow networks

DOI: 10.1093/imanum/drac049 · arXiv: 2010.01887 · OpenAlex: W3153985548 · MaRDI QID: Q6190811 · FDO: Q6190811

Mattias Sandberg, Anders Szepessy, Jonas Kiessling, Aku Kammonen, R. Tempone, Petr Plecháč

Publication date: 6 February 2024

Published in: IMA Journal of Numerical Analysis

Abstract: Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1} = \bar z_\ell + \mathrm{Re}\sum_{k=1}^K \bar b_{\ell k} e^{\mathrm{i}\omega_{\ell k}\bar z_\ell} + \mathrm{Re}\sum_{k=1}^K \bar c_{\ell k} e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k}\bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. This derivation is based on the corresponding generalization error for the approximation of the function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|_{L^1(\mathbb{R}^d)}^2/(KL)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $KL$, in the case that the $L^\infty$-norm of $f$ is much less than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network. Promising performance of the proposed new algorithm is demonstrated in computational experiments.
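
To make the layer structure in the abstract concrete, here is a minimal NumPy sketch of the forward pass of such a residual network with random Fourier features layers. The scalar hidden state, the standard normal frequency sampling, the amplitude scaling, and all names (`rff_resnet_forward`, `omegas`, `omegas_prime`) are illustrative assumptions, not the authors' construction; in particular, the paper's point is that an optimal, rather than a fixed Gaussian, frequency distribution yields the smaller generalization error.

```python
import numpy as np

def rff_resnet_forward(x, omegas, omegas_prime, b, c):
    """Sketch of the forward pass of a residual network with L random
    Fourier features layers, following the structure in the abstract:

      z_{l+1} = z_l + Re sum_k b_{lk} e^{i omega_{lk} z_l}
                    + Re sum_k c_{lk} e^{i omega'_{lk} . x}

    x            : input vector in R^d
    omegas       : (L, K) scalar frequencies omega_{lk} acting on z_l
    omegas_prime : (L, K, d) frequencies omega'_{lk} acting on x
    b, c         : (L, K) complex amplitudes b_{lk}, c_{lk}
    """
    L, K = omegas.shape
    z = 0.0  # scalar hidden state z_0 (illustrative choice)
    for l in range(L):
        feat_z = np.exp(1j * omegas[l] * z)          # e^{i omega_{lk} z_l}
        feat_x = np.exp(1j * (omegas_prime[l] @ x))  # e^{i omega'_{lk} . x}
        z = z + np.real(b[l] @ feat_z) + np.real(c[l] @ feat_x)
    return z

# Illustrative sizes: d = 2 inputs, K = 8 features per layer, L = 4
# layers, so the network has the same total number of nodes KL = 32
# as the shallow comparison network in the abstract. Frequencies are
# drawn i.i.d. standard normal here purely for the demo; the paper
# derives an optimal distribution instead.
rng = np.random.default_rng(0)
d, K, L = 2, 8, 4
omegas = rng.standard_normal((L, K))
omegas_prime = rng.standard_normal((L, K, d))
b = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
c = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / (K * L)
print(rff_resnet_forward(np.array([0.3, -1.2]), omegas, omegas_prime, b, c))
```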


Full work available at URL: https://arxiv.org/abs/2010.01887