Optimal approximation rate of ReLU networks in terms of width and depth

DOI: 10.1016/J.MATPUR.2021.07.009
MaRDI QID: Q2065073

Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang

Publication date: 7 January 2022

Published in: Journal de Mathématiques Pures et Appliquées. Neuvième Série

Full work available at URL: https://arxiv.org/abs/2103.00502

Keywords: optimal approximation; VC-dimension; bit extraction; deep ReLU networks

Mathematics Subject Classification: Artificial neural networks and deep learning (68T07); Multidimensional problems (41A63); Rate of convergence, degree of approximation (41A25); Approximation by arbitrary nonlinear expressions; widths and entropy (41A46)

Abstract: This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha\in(0,1]$ and $\lambda>0$ are Hölder order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, while existing results are only nearly optimal without the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $\mathcal{O}\big(\sqrt{d}\,\omega_f\big((N^2L^2\ln N)^{-1/d}\big)\big)$, where $\omega_f(\cdot)$ is the modulus of continuity. We also extend our analysis to any continuous function $f$ on a bounded set. Particularly, if ReLU networks with depth $31$ and width $\mathcal{O}(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with a Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}\big(\tfrac{\lambda}{W\ln W}\big)$, which has not been discovered in the literature for fixed-depth ReLU networks.
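
As a rough numerical illustration of the rates in the abstract, the sketch below evaluates the Hölder-case expression $\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}$ (as stated in the arXiv version of the abstract) and the one-dimensional fixed-depth expression $\lambda/(W\ln W)$. It is not taken from the paper: the function names `holder_rate` and `lipschitz_rate_1d` are hypothetical, the hidden constant in $\mathcal{O}(\cdot)$ is set to 1 purely for illustration, and the comparison assumes $W = N^2$ exactly rather than $W=\mathcal{O}(N^2)$.

```python
import math


def holder_rate(N: int, L: int, d: int, alpha: float = 1.0, lam: float = 1.0) -> float:
    """Evaluate lam * sqrt(d) * (N^2 * L^2 * ln N)^(-alpha/d).

    The constant hidden in the O(.) notation is assumed to be 1 (illustration only).
    """
    if N < 2 or L < 1 or d < 1:
        raise ValueError("need N >= 2, L >= 1, d >= 1")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("Hoelder order alpha must lie in (0, 1]")
    return lam * math.sqrt(d) * (N**2 * L**2 * math.log(N)) ** (-alpha / d)


def lipschitz_rate_1d(W: float, lam: float = 1.0) -> float:
    """Evaluate lam / (W * ln W), the 1D fixed-depth rate in the parameter count W."""
    if W <= 1:
        raise ValueError("need W > 1")
    return lam / (W * math.log(W))


if __name__ == "__main__":
    # With d = 1, alpha = 1 and the depth factor fixed at L = 1, substituting
    # W = N^2 gives ln W = 2 ln N, so the two expressions differ only by the
    # constant factor 2, which the O(.) notation absorbs.
    for N in (10, 100, 1000):
        W = N**2
        ratio = holder_rate(N, 1, 1) / lipschitz_rate_1d(W)
        print(N, W, ratio)
```

The ratio printed in the loop stays constant as $N$ grows, which is the sense in which the general rate with $d=1$, $\alpha=1$ and fixed depth reduces to $\mathcal{O}\big(\lambda/(W\ln W)\big)$.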

Cited in (40)
