Optimal learning
Publication:6151544
Abstract: This paper studies the problem of learning an unknown function \(f\) from given data about \(f\). The learning problem is to give an approximation \(\hat f\) to \(f\) that predicts the values of \(f\) away from the data. There are numerous settings for this learning problem depending on (i) what additional information we have about \(f\) (known as a model class assumption), (ii) how we measure the accuracy with which \(\hat f\) predicts \(f\), (iii) what is known about the data and data sites, and (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, this paper shows that a near optimal \(\hat f\) can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning. The main results of the paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation of the function from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of \(f\). An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
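To make the penalized over-parameterized formulation concrete, here is a minimal illustrative sketch in Python. It is not the paper's exact loss or penalty scaling: the random ReLU feature dictionary, the \(\ell_1\) penalty, the weight `lam`, and the ISTA solver are all assumptions made for illustration. It fits noiseless samples of a target function with many more parameters than data points by minimizing \(\tfrac12\|y-\Phi c\|_2^2+\lambda\|c\|_1\).

```python
import numpy as np

# Illustrative sketch (not the paper's exact formulation): recover f from
# point samples by solving an over-parameterized penalized least-squares
# problem  min_c  (1/2)||y - Phi c||_2^2 + lam * ||c||_1
# over a dictionary Phi with many more columns than data points,
# using proximal gradient descent (ISTA).

rng = np.random.default_rng(0)

m, N = 40, 400                      # m data sites, N >> m parameters
x = np.sort(rng.uniform(-1, 1, m))  # data sites
f = lambda t: np.sin(3 * t)         # stand-in for the unknown function
y = f(x)                            # noiseless observations

# Over-parameterized random ReLU feature dictionary (an ad hoc model class).
w = rng.normal(size=N)
b = rng.uniform(-1, 1, N)
Phi = np.maximum(np.outer(x, w) + b, 0.0)   # shape (m, N)

lam = 1e-3                                   # penalty weight, chosen ad hoc;
                                             # the paper gives quantitative scalings
step = 1.0 / np.linalg.norm(Phi, 2) ** 2     # safe step size (1 / sigma_max^2)
c = np.zeros(N)
for _ in range(5000):
    grad = Phi.T @ (Phi @ c - y)             # gradient of the smooth part
    c = c - step * grad
    c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)  # soft threshold

# Evaluate the recovered approximation away from the data sites.
t = np.linspace(-1, 1, 200)
Phit = np.maximum(np.outer(t, w) + b, 0.0)
err = np.max(np.abs(Phit @ c - f(t)))
print(f"sup-norm error on test grid: {err:.3e}")
```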
Cites work
- scientific article; zbMATH DE number 3688714
- scientific article; zbMATH DE number 3601500
- scientific article; zbMATH DE number 713342
- scientific article; zbMATH DE number 845714
- scientific article; zbMATH DE number 6438182
- A mathematical introduction to compressive sensing
- A new upper bound for sampling numbers
- A unifying representer theorem for inverse problems and machine learning
- Banach space representer theorems for neural networks and ridge splines
- Best subset, forward stepwise or Lasso? Analysis and recommendations based on extensive comparisons
- Compressed sensing
- Compressed sensing and best \(k\)-term approximation
- Correcting for unknown errors in sparse high-dimensional function approximation
- Data assimilation and sampling in Banach spaces
- Estimation and testing under sparsity. École d'Été de Probabilités de Saint-Flour XLV -- 2015
- Function values are enough for \(L_2\)-approximation
- Lasso-type recovery of sparse representations for high-dimensional data
- Minimization of functions having Lipschitz continuous first partial derivatives
- Neural network approximation
- On the stability and accuracy of least squares approximations
- Robust instance-optimal recovery of sparse signals at unknown noise levels
- Simultaneous analysis of Lasso and Dantzig selector
- Sobolev bounds on functions with scattered zeros, with applications to radial basis function surface fitting
- Sparse Polynomial Approximation of High-Dimensional Functions
- Square-root lasso: pivotal recovery of sparse signals via conic programming
- The Group Square-Root Lasso: Theoretical Properties and Fast Algorithms
- The sparsity of LASSO-type minimizers
- Tractability of multivariate problems. Volume I: Linear information
- Universal approximation bounds for superpositions of a sigmoidal function
Cited in (3)