Optimal learning
Publication: 6151544
DOI: 10.1007/s10092-023-00564-y
arXiv: 2203.15994
MaRDI QID: Q6151544
Authors: Peter Binev, Andrea Bonito, Ronald DeVore, Guergana Petrova
Publication date: 11 March 2024
Published in: Calcolo
Abstract: This paper studies the problem of learning an unknown function \(f\) from given data about \(f\). The learning problem is to give an approximation \(\hat f\) to \(f\) that predicts the values of \(f\) away from the data. There are numerous settings for this learning problem, depending on (i) what additional information is available about \(f\) (known as a model class assumption), (ii) how the accuracy with which \(\hat f\) predicts \(f\) is measured, (iii) what is known about the data and the data sites, and (iv) whether the data observations are polluted by noise. A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, this paper shows that a near optimal \(\hat f\) can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning. The main results of the paper prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation of the function from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of \(f\). An extension of these results to the case where the data is polluted by additive deterministic noise is also given.
Full work available at URL: https://arxiv.org/abs/2203.15994
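To make the approach described in the abstract concrete, the sketch below recovers a function from point samples by over-parameterized penalized least squares. This is an illustration only, assuming a random cosine-feature dictionary, a lasso-style \(\ell_1\) penalty, and an ISTA solver; the paper's actual loss function and its quantitative scaling of the over-parameterization and the penalty are not reproduced here.

```python
# Minimal illustrative sketch (NOT the paper's algorithm): learn an unknown
# function f from m point samples using an over-parameterized random-feature
# dictionary (n >> m) and a lasso-style l1 penalty, solved by ISTA.
# Dictionary, penalty weight, and iteration count are assumptions for the demo.
import numpy as np

rng = np.random.default_rng(0)

# Data: m samples of an unknown function f on [0, 1], lightly polluted by noise.
f = lambda x: np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)
m = 40
x = rng.uniform(0.0, 1.0, m)
y = f(x) + 0.01 * rng.standard_normal(m)

# Over-parameterized model: a dictionary of n random cosine features, n >> m.
n = 400
omega = rng.normal(0.0, 8.0, n)          # random frequencies
b = rng.uniform(0.0, 2 * np.pi, n)       # random phases
Phi = np.cos(np.outer(x, omega) + b)     # m x n design matrix

# Penalized least squares: minimize ||Phi c - y||_2^2 + lam * ||c||_1
# via ISTA (proximal gradient) with step size 1 / (2 * ||Phi||_2^2).
lam = 0.05
L = np.linalg.norm(Phi, 2) ** 2
c = np.zeros(n)
for _ in range(20000):
    z = c - Phi.T @ (Phi @ c - y) / L                            # gradient step
    c = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)  # soft threshold

# Predict f away from the data sites and report the worst-case error.
xt = np.linspace(0.0, 1.0, 200)
pred = np.cos(np.outer(xt, omega) + b) @ c
print("max prediction error on [0, 1]:", np.max(np.abs(pred - f(xt))))
```

With these (assumed) choices the fit uses \(n = 400\) coefficients for only \(m = 40\) samples, yet the penalty keeps the over-parameterized model from simply interpolating the noise, echoing the near-optimal-recovery theme of the paper.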
Mathematics Subject Classification:
- Linear regression; mixed models (62J05)
- Learning and adaptive systems in artificial intelligence (68T05)
- Numerical solutions of ill-posed problems in abstract spaces; regularization (65J20)
Cites Work
- Lasso-type recovery of sparse representations for high-dimensional data
- Simultaneous analysis of Lasso and Dantzig selector
- Square-root lasso: pivotal recovery of sparse signals via conic programming
- Universal approximation bounds for superpositions of a sigmoidal function
- Estimation and testing under sparsity. École d'Été de Probabilités de Saint-Flour XLV -- 2015
- The Group Square-Root Lasso: Theoretical Properties and Fast Algorithms
- Compressed sensing
- Compressed sensing and best \(k\)-term approximation
- A mathematical introduction to compressive sensing
- Minimization of functions having Lipschitz continuous first partial derivatives
- Tractability of multivariate problems. Volume I: Linear information
- On the stability and accuracy of least squares approximations
- Sobolev bounds on functions with scattered zeros, with applications to radial basis function surface fitting
- Sparse Polynomial Approximation of High-Dimensional Functions
- Neural network approximation
- The sparsity of LASSO-type minimizers
- Data assimilation and sampling in Banach spaces
- Function values are enough for \(L_2\)-approximation
- Correcting for unknown errors in sparse high-dimensional function approximation
- Best subset, forward stepwise or Lasso? Analysis and recommendations based on extensive comparisons
- A new upper bound for sampling numbers
- Banach space representer theorems for neural networks and ridge splines
- A unifying representer theorem for inverse problems and machine learning
- Robust instance-optimal recovery of sparse signals at unknown noise levels
Cited In (3)