Linearized two-layers neural networks in high dimension (Q2039801)

From MaRDI portal


scientific article

Language: English
Label: Linearized two-layers neural networks in high dimension
Description: scientific article

    Statements

    Linearized two-layers neural networks in high dimension (English)
    5 July 2021
    The authors study nonparametric regression problems for univariate responses \(y_1, \ldots, y_n\) and \(\mathbb{R}^d\)-valued feature vectors \(\mathbf{x}_1, \ldots, \mathbf{x}_n\), where the tuples \((y_i, \mathbf{x}_i)_{1 \leq i \leq n}\) are assumed to be stochastically independent and identically distributed. Their goal is to construct a function \(f: \mathbb{R}^d \to \mathbb{R}\) which predicts future responses; the quality of such an \(f\) is assessed via its squared prediction risk. In particular, the authors consider choosing \(f\) from the class \(\mathcal{F}_{\text{NN}}\) of two-layer neural networks. An approximation (based on a first-order Taylor expansion) of \(f \in \mathcal{F}_{\text{NN}}\) by the sum of a part belonging to a random features model and a part belonging to a neural tangent class is studied. The approximation errors of both parts are analyzed under different asymptotic regimes in which \(n\) and/or \(d\) tend to infinity. Furthermore, the generalization error of certain kernel methods is analyzed. Besides these theoretical contributions, the authors also present some numerical results.
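    To make the linearization concrete, the following is a minimal sketch in generic notation; the scaling and parametrization are illustrative assumptions and may differ from the authors' conventions. Writing a two-layer network with \(N\) hidden units and activation \(\sigma\) as \(f(\mathbf{x}; \mathbf{a}, \mathbf{W}) = \sum_{i=1}^{N} a_i \sigma(\langle \mathbf{w}_i, \mathbf{x} \rangle)\), a first-order Taylor expansion around an initialization \((\mathbf{a}^0, \mathbf{W}^0)\) gives
    \[
    f(\mathbf{x}; \mathbf{a}, \mathbf{W}) \approx f(\mathbf{x}; \mathbf{a}^0, \mathbf{W}^0)
    + \underbrace{\sum_{i=1}^{N} (a_i - a_i^0)\, \sigma(\langle \mathbf{w}_i^0, \mathbf{x} \rangle)}_{\text{random features part}}
    + \underbrace{\sum_{i=1}^{N} a_i^0\, \sigma'(\langle \mathbf{w}_i^0, \mathbf{x} \rangle)\, \langle \mathbf{w}_i - \mathbf{w}_i^0, \mathbf{x} \rangle}_{\text{neural tangent part}},
    \]
    so the first correction term is linear in the second-layer coefficients with the first-layer weights frozen at their random initialization (a random features model), while the second term is spanned by the gradients with respect to the first-layer weights (the neural tangent class).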
    approximation bounds
    kernel ridge regression
    neural tangent class
    random features

    Identifiers
