Learning the mapping x _i=1d x_i^2: the cost of finding the needle in a haystack

DOI10.1007/S42967-020-00078-2zbMATH Open1476.68242arXiv2002.10561OpenAlexW3047755916MaRDI QIDQ2667355FDOQ2667355

Publication date: 24 November 2021

Published in: Communications on Applied Mathematics and Computation (Search for Journal in Brave)

Abstract: The task of using machine learning to approximate the mapping

m a t h b f x m a p s t o s u m_{i = 1}^{d} x_{i}^{2}

with

x_{i} i n [- 1, 1]

seems to be a trivial one. Given the knowledge of the separable structure of the function, one can design a sparse network to represent the function very accurately, or even exactly. When such structural information is not available, and we may only use a dense neural network, the optimization procedure to find the sparse network embedded in the dense network is similar to finding the needle in a haystack, using a given number of samples of the function. We demonstrate that the cost (measured by sample complexity) of finding the needle is directly related to the Barron norm of the function. While only a small number of samples is needed to train a sparse network, the dense network trained with the same number of samples exhibits large test loss and a large generalization gap. In order to control the size of the generalization gap, we find that the use of explicit regularization becomes increasingly more important as

d

increases. The numerically observed sample complexity with explicit regularization scales as

m a t h c a l O (d^{2.5})

, which is in fact better than the theoretically predicted sample complexity that scales as

m a t h c a l O (d^{4})

. Without explicit regularization (also called implicit regularization), the numerically observed sample complexity is significantly higher and is close to

m a t h c a l O (d^{4.5})

.

Full work available at URL: https://arxiv.org/abs/2002.10561

Recommendations

zbMATH Keywords

machine learning generalization error sample complexity

Mathematics Subject Classification ID

Statistical aspects of big data and data science (62R07) Learning and adaptive systems in artificial intelligence (68T05) Numerical summation of series (65B10) Neural nets and related approaches to inference from stochastic processes (62M45)

Cites Work

Cited In (2)

Uses Software

This page was built for publication: Learning the mapping \(\mathbf{x}\mapsto \sum\limits_{i=1}^d x_i^2\): the cost of finding the needle in a haystack

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2667355)