On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces
Abstract: Deep learning has been applied to various tasks in machine learning and has shown superiority over other common procedures such as kernel methods. To provide a better theoretical understanding of the reasons for its success, we discuss the performance of deep learning and other methods on a nonparametric regression problem with Gaussian noise. Whereas existing theoretical studies of deep learning have been based mainly on mathematical theories of well-known function classes such as Hölder and Besov classes, we focus on function classes with discontinuity and sparsity, which arise naturally in practice. To highlight the effectiveness of deep learning, we compare it with linear estimators, a representative class of shallow methods. It is shown that the minimax risk of a linear estimator on the convex hull of a target function class does not differ from that on the original target function class. This implies the suboptimality of linear methods over a simple but non-convex function class, on which deep learning can attain a nearly minimax-optimal rate. Beyond this extreme case, we consider function classes with sparse wavelet coefficients. On these classes, deep learning also attains the minimax rate up to logarithmic factors in the sample size, while linear methods remain suboptimal when the assumed sparsity is strong. We also point out that the parameter sharing of deep neural networks can remarkably reduce the complexity of the model in our setting.
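To make the convex-hull argument in the abstract concrete, the following equations sketch the quantities involved; the notation (the risks $R_n$ and $R_n^{\mathrm{lin}}$, the class $\mathcal{F}$, and the $L^2$ loss) is chosen here for illustration and is not necessarily the paper's own. Under the regression model $y_i = f^\circ(x_i) + \xi_i$ with Gaussian noise $\xi_i \sim \mathcal{N}(0, \sigma^2)$, define
\[
R_n(\mathcal{F}) \;=\; \inf_{\hat f}\, \sup_{f^\circ \in \mathcal{F}} \mathbb{E}\bigl[\lVert \hat f - f^\circ \rVert_{L^2}^2\bigr],
\qquad
R_n^{\mathrm{lin}}(\mathcal{F}) \;=\; \inf_{\hat f \text{ linear in } (y_i)_{i=1}^n}\, \sup_{f^\circ \in \mathcal{F}} \mathbb{E}\bigl[\lVert \hat f - f^\circ \rVert_{L^2}^2\bigr].
\]
Because the risk of a linear estimator is a convex function of $f^\circ$, its supremum over the closed convex hull $\overline{\mathrm{conv}}(\mathcal{F})$ coincides with its supremum over $\mathcal{F}$ itself, giving the relation stated in the abstract:
\[
R_n^{\mathrm{lin}}(\mathcal{F}) \;\asymp\; R_n^{\mathrm{lin}}\bigl(\overline{\mathrm{conv}}(\mathcal{F})\bigr).
\]
Hence, when $\overline{\mathrm{conv}}(\mathcal{F})$ is much richer than $\mathcal{F}$, linear methods pay the rate of the hull, while a deep neural network estimator can still attain (nearly) the unrestricted minimax rate $R_n(\mathcal{F})$.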
Cites work
- scientific article; zbMATH DE number 3554907 (no title available)
- scientific article; zbMATH DE number 2171466 (no title available)
- Adaptive Minimax Estimation over Sparse $\ell_q$-Hulls
- Approximation by superpositions of a sigmoidal function
- Concentration inequalities and asymptotic results for ratio type empirical processes
- Deep learning
- Error bounds for approximations with deep ReLU networks
- Ideal spatial adaptation by wavelet shrinkage
- Information-theoretic determination of minimax rates of convergence
- Local Rademacher complexities and oracle inequalities in risk minimization. (2004 IMS Medallion Lecture). (With discussions and rejoinder)
- Minimax Rates of Estimation for High-Dimensional Linear Regression Over $\ell_q$-Balls
- Minimax estimation of linear functionals over nonconvex parameter spaces
- Minimax estimation via wavelet shrinkage
- Minimax risk over hyperrectangles, and implications
- Minimax theory of image reconstruction
- Neural network with unbounded activation functions is universal approximator
- Nonparametric regression using deep neural networks with ReLU activation function
- Optimal approximation of piecewise smooth functions using deep ReLU neural networks
- Pattern recognition and machine learning
- Ten Lectures on Wavelets
- The elements of statistical learning. Data mining, inference, and prediction
- Unconditional bases and bit-level compression
- Unconditional bases are optimal bases for data compression and for statistical estimation
- Wavelet threshold estimation of a regression function with random design
- Weak convergence and empirical processes. With applications to statistics
Cited in (12)
- Localized learning: a possible alternative to current deep learning techniques
- Rejoinder: On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning
- Deep learning theory of distribution regression with CNNs
- Optimal nonparametric inference via deep neural network
- Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space
- Adaptive deep learning for nonlinear time series models
- Consistent Sparse Deep Learning: Theory and Computation
- Learning sparse deep neural networks with a spike-and-slab prior
- scientific article; zbMATH DE number 7387620 (no title available)
- Learning sparse and smooth functions by deep sigmoid nets
- Drift estimation for a multi-dimensional diffusion process using deep neural networks
- scientific article; zbMATH DE number 7660136 (no title available)