Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness
From MaRDI portal
Publication:2057701
Abstract: The accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical work on generalization fails to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of the data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for the expected accuracy/error is derived by considering both the CC and neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks on several image data sets. The numerical results confirm that the expected error of trained networks, scaled by the square root of the number of classes, depends linearly on the CC. We also observe a clear consistency between test loss and neural network smoothness during the training process. In addition, we demonstrate empirically that neural network smoothness decreases as the network size increases, whereas the smoothness is insensitive to the training set size.
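The smoothness measure described in the abstract is based on the modulus of continuity, omega_f(delta) = sup over |x - y| <= delta of |f(x) - f(y)|, whose inverse is large for smooth functions. The following is a minimal illustrative sketch of estimating this quantity for a 1-D function by probing grid points at separation delta; it is not the paper's estimator, and the function name and parameters are assumptions for illustration only.

```python
import numpy as np

def modulus_of_continuity(f, low, high, delta, n_grid=200):
    """Lower-bound estimate of omega_f(delta) = sup_{|x-y|<=delta} |f(x)-f(y)|
    on [low, high], obtained by comparing grid points exactly delta apart."""
    xs = np.linspace(low, high - delta, n_grid)          # base points in the domain
    return float(np.max(np.abs(f(xs + delta) - f(xs))))  # largest observed variation

# For f(x) = 2x (Lipschitz constant 2), the estimate is exactly 2 * delta,
# so the smoothness proxy 1 / omega_f(delta) = 1 / (2 * delta).
omega = modulus_of_continuity(lambda x: 2.0 * x, 0.0, 1.0, 0.1)
smoothness = 1.0 / omega
```

In the paper's setting f would be a trained network on a high-dimensional input domain, where the supremum can only be sampled rather than gridded; the sketch above conveys only the definition being inverted.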
Recommendations
- An analysis of training and generalization errors in shallow and deep networks
- Generalization Error Analysis of Neural Networks with Gradient Based Regularization
- Generalization Error in Deep Learning
- High-dimensional dynamics of generalization error in neural networks
- Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
- Approximation error for neural network operators by an averaged modulus of smoothness
Cites work
- 10.1162/153244303321897690
- Approximation by superpositions of a sigmoidal function
- Balls in \(\mathbb{R}^k\) do not cut all subsets of \(k+2\) points
- Large-scale machine learning with stochastic gradient descent
- Multilayer feedforward networks are universal approximators
- On the information bottleneck theory of deep learning
- Probability and computing. Randomization and probabilistic techniques in algorithms and data analysis
- Robust Large Margin Deep Neural Networks
- The elements of statistical learning. Data mining, inference, and prediction
- The implicit bias of gradient descent on separable data
Cited in (11)
- Reliable extrapolation of deep neural operators informed by physics or sparse observations
- Approximation capabilities of measure-preserving neural networks
- Applications of finite difference-based physics-informed neural networks to steady incompressible isothermal and thermal flows
- Rademacher complexity and the generalization error of residual networks
- Deep learning architectures for nonlinear operator functions and nonlinear inverse problems
- Physics-informed neural networks with hard constraints for inverse design
- Quantification on the generalization performance of deep neural network with Tychonoff separation axioms
- Mosaic flows: a transferable deep learning framework for solving PDEs on unseen domains
- An analysis of training and generalization errors in shallow and deep networks
- Generalization Error in Deep Learning
- High-dimensional dynamics of generalization error in neural networks