Full error analysis for the training of deep neural networks
Publication:5083408
Abstract: Deep learning algorithms have been applied very successfully in recent years to a range of problems out of reach for classical solution paradigms. Nevertheless, there is no completely rigorous mathematical error and convergence analysis that explains the success of deep learning algorithms. In many situations the error of a deep learning algorithm can be decomposed into three parts: the approximation error, the generalization error, and the optimization error. In this work we estimate each of these three errors for a certain deep learning algorithm and combine the three estimates to obtain an overall error analysis for the algorithm under consideration. In particular, we thereby establish convergence with a suitable convergence speed for the overall error of the considered algorithm. Our convergence speed analysis is far from optimal: the convergence speed we establish is rather slow, deteriorates exponentially in the dimension, and, in particular, suffers from the curse of dimensionality. The main contribution of this work is instead to provide a full error analysis which (i) covers each of the three sources of error usually emerging in deep learning algorithms and (ii) merges these three sources of error into one overall error estimate for the considered deep learning algorithm.
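The three-part decomposition mentioned in the abstract can be illustrated by a standard schematic identity (a minimal sketch with notation chosen here for exposition, assuming a risk functional \(\mathcal{R}\); the paper's precise definitions may differ). Writing \(f^\ast\) for the risk-minimizing target function, \(\mathcal{F}\) for the class of functions realizable by the considered networks, \(f_{\mathcal{F}}\) for a risk minimizer over \(\mathcal{F}\), \(\hat{f}_n\) for an empirical risk minimizer over \(n\) samples, and \(\tilde{f}\) for the network actually produced by the optimization method, the excess risk telescopes as
\[
\mathcal{R}(\tilde{f}) - \mathcal{R}(f^\ast)
= \underbrace{\mathcal{R}(\tilde{f}) - \mathcal{R}(\hat{f}_n)}_{\text{optimization error}}
+ \underbrace{\mathcal{R}(\hat{f}_n) - \mathcal{R}(f_{\mathcal{F}})}_{\text{generalization error}}
+ \underbrace{\mathcal{R}(f_{\mathcal{F}}) - \mathcal{R}(f^\ast)}_{\text{approximation error}},
\]
so an overall convergence rate follows once each of the three terms is bounded separately.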
Cites work
- scientific article; zbMATH DE number 5730451 (no title available)
- scientific article; zbMATH DE number 671791 (no title available)
- scientific article; zbMATH DE number 1083116 (no title available)
- scientific article; zbMATH DE number 1405266 (no title available)
- scientific article; zbMATH DE number 1420699 (no title available)
- A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics
- A distribution-free theory of nonparametric regression
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
- A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations
- A theoretical analysis of deep neural networks and parametric PDEs
- Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations
- Approximation and estimation bounds for artificial neural networks
- Approximation by superpositions of a sigmoidal function
- Approximation of functions and their derivatives: A neural network implementation with applications
- Approximation spaces of deep neural networks
- Breaking the curse of dimensionality with convex neural networks
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- DNN expression rate analysis of high-dimensional PDEs: application to option pricing
- Deep neural network approximation theory
- Deep learning
- Deep learning in high dimension: neural network expression rates for generalized polynomial chaos expansions in UQ
- Deep network approximation characterized by number of neurons
- Deep neural networks algorithms for stochastic control problems on finite horizon: convergence analysis
- Deep vs. shallow networks: an approximation theory perspective
- Degree of approximation by neural and translation networks with a single hidden layer
- Equivalence of approximation by convolutional neural networks and fully-connected networks
- Error bounds for approximation with neural networks
- Error bounds for approximations with deep ReLU networks
- Error bounds for approximations with deep ReLU neural networks in \(W^{s,p}\) norms
- Exponential convergence of the deep neural network approximation for analytic functions
- General multilevel adaptations for stochastic approximation algorithms of Robbins-Monro and Polyak-Ruppert type
- Gradient descent optimizes over-parameterized deep ReLU networks
- Local Rademacher complexities
- Lower error bounds for the stochastic gradient descent optimization algorithm: sharp convergence rates for slowly and fast decaying learning rates
- Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations
- Multilayer feedforward networks are universal approximators
- Neural networks for localized approximation
- Nonlinear approximation via compositions
- On stochastic gradient Langevin dynamics with dependent data streams: the fully nonconvex case
- On the approximation by single hidden layer feedforward neural networks with fixed weights
- On the mathematical foundations of learning
- Optimal approximation of piecewise smooth functions using deep ReLU neural networks
- Optimal approximation with sparsely connected deep neural networks
- Probability inequalities for sums of bounded random variables
- Proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients
- Provable approximation properties for deep neural networks
- Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems
- Sets of finite perimeter and geometric variational problems. An introduction to geometric measure theory
- Solving the Kolmogorov PDE by means of deep learning
- Strong error analysis for stochastic gradient descent optimization algorithms
- Topological properties of the set of functions generated by neural networks of fixed size
- Tractability of multivariate problems. Volume I: Linear information
- Tractability of multivariate problems. Volume II: Standard information for functionals
- Understanding machine learning. From theory to algorithms
- Universal approximation bounds for superpositions of a sigmoidal function
- Universal approximations of invariant maps by neural networks
Cited in (13)
- An analytic layer-wise deep learning framework with applications to robotics
- Lower bounds for artificial neural network approximations: a proof that shallow neural networks fail to overcome the curse of dimensionality
- Error analysis and improving the accuracy of Winograd convolution for deep neural networks
- Error analysis for deep neural network approximations of parametric hyperbolic conservation laws
- A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
- Strong overall error analysis for the training of artificial neural networks via random initializations
- Numerical analysis of physics-informed neural networks and related models in physics-informed machine learning
- Deep learning based on randomized quasi-Monte Carlo method for solving linear Kolmogorov partial differential equation
- Learning the random variables in Monte Carlo simulations with stochastic gradient descent: Machine learning for parametric PDEs and financial derivative pricing
- Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
- An analysis of training and generalization errors in shallow and deep networks
- Error analysis for empirical risk minimization over clipped ReLU networks in solving linear Kolmogorov partial differential equations
- A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions