Effects of depth, width, and initialization: a convergence analysis of layer-wise training for deep linear neural networks
DOI: 10.1142/S0219530521500263 · zbMATH Open: 1487.68201 · arXiv: 1910.05874 · MaRDI QID: Q5037872 · FDO: Q5037872
Authors: Yeonjong Shin
Publication date: 4 March 2022
Published in: Analysis and Applications
Full work available at URL: https://arxiv.org/abs/1910.05874
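The publication concerns layer-wise training of deep linear neural networks, i.e. training one weight matrix of the factorized map y = W_L ⋯ W_1 x at a time while the remaining layers are frozen. As a rough illustration of that setting only (not the paper's algorithm, initialization scheme, or analysis), the sketch below runs block-wise gradient descent layer by layer on a synthetic least-squares problem; all dimensions, step sizes, and data are hypothetical choices.

```python
# Minimal sketch: layer-wise (block coordinate) gradient descent for a deep
# linear network y = W_L ... W_1 x on a synthetic least-squares target.
# Depth, width, step size, and data below are illustrative, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, width, depth, n = 5, 3, 8, 4, 200
X = rng.standard_normal((d_in, n))
W_true = rng.standard_normal((d_out, d_in))
Y = W_true @ X  # realizable linear target

# Layer shapes: d_in -> width -> ... -> width -> d_out, small random initialization.
dims = [d_in] + [width] * (depth - 1) + [d_out]
Ws = [0.1 * rng.standard_normal((dims[k + 1], dims[k])) for k in range(depth)]

def forward(Ws, X):
    out = X
    for W in Ws:
        out = W @ out
    return out

def loss(Ws):
    R = forward(Ws, X) - Y
    return 0.5 * np.sum(R ** 2) / n

lr, sweeps, inner_steps = 1e-2, 50, 20
for sweep in range(sweeps):
    # One sweep updates each layer in turn while the other layers stay frozen.
    for l in range(depth):
        for _ in range(inner_steps):
            A = X
            for W in Ws[:l]:
                A = W @ A                      # A = W_{l-1} ... W_1 X
            B = np.eye(dims[-1])
            for W in reversed(Ws[l + 1:]):
                B = B @ W                      # B = W_L ... W_{l+1}
            R = B @ (Ws[l] @ A) - Y            # residual of the full network
            grad = B.T @ R @ A.T / n           # gradient of the loss w.r.t. W_l
            Ws[l] -= lr * grad
    if sweep % 10 == 0:
        print(f"sweep {sweep:3d}  loss {loss(Ws):.6f}")
```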
Recommendations
- Non-convergence of stochastic gradient descent in the training of deep neural networks
- Effect of depth and width on local minima in deep learning
- Mean Field Analysis of Deep Neural Networks
- Mean field analysis of neural networks: a law of large numbers
- Convergence of stochastic gradient descent in deep neural network
Mathematics Subject Classification:
- Numerical mathematical programming methods (65K05)
- Artificial neural networks and deep learning (68T07)
- Applications of mathematical programming (90C90)
- Numerical linear algebra (65F99)
- Nonconvex programming, global optimization (90C26)
Cites Work
- A randomized Kaczmarz algorithm with exponential convergence
- Title not available
- Reducing the Dimensionality of Data with Neural Networks
- A coordinate gradient descent method for nonsmooth separable minimization
- Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization
- Randomized methods for linear constraints: convergence rates and conditioning
- Randomized extended Kaczmarz for solving least squares
- Randomized Kaczmarz solver for noisy linear systems
- Universality of deep convolutional neural networks
- Gradient descent optimizes over-parameterized deep ReLU networks
- Theory of deep convolutional neural networks: downsampling
- Gradient descent with identity initialization efficiently learns positive-definite linear transformations by deep residual networks
Cited In (4)
- Wide neural networks of any depth evolve as linear models under gradient descent
- Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods
- Research on the effect of batch normalization on VGG-like neural networks
- A convergence analysis of Nesterov's accelerated gradient method in training deep linear neural networks