Optimization for deep learning: an overview
Recommendations
- Optimization Landscape of Neural Networks
- Global optimization issues in deep network regression: an overview
- Gradient descent optimizes over-parameterized deep ReLU networks
- Optimization problems for machine learning: a survey
- scientific article; zbMATH DE number 1304093
- Neural networks in optimization
- Optimal deep neural networks by maximization of the approximation power
- scientific article; zbMATH DE number 1786133
- Optimization methods for large-scale machine learning
Cites work
- scientific article; zbMATH DE number 51132 (no title available)
- scientific article; zbMATH DE number 1569102 (no title available)
- A mean field view of the landscape of two-layer neural networks
- A sensitive-eigenvector based global algorithm for quadratically constrained quadratic programming
- Accelerated methods for nonconvex optimization
- Adaptive restart for accelerated gradient schemes
- Adaptive subgradient methods for online learning and stochastic optimization
- Deep learning
- Effect of depth and width on local minima in deep learning
- Efficiency of coordinate descent methods on huge-scale optimization problems
- Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
- First-order methods of smooth convex optimization with inexact oracle
- Flat Minima
- Generalization Error in Deep Learning
- Gradient descent optimizes over-parameterized deep ReLU networks
- Katyusha: the first direct acceleration of stochastic gradient methods
- Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization
- Local minima and convergence in low-rank semidefinite programming
- Mean field analysis of neural networks: a central limit theorem
- Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
- Numerical Optimization
- Optimization methods for large-scale machine learning
- Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification
- Randomized methods for linear constraints: convergence rates and conditioning
- Reconciling modern machine-learning practice and the classical bias-variance trade-off
- Reducing the Dimensionality of Data with Neural Networks
- Restart procedures for the conjugate gradient method
- SGD-QN: careful quasi-Newton stochastic gradient descent
- Spurious valleys in one-hidden-layer neural network optimization landscapes
- Step-sizes for the gradient method
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- Two-Point Step Size Gradient Methods
- Why does unsupervised pre-training help deep learning?
Cited in (13)
- Non-convex exact community recovery in stochastic block model
- Linearly constrained nonsmooth optimization for training autoencoders
- Levenberg-Marquardt multi-classification using hinge loss function
- Random-reshuffled SARAH does not need full gradient computations
- Tuning parameters of deep neural network training algorithms pays off: a computational study
- Research on the effect of batch normalization on VGG-like neural networks
- Survey of unstable gradients in deep neural network training
- Initial state reconstruction on graphs
- Training neural networks from an ergodic perspective
- Artificial neural networks with uniform norm-based loss functions
- Research progress on batch normalization of deep learning and its related algorithms
- Why does large batch training result in poor generalization? A comprehensive explanation and a better strategy from the viewpoint of stochastic optimization
- Drift estimation for a multi-dimensional diffusion process using deep neural networks