Optimization for deep learning: an overview
Publication:2218095
DOI: 10.1007/S40305-020-00309-6
zbMATH Open: 1463.90212
OpenAlex: W3034315405
MaRDI QID: Q2218095
FDO: Q2218095
Authors: Ruoyu Sun
Publication date: 12 January 2021
Published in: Journal of the Operations Research Society of China
Full work available at URL: https://doi.org/10.1007/s40305-020-00309-6
Recommendations
- Optimization Landscape of Neural Networks
- Global optimization issues in deep network regression: an overview
- Gradient descent optimizes over-parameterized deep ReLU networks
- Optimization problems for machine learning: a survey
- scientific article; zbMATH DE number 1304093
- Neural networks in optimization
- Optimal deep neural networks by maximization of the approximation power
- scientific article; zbMATH DE number 1786133
- Optimization methods for large-scale machine learning
Cites Work
- Adaptive subgradient methods for online learning and stochastic optimization
- Numerical Optimization
- Reducing the Dimensionality of Data with Neural Networks
- Adaptive restart for accelerated gradient schemes
- Two-Point Step Size Gradient Methods
- Deep learning
- SGD-QN: careful quasi-Newton stochastic gradient descent
- Restart procedures for the conjugate gradient method
- Why does unsupervised pre-training help deep learning?
- Step-sizes for the gradient method
- Title not available
- First-order methods of smooth convex optimization with inexact oracle
- Efficiency of coordinate descent methods on huge-scale optimization problems
- Randomized methods for linear constraints: convergence rates and conditioning
- Local minima and convergence in low-rank semidefinite programming
- Title not available
- Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent
- Accelerated methods for nonconvex optimization
- Optimization methods for large-scale machine learning
- Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
- Mean field analysis of neural networks: a central limit theorem
- A mean field view of the landscape of two-layer neural networks
- Reconciling modern machine-learning practice and the classical bias-variance trade-off
- Generalization Error in Deep Learning
- Flat Minima
- Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification
- Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks
- A sensitive-eigenvector based global algorithm for quadratically constrained quadratic programming
- Gradient descent optimizes over-parameterized deep ReLU networks
- Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization
- Katyusha: the first direct acceleration of stochastic gradient methods
- Effect of depth and width on local minima in deep learning
- Spurious valleys in one-hidden-layer neural network optimization landscapes
Cited In (13)
- Non-convex exact community recovery in stochastic block model
- Linearly constrained nonsmooth optimization for training autoencoders
- Levenberg-Marquardt multi-classification using hinge loss function
- Random-reshuffled SARAH does not need full gradient computations
- Tuning parameters of deep neural network training algorithms pays off: a computational study
- Research on the effect of batch normalization on VGG-like neural networks
- Survey of unstable gradients in deep neural network training
- Initial state reconstruction on graphs
- Training neural networks from an ergodic perspective
- Artificial neural networks with uniform norm-based loss functions
- Research progress on batch normalization of deep learning and its related algorithms
- Why does large batch training result in poor generalization? A comprehensive explanation and a better strategy from the viewpoint of stochastic optimization
- Drift estimation for a multi-dimensional diffusion process using deep neural networks