Optimization methods for large-scale machine learning
Publication:4641709
Keywords: machine learning; numerical optimization; second-order methods; stochastic gradient methods; noise reduction methods; algorithm complexity analysis
MSC classification: Numerical mathematical programming methods (65K05); Learning and adaptive systems in artificial intelligence (68T05); Large-scale problems in mathematical programming (90C06); Analysis of algorithms and problem complexity (68Q25); Nonlinear programming (90C30); Optimization of shapes other than minimal surfaces (49Q10)
Abstract: This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
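The central object of the abstract is the basic stochastic gradient (SG) iteration, w_{k+1} = w_k - alpha_k g(w_k; xi_k), where g is an unbiased gradient estimate built from a single sample or mini-batch. The sketch below illustrates that iteration on a least-squares problem; the quadratic loss, the single-sample estimator, and the diminishing step size alpha_k = a/(k+1) are illustrative assumptions, not choices prescribed by the paper.

```python
# Minimal sketch of the stochastic gradient (SG) method on least squares.
# Assumptions (not from the paper): quadratic loss, one-sample gradient
# estimates, diminishing step size alpha_k = a / (k + 1).
import numpy as np

def stochastic_gradient(X, y, n_epochs=5, a=0.5, seed=0):
    """Fit w minimizing (1/2n) * ||X w - y||^2 with the basic SG iteration."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    k = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # Unbiased gradient estimate from the single sample (x_i, y_i)
            # of the per-sample loss 0.5 * (x_i^T w - y_i)^2.
            g = (X[i] @ w - y[i]) * X[i]
            w -= (a / (k + 1)) * g   # diminishing step size alpha_k
            k += 1
    return w

if __name__ == "__main__":
    # Synthetic usage example.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true + 0.01 * rng.standard_normal(1000)
    print("estimate:", np.round(stochastic_gradient(X, y), 3))
    print("truth:   ", np.round(w_true, 3))
```

The noise-reduction and second-order methods surveyed in the paper modify this same iteration, respectively by replacing g with a lower-variance estimate or by premultiplying it with a (quasi-)Newton-type scaling.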
Recommendations
- On the use of stochastic Hessian information in optimization methods for machine learning
- Stochastic optimization for large-scale machine learning
- Stochastic dual coordinate ascent methods for regularized loss minimization
- Large-scale machine learning with stochastic gradient descent
- scientific article; zbMATH DE number 1786133
Cites work
- scientific article; zbMATH DE number 6377992
- scientific article; zbMATH DE number 3643047
- scientific article; zbMATH DE number 3850830
- scientific article; zbMATH DE number 51132
- scientific article; zbMATH DE number 3567782
- scientific article; zbMATH DE number 3574917
- scientific article; zbMATH DE number 1206370
- scientific article; zbMATH DE number 1332320
- scientific article; zbMATH DE number 1569102
- scientific article; zbMATH DE number 1569104
- scientific article; zbMATH DE number 3446442
- scientific article; zbMATH DE number 3449561
- scientific article; zbMATH DE number 1843019
- scientific article; zbMATH DE number 845714
- scientific article; zbMATH DE number 852532
- scientific article; zbMATH DE number 928863
- scientific article; zbMATH DE number 6159604
- scientific article; zbMATH DE number 7306852
- scientific article; zbMATH DE number 5060482
- scientific article; zbMATH DE number 3229153
- scientific article; zbMATH DE number 3391741
- A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods
- A Convergent Incremental Gradient Method with a Constant Step Size
- A Family of Variable-Metric Methods Derived by Variational Means
- A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
- A Stochastic Approximation Method
- A coordinate gradient descent method for nonsmooth separable minimization
- A dual algorithm for the solution of nonlinear variational problems via finite element approximation
- A family of second-order methods for convex \(\ell _1\)-regularized optimization
- A new approach to variable metric algorithms
- A simulation-based approach to two-stage stochastic programming with recourse
- Acceleration of Stochastic Approximation by Averaging
- Adaptive subgradient methods for online learning and stochastic optimization
- Algorithm 778: L-BFGS-B
- An Asynchronous Parallel Stochastic Coordinate Descent Algorithm
- An inexact successive quadratic approximation method for L-1 regularized optimization
- An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Conditioning of Quasi-Newton Methods for Function Minimization
- Convex optimization algorithms
- De-noising by soft-thresholding
- Efficiency of coordinate descent methods on huge-scale optimization problems
- Erratum: SGDQN is less careful than expected
- Estimation of dependences based on empirical data. Transl. from the Russian by Samuel Kotz
- Exact matrix completion via convex optimization
- Fisher's Method of Scoring
- Hybrid deterministic-stochastic methods for data fitting
- Incremental Least Squares Methods and the Extended Kalman Filter
- Inexact Newton Methods
- Information-Based Complexity, Feedback and Dynamics in Convex Programming
- Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
- Introductory lectures on convex optimization. A basic course.
- Iterative Solution of Nonlinear Equations in Several Variables
- LIBLINEAR: a library for large linear classification
- Large-scale machine learning with stochastic gradient descent
- Learning representations by back-propagating errors
- Minimizing finite sums with the stochastic average gradient
- Natural Langevin dynamics for neural networks
- New Classes of Synchronous Codes
- New method of stochastic approximation type
- Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence
- Newton's Method for Large Bound-Constrained Optimization Problems
- On a Stochastic Approximation Method
- On perturbed proximal gradient algorithms
- On sampling rates in simulation-based recursions
- On search directions for minimization algorithms
- On the Convergence Rate of Incremental Aggregated Gradient Algorithms
- On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators
- On the Generalization Ability of On-Line Learning Algorithms
- On the convergence properties of the EM algorithm
- On the limited memory BFGS method for large scale optimization
- On the use of stochastic Hessian information in optimization methods for machine learning
- On-line learning for very large data sets
- Optimal aggregation of classifiers in statistical learning.
- Optimization for simulation: theory vs. practice
- Optimization with sparsity-inducing penalties
- Pegasos: primal estimated sub-gradient solver for SVM
- Practical inexact proximal quasi-Newton method with global complexity analysis
- Primal-dual subgradient methods for convex problems
- Probability Inequalities for Sums of Bounded Random Variables
- Probability and finance. It's only a game!
- RES: Regularized Stochastic BFGS Algorithm
- Randomized methods for linear constraints: convergence rates and conditioning
- Robust Stochastic Approximation Approach to Stochastic Programming
- SGD-QN: careful quasi-Newton stochastic gradient descent
- Sample size selection in optimization methods for machine learning
- Second-order stochastic optimization for machine learning in linear time
- Some applications of concentration inequalities to statistics
- Some methods of speeding up the convergence of iteration methods
- Sparse Reconstruction by Separable Approximation
- Stochastic dual coordinate ascent methods for regularized loss minimization
- Sub-sampled Newton methods
- Support-vector networks
- Templates for convex cone problems with applications to sparse signal recovery
- The Conjugate Gradient Method and Trust Regions in Large Scale Optimization
- The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations
- The importance of convexity in learning with squared loss
- Trust Region Methods
- Uniform Central Limit Theorems
- Universal Portfolios
- Updating Quasi-Newton Matrices with Limited Storage
Cited in
(only showing first 100 items)
- A Bayesian perspective of statistical machine learning for big data
- A distributed optimal control problem with averaged stochastic gradient descent
- A deep learning algorithm for high-dimensional exploratory item factor analysis
- General convergence analysis of stochastic first-order methods for composite optimization
- Adaptive sampling quasi-Newton methods for zeroth-order stochastic optimization
- Adaptive sequential sample average approximation for solving two-stage stochastic linear programs
- Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization
- Stochastic gradient descent with Polyak's learning rate
- scientific article; zbMATH DE number 7306860
- A homotopy training algorithm for fully connected neural networks
- Robust Accelerated Primal-Dual Methods for Computing Saddle Points
- New results in cooperative adaptive optimal output regulation
- Non-asymptotic guarantees for sampling by stochastic gradient descent
- Accelerated doubly stochastic gradient descent for tensor CP decomposition
- Adaptive sampling line search for local stochastic optimization with integer variables
- Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
- Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market
- A stochastic ADMM algorithm for large-scale ptychography with weighted difference of anisotropic and isotropic total variation
- Continuous-time convergence rates in potential and monotone games
- Accelerating incremental gradient optimization with curvature information
- An analysis of stochastic variance reduced gradient for linear inverse problems
- An inertial Newton algorithm for deep learning
- Bilevel optimization, deep learning and fractional Laplacian regularization with applications in tomography
- On the convergence of a block-coordinate incremental gradient method
- Triangularized orthogonalization-free method for solving extreme eigenvalue problems
- Tackling algorithmic bias in neural-network classifiers using Wasserstein-2 regularization
- On large batch training and sharp minima: a Fokker-Planck perspective
- Mathematical optimization in classification and regression trees
- A discussion on variational analysis in derivative-free optimization
- How can machine learning and optimization help each other better?
- On a multilevel Levenberg-Marquardt method for the training of artificial neural networks and its application to the solution of partial differential equations
- Survey on parallel and distributed optimization algorithms for scalable machine learning
- Block layer decomposition schemes for training deep neural networks
- Stochastic quasi-subgradient method for stochastic quasi-convex feasibility problems
- Warped Riemannian metrics for location-scale models
- An accelerated variance reducing stochastic method with Douglas-Rachford splitting
- On mathematical optimization for clustering categories in contingency tables
- Stochastic linear regularization methods: random discrepancy principle and applications
- A trust region method for noisy unconstrained optimization
- A hybrid BB-type method for solving large scale unconstrained optimization
- slimTrain---A Stochastic Approximation Method for Training Separable Deep Neural Networks
- Principled deep neural network training through linear programming
- Stochastic gradient descent for semilinear elliptic equations with uncertainties
- A stochastic Levenberg-Marquardt method using random models with complexity results
- Block-wise primal-dual algorithms for large-scale doubly penalized ANOVA modeling
- Fully stochastic trust-region sequential quadratic programming for equality-constrained optimization problems
- Quasi-convex feasibility problems: subgradient methods and convergence rates
- scientific article; zbMATH DE number 7415096
- ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels
- Parallel optimization techniques for machine learning
- Stochastic sampling for deterministic structural topology optimization with many load cases: density-based and ground structure approaches
- Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games. II: The finite horizon case
- A nested primal-dual FISTA-like scheme for composite convex optimization problems
- A stochastic first-order trust-region method with inexact restoration for finite-sum minimization
- A deep domain decomposition method based on Fourier features
- Inertial accelerated SGD algorithms for solving large-scale lower-rank tensor CP decomposition problems
- Generating Nesterov's accelerated gradient algorithm by using optimal control theory for optimization
- A stochastic variance reduced gradient method with adaptive step for stochastic optimization
- Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection
- MgNet: a unified framework of multigrid and convolutional neural network
- Efficient and sparse neural networks by pruning weights in a multiobjective learning approach
- Adaptive deep density approximation for Fokker-Planck equations
- Hessian averaging in stochastic Newton methods achieves superlinear convergence
- A distributed conjugate gradient online learning method over networks
- A stochastic gradient method for a class of nonlinear PDE-constrained optimal control problems under uncertainty
- On the Convergence of Stochastic Gradient Descent for Nonlinear Ill-Posed Problems
- On the regularizing property of stochastic gradient descent
- A mini-batch proximal stochastic recursive gradient algorithm with diagonal Barzilai-Borwein stepsize
- Adaptive sampling stochastic multigradient algorithm for stochastic multiobjective optimization
- Levenberg-Marquardt revisited and parameter tuning of river regression models
- Convergence of online mirror descent
- Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- Three ways to solve partial differential equations with neural networks — A review
- Distributed nonconvex constrained optimization over time-varying digraphs
- Exploiting negative curvature in deterministic and stochastic optimization
- A Levenberg-Marquardt method for large nonlinear least-squares problems with dynamic accuracy in functions and gradients
- On principal components regression, random projections, and column subsampling
- Partial differential equation regularization for supervised machine learning
- scientific article; zbMATH DE number 7307490
- On variance reduction for stochastic smooth convex optimization with multiplicative noise
- A Deep Learning Method for Elliptic Hemivariational Inequalities
- Stochastic three-term conjugate gradient method with variance technique for non-convex learning
- Non-diffusive neural network method for hyperbolic conservation laws
- High probability complexity bounds for adaptive step search based on stochastic oracles
- A decentralized Nesterov gradient method for stochastic optimization over unbalanced directed networks
- A new stochastic gradient descent possibilistic clustering algorithm
- First-Order Methods for Nonconvex Quadratic Minimization
- On the complexity of a stochastic Levenberg-Marquardt method
- Convergence Analysis of a Quasi-Monte Carlo-Based Deep Learning Algorithm for Solving Partial Differential Equations
- A Convergence Study of SGD-Type Methods for Stochastic Optimization
- Sampled limited memory methods for massive linear inverse problems
- scientific article; zbMATH DE number 7306906
- Multi-agent natural actor-critic reinforcement learning algorithms
- Stochastic momentum methods for non-convex learning without bounded assumptions
- Stochastic quasi-Newton with line-search regularisation
- Quasi-Newton methods for machine learning: forget the past, just sample
- Data science vs. statistics: two cultures?
- Optimal randomized classification trees
- First-order methods for convex optimization
This page was built for publication: Optimization methods for large-scale machine learning