Optimization methods for large-scale machine learning
DOI: 10.1137/16M1080173
zbMATH Open: 1397.65085
arXiv: 1606.04838
OpenAlex: W2963433607
Wikidata: Q89144557 (Scholia: Q89144557)
MaRDI QID: Q4641709
FDO: Q4641709
Authors: Léon Bottou, Jorge Nocedal, Frank E. Curtis
Publication date: 18 May 2018
Published in: SIAM Review
Full work available at URL: https://arxiv.org/abs/1606.04838
Recommendations
- On the use of stochastic Hessian information in optimization methods for machine learning
- Stochastic optimization for large-scale machine learning
- Stochastic dual coordinate ascent methods for regularized loss minimization
- Large-scale machine learning with stochastic gradient descent
Keywords: machine learning; numerical optimization; second-order methods; stochastic gradient methods; noise reduction methods; algorithm complexity analysis
MSC classification:
- Numerical mathematical programming methods (65K05)
- Learning and adaptive systems in artificial intelligence (68T05)
- Large-scale problems in mathematical programming (90C06)
- Analysis of algorithms and problem complexity (68Q25)
- Nonlinear programming (90C30)
- Optimization of shapes other than minimal surfaces (49Q10)
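Aside (not part of the zbMATH record): the keywords center on stochastic gradient methods, the workhorse algorithm surveyed in the paper. A minimal illustrative sketch of the basic iteration w_{k+1} = w_k - alpha_k * grad f_{i_k}(w_k), here applied to a synthetic least-squares problem of our own choosing, might look as follows; the problem setup and all names are hypothetical.

    # Illustrative SGD sketch on a synthetic least-squares problem.
    # Not taken from the paper; a plain instance of the iteration
    # w_{k+1} = w_k - alpha_k * grad f_{i_k}(w_k).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 10                        # samples, features (arbitrary)
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true + 0.1 * rng.standard_normal(n)

    w = np.zeros(d)
    for k in range(10_000):
        i = rng.integers(n)                # sample one component function f_i
        grad_i = (A[i] @ w - b[i]) * A[i]  # gradient of 0.5*(a_i^T w - b_i)^2
        alpha = 1.0 / (1.0 + 0.01 * k)     # diminishing step size (Robbins-Monro)
        w -= alpha * grad_i

    print("distance to x_true:", np.linalg.norm(w - x_true))

The diminishing step-size schedule follows the classical Robbins-Monro condition (steps summing to infinity while their squares sum to a finite value), under which the iteration converges in expectation for the strongly convex case.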
Cites Work
- LIBLINEAR: a library for large linear classification
- Algorithm 778: L-BFGS-B
- Newton's Method for Large Bound-Constrained Optimization Problems
- A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
- Templates for convex cone problems with applications to sparse signal recovery
- Pegasos: primal estimated sub-gradient solver for SVM
- Adaptive subgradient methods for online learning and stochastic optimization
- On the convergence properties of the EM algorithm
- A dual algorithm for the solution of nonlinear variational problems via finite element approximation
- Support-vector networks
- On the limited memory BFGS method for large scale optimization
- On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators
- Introductory lectures on convex optimization. A basic course.
- Updating Quasi-Newton Matrices with Limited Storage
- Acceleration of Stochastic Approximation by Averaging
- Probability Inequalities for Sums of Bounded Random Variables
- Learning representations by back-propagating errors
- A Family of Variable-Metric Methods Derived by Variational Means
- A new approach to variable metric algorithms
- Conditioning of Quasi-Newton Methods for Function Minimization
- A Stochastic Approximation Method
- Exact matrix completion via convex optimization
- Primal-dual subgradient methods for convex problems
- SGD-QN: careful quasi-Newton stochastic gradient descent
- On the use of stochastic Hessian information in optimization methods for machine learning
- Robust Stochastic Approximation Approach to Stochastic Programming
- RES: Regularized Stochastic BFGS Algorithm
- Sample size selection in optimization methods for machine learning
- A coordinate gradient descent method for nonsmooth separable minimization
- Uniform Central Limit Theorems
- De-noising by soft-thresholding
- Sparse Reconstruction by Separable Approximation
- Some applications of concentration inequalities to statistics
- Optimization with sparsity-inducing penalties
- Trust Region Methods
- An iterative thresholding algorithm for linear inverse problems with a sparsity constraint
- Optimization for simulation: theory vs. practice
- Efficiency of coordinate descent methods on huge-scale optimization problems
- A family of second-order methods for convex \(\ell _1\)-regularized optimization
- Randomized methods for linear constraints: convergence rates and conditioning
- Iterative Solution of Nonlinear Equations in Several Variables
- A Convergent Incremental Gradient Method with a Constant Step Size
- An Asynchronous Parallel Stochastic Coordinate Descent Algorithm
- The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations
- The Conjugate Gradient Method and Trust Regions in Large Scale Optimization
- Inexact Newton Methods
- Estimation of dependences based on empirical data. Transl. from the Russian by Samuel Kotz
- Optimal aggregation of classifiers in statistical learning.
- Probability and finance. It's only a game!
- Universal Portfolios
- The importance of convexity in learning with squared loss
- Stochastic dual coordinate ascent methods for regularized loss minimization
- On search directions for minimization algorithms
- A simulation-based approach to two-stage stochastic programming with recourse
- On perturbed proximal gradient algorithms
- On‐line learning for very large data sets
- On the Generalization Ability of On-Line Learning Algorithms
- Incremental Least Squares Methods and the Extended Kalman Filter
- An inexact successive quadratic approximation method for L-1 regularized optimization
- New Classes of Synchronous Codes
- On a Stochastic Approximation Method
- Practical inexact proximal quasi-Newton method with global complexity analysis
- Some methods of speeding up the convergence of iteration methods
- A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods
- Large-scale machine learning with stochastic gradient descent
- Fisher's Method of Scoring
- Convex optimization algorithms
- Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization
- Information-Based Complexity, Feedback and Dynamics in Convex Programming
- Hybrid deterministic-stochastic methods for data fitting
- Minimizing finite sums with the stochastic average gradient
- New method of stochastic approximation type
- On the Convergence Rate of Incremental Aggregated Gradient Algorithms
- On sampling rates in simulation-based recursions
- Natural Langevin dynamics for neural networks
- Sub-sampled Newton methods
- Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence
- Second-order stochastic optimization for machine learning in linear time
- Erratum: SGDQN is less careful than expected
Cited In (showing the first 100 citing items)
- An inertial Newton algorithm for deep learning
- Adaptive sampling line search for local stochastic optimization with integer variables
- Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
- On the convergence of a block-coordinate incremental gradient method
- A trust region method for noisy unconstrained optimization
- Triangularized orthogonalization-free method for solving extreme eigenvalue problems
- Tackling algorithmic bias in neural-network classifiers using Wasserstein-2 regularization
- Block layer decomposition schemes for training deep neural networks
- Convergence analysis of machine learning algorithms for the numerical solution of mean field control and games. II: The finite horizon case
- A nested primal-dual FISTA-like scheme for composite convex optimization problems
- A stochastic first-order trust-region method with inexact restoration for finite-sum minimization
- A deep domain decomposition method based on Fourier features
- Inertial accelerated SGD algorithms for solving large-scale lower-rank tensor CP decomposition problems
- Generating Nesterov's accelerated gradient algorithm by using optimal control theory for optimization
- Three ways to solve partial differential equations with neural networks — A review
- Efficient and sparse neural networks by pruning weights in a multiobjective learning approach
- Adaptive deep density approximation for Fokker-Planck equations
- First-Order Methods for Nonconvex Quadratic Minimization
- Stochastic quasi-Newton with line-search regularisation
- Optimal randomized classification trees
- Adaptive two-layer ReLU neural network. I: Best least-squares approximation
- Adaptive two-layer ReLU neural network. II: Ritz approximation to elliptic PDEs
- Self-adaptive deep neural network: numerical approximation to functions and PDEs
- Accelerating mini-batch SARAH by step size rules
- An online conjugate gradient algorithm for large-scale data analysis in machine learning
- On obtaining sparse semantic solutions for inverse problems, control, and neural network training
- Utilizing second order information in minibatch stochastic variance reduced proximal iterations
- Feasibility-based fixed point networks
- A review on deep reinforcement learning for fluid mechanics
- Spurious valleys in one-hidden-layer neural network optimization landscapes
- Finite-sum smooth optimization with SARAH
- A robust multi-batch L-BFGS method for machine learning
- Stochastic analysis of an adaptive cubic regularization method under inexact gradient evaluations and dynamic Hessian accuracy
- Model order reduction method based on (r)POD-ANNs for parameterized time-dependent partial differential equations
- Retracted: Model order reduction method based on machine learning for parameterized time-dependent partial differential equations
- Sub-linear convergence of a stochastic proximal iteration method in Hilbert space
- Interpreting rate-distortion of variational autoencoder and using model uncertainty for anomaly detection
- High resolution 3D ultrasonic breast imaging by time-domain full waveform inversion
- Classification, inference and segmentation of anomalous diffusion with recurrent neural networks
- A fully stochastic second-order trust region method
- SHOPPER: a probabilistic model of consumer choice with substitutes and complements
- An abstract convergence framework with application to inertial inexact forward-backward methods
- Sparsity and level set regularization for near-field electromagnetic imaging in 3D
- An elastic net penalized small area model combining unit- and area-level data for regional hypertension prevalence estimation
- A finite time analysis of temporal difference learning with linear function approximation
- Optimization for deep learning: an overview
- SABRINA: a stochastic subspace majorization-minimization algorithm
- ODE-RU: a dynamical system view on recurrent neural networks
- The mixed deep energy method for resolving concentration features in finite strain hyperelasticity
- Adaptive sequential sample average approximation for solving two-stage stochastic linear programs
- Stochastic gradient descent with Polyak's learning rate
- Bilevel optimization, deep learning and fractional Laplacian regularization with applications in tomography
- Warped Riemannian metrics for location-scale models
- Stochastic sampling for deterministic structural topology optimization with many load cases: density-based and ground structure approaches
- On the regularizing property of stochastic gradient descent
- Distributed nonconvex constrained optimization over time-varying digraphs
- Exploiting negative curvature in deterministic and stochastic optimization
- Multi-agent natural actor-critic reinforcement learning algorithms
- A subsampling approach for Bayesian model selection
- Learning the tangent space of dynamical instabilities from data
- Stochastic proximal linear method for structured non-convex problems
- Linear convergence of proximal incremental aggregated gradient method for nonconvex nonsmooth minimization problems
- PPINN: parareal physics-informed neural network for time-dependent PDEs
- Convergence of stochastic gradient descent in deep neural network
- Improved variance reduction extragradient method with line search for stochastic variational inequalities
- Gradient descent finds the cubic-regularized nonconvex Newton step
- Multicomposite nonconvex optimization for training deep neural networks
- Subgradient Sampling for Nonsmooth Nonconvex Minimization
- Newton-type methods for non-convex optimization under inexact Hessian information
- Recovering missing CFD data for high-order discretizations using deep neural networks and dynamics learning
- Convergence analysis of neural networks for solving a free boundary problem
- A unified convergence analysis of stochastic Bregman proximal gradient and extragradient methods
- Stochastic gradient descent with noise of machine learning type. I: Discrete time analysis
- Convergence rates for the stochastic gradient descent method for non-convex objective functions
- Deep learning: an introduction for applied mathematicians
- Nonlinear Gradient Mappings and Stochastic Optimization: A General Framework with Applications to Heavy-Tail Noise
- Generalized gradients in dynamic optimization, optimal control, and machine learning problems
- Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning
- Accelerating variance-reduced stochastic gradient methods
- A stochastic subspace approach to gradient-free optimization in high dimensions
- Bias of homotopic gradient descent for the hinge loss
- Adaptive machine learning-based surrogate modeling to accelerate PDE-constrained optimization in enhanced oil recovery
- Bi-fidelity stochastic gradient descent for structural optimization under uncertainty
- Incremental without replacement sampling in nonconvex optimization
- Sample complexity of sample average approximation for conditional stochastic optimization
- Sublinear convergence of a tamed stochastic gradient descent method in Hilbert space
- A stochastic semismooth Newton method for nonsmooth nonconvex optimization
- Stochastic generalized gradient methods for training nonconvex nonsmooth neural networks
- Sequential convergence of AdaGrad algorithm for smooth convex optimization
- The generalized equivalence of regularization and min-max robustification in linear mixed models
- Adaptive optimization with periodic dither signals
- On large-scale unconstrained optimization and arbitrary regularization
- On the inexact scaled gradient projection method
- Fokker-Planck particle systems for Bayesian inference: computational approaches
- LSPIA, (stochastic) gradient descent, and parameter correction
- Minibatch forward-backward-forward methods for solving stochastic variational inequalities
- Scheduled restart momentum for accelerated stochastic gradient descent