On optimal probabilities in stochastic coordinate descent methods
Abstract: We propose and analyze `NSync, a new parallel coordinate descent method in which, at each iteration, a random subset of coordinates is updated in parallel, with the subsets allowed to be chosen non-uniformly. We derive convergence rates under a strong convexity assumption and comment on how to assign probabilities to the sets so as to optimize the bound. In both complexity and practical performance, the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations, both in theory and in practice, than the strategy of updating all coordinates at every iteration.
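To make the non-uniform sampling idea concrete, here is a minimal, hypothetical Python sketch (not the paper's `NSync method itself, which updates arbitrary random subsets in parallel): serial coordinate descent on a strongly convex quadratic, where coordinate i is sampled with probability proportional to its coordinate Lipschitz constant L_i = A_ii, a standard non-uniform choice in this setting. The objective, variable names, and constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative sketch only: serial non-uniformly sampled coordinate descent.
# Objective: f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite,
# so the i-th partial derivative is (A x - b)_i and L_i = A[i, i].

rng = np.random.default_rng(0)

n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite (strongly convex f)
b = rng.standard_normal(n)
L = np.diag(A).copy()         # coordinate Lipschitz constants

# Non-uniform probabilities p_i proportional to L_i (illustrative choice;
# the paper discusses how to optimize such probabilities).
p = L / L.sum()

x = np.zeros(n)
r = A @ x - b                 # maintain the gradient A x - b incrementally
for _ in range(20000):
    i = rng.choice(n, p=p)    # sample one coordinate non-uniformly
    step = r[i] / L[i]        # exact coordinate step: (1 / L_i) * partial derivative
    x[i] -= step
    r -= step * A[:, i]       # gradient update after changing x_i

print("residual norm:", np.linalg.norm(A @ x - b))
```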
Recommendations
- Semi-stochastic coordinate descent
- Optimal survey schemes for stochastic gradient descent with applications to \(M\)-estimation
- Random Coordinate Descent Methods for \(\ell_{0}\) Regularized Convex Optimization
- Stochastic primal-dual coordinate method for regularized empirical risk minimization
- Probabilistic optimization via approximate \(p\)-efficient points and bundle methods
- Stochastic proximal gradient methods for nonconvex problems in Hilbert spaces
- Stochastic (Approximate) Proximal Point Methods: Convergence, Optimality, and Adaptivity
- A Stochastic Approximation Framework for a Class of Randomized Optimization Algorithms
Cites work
- scientific article; zbMATH DE number 6253925
- Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization
- Coordinate descent with arbitrary sampling. II: Expected separable overapproximation
- Distributed coordinate descent method for learning with big data
- Efficiency of coordinate descent methods on huge-scale optimization problems
- Efficient serial and parallel coordinate descent methods for huge-scale truss topology design
- Inexact coordinate descent: complexity and preconditioning
- Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
- On the complexity analysis of randomized block-coordinate descent methods
- Parallel coordinate descent methods for big data optimization
- Random Coordinate Descent Algorithms for Multi-Agent Convex Optimization Over Networks
- Randomized iterative methods for linear systems
- Separable approximations and decomposition methods for the augmented Lagrangian
- Smooth minimization of nonsmooth functions with parallel coordinate descent methods
- Stochastic block mirror descent methods for nonsmooth and stochastic optimization
- Stochastic dual coordinate ascent methods for regularized loss minimization
Cited in (28)
- High-performance statistical computing in the computing environments of the 2020s
- Fully asynchronous stochastic coordinate descent: a tight lower bound on the parallelism achieving linear speedup
- Coordinate descent with arbitrary sampling. I: Algorithms and complexity
- A randomized coordinate descent method with volume sampling
- Faster convergence of a randomized coordinate descent method for linearly constrained optimization problems
- Parallel random block-coordinate forward-backward algorithm: a unified convergence analysis
- Adaptive coordinate sampling for stochastic primal–dual optimization
- Efficient exponential tilting with applications
- Stochastic Primal-Dual Hybrid Gradient Algorithm with Arbitrary Sampling and Imaging Applications
- scientific article; zbMATH DE number 7370629
- On the complexity of parallel coordinate descent
- Proximal gradient methods with adaptive subspace sampling
- Fastest rates for stochastic mirror descent methods
- Accelerated, parallel, and proximal coordinate descent
- Mini-batch stochastic subgradient for functional constrained optimization
- Parallel stochastic asynchronous coordinate descent: tight bounds on the possible parallelism
- A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
- Batched Stochastic Gradient Descent with Weighted Sampling
- Stochastic average model methods
- A generic coordinate descent solver for non-smooth convex optimisation
- scientific article; zbMATH DE number 6982318
- Distributed block coordinate descent for minimizing partially separable functions
- Stochastic reformulations of linear systems: algorithms and convergence theory
- scientific article; zbMATH DE number 7370566
- Parallel coordinate descent methods for big data optimization
- Optimization in high dimensions via accelerated, parallel, and proximal coordinate descent
- Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. II: Mean-square and linear convergence
- Distributed Learning with Sparse Communications by Identification