Optimal distributed online prediction using mini-batches
Abstract: Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
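The core idea described in the abstract is that several nodes each compute gradients on a share of a mini-batch, the gradients are aggregated and averaged, and all nodes then apply the same update. The following is a minimal sketch of that idea in Python, not the paper's pseudocode: the squared-loss linear predictor, the simulated workers, and all function and parameter names are illustrative assumptions.

    # Sketch of the distributed mini-batch idea with simulated workers.
    # Assumes a smooth squared loss and a fixed step size; both are
    # illustrative choices, not taken from the paper.
    import numpy as np

    def regression_grad(w, x, y):
        """Gradient of the smooth squared loss 0.5 * (w.x - y)^2 at one example."""
        return (w @ x - y) * x

    def distributed_mini_batch_sgd(stream, dim, num_workers=4, batch_size=32,
                                   step_size=0.1):
        """Process a stream of (x, y) pairs in mini-batches of size batch_size,
        conceptually split across num_workers nodes."""
        w = np.zeros(dim)
        batch = []
        for x, y in stream:
            batch.append((x, y))
            if len(batch) < batch_size:
                continue
            # Each "worker" sums gradients over its slice of the mini-batch ...
            worker_grads = []
            for shard in np.array_split(np.arange(batch_size), num_workers):
                g = sum(regression_grad(w, *batch[i]) for i in shard)
                worker_grads.append(g)
            # ... the partial sums are aggregated (e.g. via all-reduce), averaged,
            # and every node applies the same update so all copies of w stay in sync.
            w = w - step_size * sum(worker_grads) / batch_size
            batch = []
        return w

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        w_true = rng.normal(size=5)
        data = [(x, w_true @ x + 0.01 * rng.normal())
                for x in rng.normal(size=(4096, 5))]
        w_hat = distributed_mini_batch_sgd(iter(data), dim=5)
        print("estimation error:", np.linalg.norm(w_hat - w_true))

Because every node performs the identical averaged update, the sequence of predictors matches what a serial algorithm would produce on the same mini-batches, which is what allows the serial regret analysis to carry over.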
Cited in (43)
- Unifying mirror descent and dual averaging
- On the parallelization upper bound for asynchronous stochastic gradients descent in non-convex optimization
- Graph-dependent implicit regularisation for distributed stochastic subgradient descent
- Stochastic gradient descent for semilinear elliptic equations with uncertainties
- Linear coupling: an ultimate unification of gradient and mirror descent
- scientific article (zbMATH DE number 7307490; no title available)
- Quantile-based iterative methods for corrupted systems of linear equations
- Semi-discrete optimal transport: hardness, regularization and numerical solution
- Distributed prediction from vertically partitioned data
- A sparsity preserving stochastic gradient methods for sparse regression
- Distributed learning with regularized least squares
- scientific article (zbMATH DE number 7740914; no title available)
- Non-smooth setting of stochastic decentralized convex optimization problem over time-varying graphs
- Online Estimation for Functional Data
- scientific article (zbMATH DE number 7800967; no title available)
- Consensus-based modeling using distributed feature construction with ILP
- Optimal rates for multi-pass stochastic gradient methods
- Utilizing second order information in minibatch stochastic variance reduced proximal iterations
- Random gradient extrapolation for distributed and stochastic optimization
- Stochastic distributed learning with gradient quantization and double-variance reduction
- Likelihood Inference for Large Scale Stochastic Blockmodels With Covariates Based on a Divide-and-Conquer Parallelizable Algorithm With Communication
- Feature-aware regularization for sparse online learning
- Multi-round smoothed composite quantile regression for distributed data
- Complexity Analysis of stochastic gradient methods for PDE-constrained optimal Control Problems with uncertain parameters
- Vaidya's method for convex stochastic optimization problems in small dimension
- A modular analysis of adaptive (non-)convex optimization: optimism, composite objectives, variance reduction, and variational bounds
- scientific article (zbMATH DE number 7626728; no title available)
- Scaling up stochastic gradient descent for non-convex optimisation
- On the convergence analysis of asynchronous SGD for solving consistent linear systems
- Revisiting EXTRA for Smooth Distributed Optimization
- Communication-efficient sparse composite quantile regression for distributed data
- Accelerating deep neural network training with inconsistent stochastic gradient descent
- Efficient online and batch learning using forward backward splitting
- Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification
- Multilevel composite stochastic optimization via nested variance reduction
- Batched Stochastic Gradient Descent with Weighted Sampling
- Sample size selection in optimization methods for machine learning
- Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization
- Online learning of a weighted selective naive Bayes classifier with non-convex optimization
- Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
- Distributed optimization and statistical learning for large-scale penalized expectile regression
- scientific article (zbMATH DE number 6982986; no title available)
- Distributed learning for random vector functional-link networks