Graph-dependent implicit regularisation for distributed stochastic subgradient descent
From MaRDI portal
Publication:4969072
Abstract: We propose graph-dependent implicit regularisation strategies for distributed stochastic subgradient descent (Distributed SGD) for convex problems in multi-agent learning. Under the standard assumptions of convexity, Lipschitz continuity, and smoothness, we establish statistical learning rates that retain, up to logarithmic terms, centralised statistical guarantees through implicit regularisation (step size tuning and early stopping) with appropriate dependence on the graph topology. Our approach avoids the need for explicit regularisation in decentralised learning problems, such as adding constraints to the empirical risk minimisation rule. Particularly for distributed methods, the use of implicit regularisation allows the algorithm to remain simple, without projections or dual methods. To prove our results, we establish graph-independent generalisation bounds for Distributed SGD that match the centralised setting (using algorithmic stability), and we establish graph-dependent optimisation bounds that are of independent interest. We present numerical experiments to show that the qualitative nature of the upper bounds we derive can be representative of real behaviours.
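As an illustration of the setting the abstract describes, the following is a minimal sketch of distributed stochastic subgradient descent in which the only regularisation comes from the step size and the stopping time. It is not the authors' implementation. The ring topology, the absolute loss, the running average of iterates, and the names `ring_gossip_matrix`, `distributed_sgd`, `step_size` and `num_iters` are all assumptions chosen for the example; the abstract does not specify them, nor how the step size and stopping time should scale with the graph.

```python
import random


def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic matrix for a ring of n agents.

    Hypothetical choice of topology and weights: each agent averages
    equally over itself and its two ring neighbours.
    """
    P = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in (v - 1, v, v + 1):
            P[v][u % n] += 1.0 / 3.0
    return P


def distributed_sgd(data, P, step_size, num_iters, seed=0):
    """Distributed stochastic subgradient descent, no projection or penalty.

    data[v] is the list of (x, y) pairs held by agent v, with x a list of
    floats. The loss is the absolute loss |<w, x> - y| (assumed here as a
    convex, Lipschitz example). Regularisation is implicit: it is
    controlled only by step_size and by stopping after num_iters steps.
    Returns each agent's running average of its iterates.
    """
    rng = random.Random(seed)
    n = len(data)
    d = len(data[0][0][0])
    w = [[0.0] * d for _ in range(n)]
    avg = [[0.0] * d for _ in range(n)]
    for t in range(1, num_iters + 1):
        # Local step: each agent samples one of its own points and moves
        # along a subgradient of the absolute loss at that point.
        stepped = []
        for v in range(n):
            x, y = rng.choice(data[v])
            residual = sum(wi * xi for wi, xi in zip(w[v], x)) - y
            sign = (residual > 0) - (residual < 0)
            stepped.append([wi - step_size * sign * xi
                            for wi, xi in zip(w[v], x)])
        # Communication step: each agent averages its neighbours' values
        # using the gossip matrix P.
        w = [[sum(P[v][u] * stepped[u][k] for u in range(n))
              for k in range(d)] for v in range(n)]
        # Running average of the iterates seen so far.
        for v in range(n):
            for k in range(d):
                avg[v][k] += (w[v][k] - avg[v][k]) / t
    return avg
```

In this sketch a smaller `step_size` or a smaller `num_iters` keeps the iterates closer to the zero initialisation, which is the sense in which the two parameters act as regularisers.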
Recommendations
- Distributed learning with multi-penalty regularization
- Distributed kernel-based gradient descent algorithms
- DSA: decentralized double stochastic averaging gradient algorithm
- On the convergence of exact distributed generalisation and acceleration algorithm for convex optimisation
- Distributed semi-supervised regression learning with coefficient regularization
Cites work
- scientific article; zbMATH DE number 823069
- 10.1162/153244302760200704
- A Randomized Incremental Subgradient Method for Distributed Optimization in Networked Systems
- A finite sample distribution-free performance bound for local discrimination rules
- An optimal method for stochastic composite optimization
- Convex optimization: algorithms and complexity
- DSA: decentralized double stochastic averaging gradient algorithm
- Data-Dependent Convergence for Consensus Stochastic Optimization
- Decentralized estimation and control of graph connectivity for mobile sensor networks
- Distributed Subgradient Methods for Convex Optimization Over Random Networks
- Distributed Subgradient Methods for Multi-Agent Optimization
- Distributed asynchronous deterministic and stochastic gradient optimization algorithms
- Distributed optimization and statistical learning via the alternating direction method of multipliers
- Distributed stochastic subgradient projection algorithms for convex optimization
- Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
- Dual averaging methods for regularized stochastic learning and online optimization
- EXTRA: an exact first-order algorithm for decentralized consensus optimization
- Introductory lectures on convex optimization. A basic course.
- Iterative regularization for learning with convex loss functions
- Learnability, stability and uniform convergence
- Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization
- Nonparametric stochastic approximation with large step-sizes
- On Distributed Averaging Algorithms and Quantization Effects
- Online Learning as Stochastic Approximation of Regularization Paths: Optimality and Almost-Sure Convergence
- Online gradient descent learning algorithms
- Optimal distributed online prediction using mini-batches
- Optimal rates for multi-pass stochastic gradient methods
- Scikit-learn: machine learning in Python
Cited in (3)