Graph-dependent implicit regularisation for distributed stochastic subgradient descent
From MaRDI portal
Publication: 4969072
zbMATH Open: 1498.68261 · arXiv: 1809.06958 · MaRDI QID: Q4969072
Dominic Richards, Patrick Rebeschini
Publication date: 5 October 2020
Abstract: We propose graph-dependent implicit regularisation strategies for distributed stochastic subgradient descent (Distributed SGD) for convex problems in multi-agent learning. Under the standard assumptions of convexity, Lipschitz continuity, and smoothness, we establish statistical learning rates that retain, up to logarithmic terms, centralised statistical guarantees through implicit regularisation (step size tuning and early stopping) with appropriate dependence on the graph topology. Our approach avoids the need for explicit regularisation in decentralised learning problems, such as adding constraints to the empirical risk minimisation rule. Particularly for distributed methods, the use of implicit regularisation allows the algorithm to remain simple, without projections or dual methods. To prove our results, we establish graph-independent generalisation bounds for Distributed SGD that match the centralised setting (using algorithmic stability), and we establish graph-dependent optimisation bounds that are of independent interest. We present numerical experiments to show that the qualitative nature of the upper bounds we derive can be representative of real behaviours.
Full work available at URL: https://arxiv.org/abs/1809.06958
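The abstract describes Distributed SGD in which each agent takes local stochastic subgradient steps on its own data shard and averages its iterate with its neighbours via a doubly stochastic mixing matrix, with the step size and early stopping acting as implicit regularisers in place of an explicit penalty. The following is a minimal illustrative sketch of that scheme, not the authors' code: the least-squares loss, ring topology, and all parameter values are assumptions made for the example.

```python
import numpy as np

def ring_doubly_stochastic(n):
    """Doubly stochastic mixing matrix for a ring of n agents (illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    return W

def distributed_sgd(X_parts, y_parts, W, step_size, n_iters, rng):
    """Each agent holds a local data shard; per iteration it averages with its
    neighbours (multiplication by W) and takes a stochastic gradient step on a
    least-squares loss. The number of iterations (early stopping) and the step
    size play the role of the implicit regularisation discussed in the abstract."""
    n_agents = len(X_parts)
    d = X_parts[0].shape[1]
    theta = np.zeros((n_agents, d))
    for _ in range(n_iters):
        theta = W @ theta                              # consensus/averaging step
        for v in range(n_agents):
            i = rng.integers(X_parts[v].shape[0])      # sample one local data point
            x_i, y_i = X_parts[v][i], y_parts[v][i]
            grad = (x_i @ theta[v] - y_i) * x_i        # stochastic gradient of the squared loss
            theta[v] -= step_size * grad
    return theta

# Toy usage: 8 agents on a ring, synthetic linear-regression data.
rng = np.random.default_rng(0)
n_agents, n_local, d = 8, 50, 5
theta_star = rng.normal(size=d)
X_parts = [rng.normal(size=(n_local, d)) for _ in range(n_agents)]
y_parts = [X @ theta_star + 0.1 * rng.normal(size=n_local) for X in X_parts]
theta = distributed_sgd(X_parts, y_parts, ring_doubly_stochastic(n_agents),
                        step_size=0.05, n_iters=200, rng=rng)
print("max deviation from true parameter:", np.abs(theta - theta_star).max())
```

Note that no projection step or explicit penalty appears in the update; tuning `step_size` and `n_iters` is what controls overfitting, which is the point of the implicit-regularisation strategy described above.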
Recommendations
- Distributed learning with multi-penalty regularization
- Distributed kernel-based gradient descent algorithms
- DSA: decentralized double stochastic averaging gradient algorithm
- On the convergence of exact distributed generalisation and acceleration algorithm for convex optimisation
- Distributed semi-supervised regression learning with coefficient regularization
Keywords: algorithmic stability; distributed machine learning; generalisation bounds; implicit regularisation; multi-agent optimisation
MSC classifications: Learning and adaptive systems in artificial intelligence (68T05); Stochastic programming (90C15); Distributed algorithms (68W15)
Cites Work
- Title not available
- Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
- Title not available
- Introductory lectures on convex optimization. A basic course.
- Nonparametric stochastic approximation with large step-sizes
- 10.1162/153244302760200704
- Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
- Decentralized estimation and control of graph connectivity for mobile sensor networks
- Dual averaging methods for regularized stochastic learning and online optimization
- An optimal method for stochastic composite optimization
- Distributed Subgradient Methods for Multi-Agent Optimization
- EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization
- Distributed Subgradient Methods for Convex Optimization Over Random Networks
- A Randomized Incremental Subgradient Method for Distributed Optimization in Networked Systems
- Distributed asynchronous deterministic and stochastic gradient optimization algorithms
- On Distributed Averaging Algorithms and Quantization Effects
- Distributed stochastic subgradient projection algorithms for convex optimization
- Online gradient descent learning algorithms
- Online Learning as Stochastic Approximation of Regularization Paths: Optimality and Almost-Sure Convergence
- Convex optimization: algorithms and complexity
- DSA: decentralized double stochastic averaging gradient algorithm
- Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization
- Learnability, stability and uniform convergence
- Optimal Distributed Online Prediction using Mini-Batches
- A finite sample distribution-free performance bound for local discrimination rules
- Optimal Rates for Multi-pass Stochastic Gradient Methods
- Iterative regularization for learning with convex loss functions
- Data-Dependent Convergence for Consensus Stochastic Optimization
Cited In (3)
Uses Software
This page was built for publication: Graph-dependent implicit regularisation for distributed stochastic subgradient descent