2Direction: Theoretically Faster Distributed Training with Bidirectional Communication Compression
Publication:6437359
arXiv: 2305.12379 · MaRDI QID: Q6437359 · FDO: Q6437359
Authors: Alexander Tyurin, Peter Richtárik
Publication date: 21 May 2023
Abstract: We consider distributed convex optimization problems in the regime where communication between the server and the workers is expensive in both the uplink and downlink directions. We develop a new and provably accelerated method, which we call 2Direction, based on fast bidirectional compressed communication and a new bespoke error-feedback mechanism which may be of independent interest. Indeed, we find that the EF and EF21-P mechanisms (Seide et al., 2014; Gruntkowska et al., 2023) that have had considerable success in the design of efficient non-accelerated methods are not appropriate for accelerated methods. In particular, we prove that 2Direction improves on the previous state-of-the-art communication complexity (Gruntkowska et al., 2023) in the $\mu$-strongly-convex setting. The bounds are stated in terms of the smoothness constants $L$ and $L_{\max}$, the number of workers $n$, the compression errors $\omega$ and $\alpha$ of the Rand$K$ and Top$K$ sparsifiers (as examples), and the number $K$ of coordinates/bits that the server and workers send to each other. Moreover, our method is the first that improves upon the communication complexity of the vanilla accelerated gradient descent (AGD) method (Nesterov, 2018). We obtain similar improvements in the general convex regime as well. Finally, our theoretical findings are corroborated by experimental evidence.
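The two sparsifiers named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions from the compression literature, which the abstract itself does not spell out. Rand-K keeps K uniformly random coordinates and rescales them by d/K, making it unbiased with relative variance ω = d/K − 1. Top-K keeps the K largest-magnitude coordinates, making it contractive with parameter α = K/d. The function names `rand_k`, `top_k`, and `sq_norm` are hypothetical and are not taken from the paper.

```python
import random


def rand_k(x, k, rng):
    """Rand-K sketch: keep k uniformly chosen coordinates, scaled by d/k.

    Under the assumed standard definition this is unbiased, with
    E||C(x) - x||^2 = (d/k - 1) * ||x||^2.
    """
    d = len(x)
    keep = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in keep else 0.0 for i, v in enumerate(x)]


def top_k(x, k):
    """Top-K sketch: keep the k largest-magnitude coordinates unchanged.

    Under the assumed standard definition this is biased but contractive:
    ||C(x) - x||^2 <= (1 - k/d) * ||x||^2.
    """
    d = len(x)
    order = sorted(range(d), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def sq_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)
```

In both cases only K of the d coordinates need to be transmitted, which is the quantity the communication complexity in the abstract counts.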