FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

Zheng Chai, Yujing Chen, Ali Anwar, Liang Zhao, Yue Cheng, Huzefa Rangwala

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

95 Scopus citations

Abstract

Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, and brings new challenges: (1) the straggler problem, where clients lag due to heterogeneity in data or in computing and network resources, and (2) the communication bottleneck, where a large number of clients communicate their local updates to a central server and saturate it. Many existing FL methods optimize along only a single dimension of this tradeoff space. Existing solutions use asynchronous model updating or tiering-based, synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce biases that favor faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under non-i.i.d. training data. FedAT synergistically combines synchronous, intra-tier training and asynchronous, cross-tier training. By bridging synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect, improving convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training across clients for further accuracy improvement. FedAT compresses uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, which minimizes the communication cost. Results show that FedAT improves prediction performance by up to 21.09% and reduces communication cost by up to 8.5×, compared to state-of-the-art FL methods.
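To make the cross-tier aggregation idea concrete, below is a minimal Python sketch of a straggler-aware weighted average over per-tier models. The function name, the inverse-update-count weighting rule, and the NumPy vector representation are illustrative assumptions for this sketch, not the paper's exact formulation; see the paper for FedAT's actual heuristic.

```python
import numpy as np

def cross_tier_aggregate(tier_models, tier_update_counts):
    """Straggler-aware weighted average of per-tier models (illustrative).

    Faster tiers push updates to the server more often, so an unweighted
    average would drift toward them. This sketch weights each tier
    inversely to its update count to counter that bias; FedAT's exact
    weighting rule may differ.
    """
    counts = np.asarray(tier_update_counts, dtype=float)
    raw = counts.sum() - counts + 1.0   # fewer updates -> larger weight
    weights = raw / raw.sum()           # normalize to a convex combination
    return sum(w * m for w, m in zip(weights, tier_models))

# Toy example: three tiers, flattened model parameters as vectors.
models = [np.ones(4) * v for v in (1.0, 2.0, 3.0)]
updates = [10, 4, 1]                    # tier 0 is fastest, tier 2 slowest
print(cross_tier_aggregate(models, updates))
```

With these toy inputs the slowest tier receives the largest weight, so the aggregate leans toward tiers that would otherwise be underrepresented in the global model.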

Original language: English (US)
Title of host publication: Proceedings of SC 2021
Subtitle of host publication: The International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond
Publisher: IEEE Computer Society
ISBN (Electronic): 9781450384421
State: Published - Nov 14 2021
Externally published: Yes
Event: 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021 - Virtual, Online, United States
Duration: Nov 14 2021 - Nov 19 2021

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021
Country/Territory: United States
City: Virtual, Online
Period: 11/14/21 - 11/19/21

Bibliographical note

Publisher Copyright:
© 2021 IEEE Computer Society. All rights reserved.

Keywords

  • asynchronous distributed learning
  • communication efficiency
  • federated learning
  • tiering
  • weighted aggregation
