FLOAT: Federated Learning Optimizations with Automated Tuning

Ahmad Faraz Khan, Azal Ahmad Khan, Ahmed M. Abdelmoniem, Samuel Fountain, Ali R. Butt, Ali Anwar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Federated Learning (FL) has emerged as a powerful approach that enables collaborative distributed model training without the need for data sharing. However, FL grapples with inherent heterogeneity challenges that lead to issues such as stragglers, dropouts, and performance variations. Selecting the clients that run an FL instance is crucial, but existing strategies introduce biases and participation issues and do not consider resource efficiency. Communication and training acceleration solutions proposed to increase client participation also fall short because system resources change dynamically. We address these challenges in this paper with FLOAT, a novel framework designed to boost FL client resource awareness. FLOAT dynamically optimizes resource utilization to meet training deadlines and mitigates stragglers and dropouts through various optimization techniques, leading to enhanced model convergence and improved performance. FLOAT leverages multi-objective Reinforcement Learning with Human Feedback (RLHF) to automate the selection of the optimization techniques and their configurations, tailoring them to individual client resource conditions. Moreover, FLOAT integrates into existing FL systems non-intrusively and supports both asynchronous and synchronous FL settings. In our evaluations, FLOAT increases accuracy by up to 53%, reduces client dropouts by up to 78×, and improves communication, computation, and memory utilization by up to 81×, 44×, and 20×, respectively.

Original language: English (US)
Title of host publication: EuroSys 2024 - Proceedings of the 2024 European Conference on Computer Systems
Publisher: Association for Computing Machinery, Inc
Pages: 200-218
Number of pages: 19
ISBN (Electronic): 9798400704376
State: Published - Apr 22 2024
Event: 19th European Conference on Computer Systems, EuroSys 2024 - Athens, Greece
Duration: Apr 22 2024 - Apr 25 2024

Publication series

Name: EuroSys 2024 - Proceedings of the 2024 European Conference on Computer Systems

Conference

Conference: 19th European Conference on Computer Systems, EuroSys 2024
Country/Territory: Greece
City: Athens
Period: 4/22/24 - 4/25/24

Bibliographical note

Publisher Copyright:
© 2024 Owner/Author.

Keywords

  • Federated Learning
  • Machine Learning Systems
  • Resource Management
