Abstract
Federated Learning (FL) has emerged as a powerful approach that enables collaborative distributed model training without the need for data sharing. However, FL grapples with inherent heterogeneity challenges, leading to issues such as stragglers, dropouts, and performance variations. Selection of clients to run an FL instance is crucial, but existing strategies introduce biases and participation issues and do not consider resource efficiency. Communication and training acceleration solutions proposed to increase client participation also fall short due to the dynamic nature of system resources. We address these challenges in this paper with FLOAT, a novel framework designed to boost FL client resource awareness. FLOAT dynamically optimizes resource utilization to meet training deadlines and mitigates stragglers and dropouts through various optimization techniques, leading to enhanced model convergence and improved performance. FLOAT leverages multi-objective Reinforcement Learning with Human Feedback (RLHF) to automate the selection of the optimization techniques and their configurations, tailoring them to individual client resource conditions. Moreover, FLOAT integrates seamlessly into existing FL systems, remaining non-intrusive and applicable to both asynchronous and synchronous FL settings. In our evaluations, FLOAT increases accuracy by up to 53%, reduces client dropouts by up to 78×, and improves communication, computation, and memory utilization by up to 81×, 44×, and 20×, respectively.
Original language | English (US) |
---|---|
Title of host publication | EuroSys 2024 - Proceedings of the 2024 European Conference on Computer Systems |
Publisher | Association for Computing Machinery, Inc |
Pages | 200-218 |
Number of pages | 19 |
ISBN (Electronic) | 9798400704376 |
State | Published - Apr 22 2024 |
Event | 19th European Conference on Computer Systems, EuroSys 2024 - Athens, Greece. Duration: Apr 22 2024 → Apr 25 2024 |
Conference
Conference | 19th European Conference on Computer Systems, EuroSys 2024 |
---|---|
Country/Territory | Greece |
City | Athens |
Period | 4/22/24 → 4/25/24 |
Bibliographical note
Publisher Copyright: © 2024 Owner/Author.
Keywords
- Federated Learning
- Machine Learning Systems
- Resource Management