Abstract
In Federated Learning (FL), clients independently train local models and share them with a central aggregator to build a global model. The inability to access clients' data, together with collaborative training, makes FL appealing for applications with data-privacy concerns, such as medical imaging. However, these FL characteristics pose unprecedented challenges for debugging. When a global model's performance deteriorates, identifying the responsible rounds and clients is a major pain point. Developers resort to trial-and-error debugging with subsets of clients, hoping to increase the global model's accuracy, or let future FL rounds retune the model; both approaches are time-consuming and costly. We design a systematic fault localization framework, FedDebug, that advances FL debugging on two novel fronts. First, FedDebug enables interactive debugging of real-time collaborative training in FL by leveraging record and replay techniques to construct a simulation that mirrors live FL. FedDebug's breakpoint can help inspect an FL state (round, client, and global model) and move between rounds and clients' models seamlessly, enabling fine-grained step-by-step inspection. Second, FedDebug automatically identifies the client(s) responsible for lowering the global model's performance without any testing data or labels, both of which are essential for existing debugging techniques. FedDebug's strengths come from adapting differential testing in conjunction with neuron activations to determine the client(s) deviating from normal behavior. FedDebug achieves 100% accuracy in finding a single faulty client and 90.3% accuracy in finding multiple faulty clients. FedDebug's interactive debugging incurs 1.2% overhead during training, while it localizes a faulty client in only 2.1% of a round's training time. With FedDebug, we bring effective debugging practices to federated learning, improving the quality and productivity of FL application developers.
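The abstract does not spell out how differential testing is combined with neuron activations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each client model is a single layer of weights, feeds the same randomly generated inputs to every client (so no test data or labels are needed), and flags the client whose set of activated neurons most often disagrees with the rest. All names (`active_neurons`, `localize_suspect`, `build_demo_clients`), the Jaccard distance, the majority vote, and the toy fault (negated weights) are illustrative assumptions.

```python
import random
from collections import Counter


def active_neurons(weights, x, threshold=0.0):
    """Return the indices of neurons whose weighted input exceeds the threshold."""
    return frozenset(
        i for i, row in enumerate(weights)
        if sum(w * v for w, v in zip(row, x)) > threshold
    )


def jaccard_distance(a, b):
    """Distance between two activation sets: 0 means identical, 1 means disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_distance_to_others(cid, patterns):
    """Average activation distance from one client to every other client."""
    others = [jaccard_distance(patterns[cid], p)
              for other, p in patterns.items() if other != cid]
    return sum(others) / len(others)


def localize_suspect(client_weights, num_inputs=50, input_dim=8, seed=0):
    """Vote, over random inputs, for the client that deviates most from its peers."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_inputs):
        # Random input: no real test data or labels are involved.
        x = [rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
        patterns = {cid: active_neurons(w, x) for cid, w in client_weights.items()}
        suspect = max(patterns, key=lambda cid: mean_distance_to_others(cid, patterns))
        votes[suspect] += 1
    return votes.most_common(1)[0][0], votes


def build_demo_clients(num_clients=5, faulty="client_3", neurons=16, input_dim=8, seed=1):
    """Hypothetical setup: healthy clients share near-identical weights; one is corrupted."""
    rng = random.Random(seed)
    base = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(neurons)]
    clients = {}
    for k in range(num_clients):
        cid = f"client_{k}"
        if cid == faulty:
            # Toy fault for illustration only: every weight is negated.
            clients[cid] = [[-w for w in row] for row in base]
        else:
            clients[cid] = [[w + rng.gauss(0.0, 0.01) for w in row] for row in base]
    return clients


if __name__ == "__main__":
    suspect, votes = localize_suspect(build_demo_clients())
    print("suspect:", suspect)
    print("votes:", dict(votes))
```

In this toy setting the corrupted client activates nearly the opposite set of neurons from its peers, so it collects the votes. A real deployment would read activations from the clients' actual networks rather than from a single hand-built layer.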
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE 2023 |
Publisher | IEEE Computer Society |
Pages | 512-523 |
Number of pages | 12 |
ISBN (Electronic) | 9781665457019 |
DOIs | |
State | Published - 2023 |
Event | 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023 - Melbourne, Australia; Duration: May 15 2023 → May 16 2023 |
Publication series
Name | Proceedings - International Conference on Software Engineering |
---|---|
ISSN (Print) | 0270-5257 |
Conference
Conference | 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 5/15/23 → 5/16/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- CNN
- client
- fault localization
- federated learning
- neural networks
- software debugging
- testing