Abstract
This work develops a novel framework for communication-efficient distributed learning where the models to be learnt are overparameterized. We focus on a class of kernel learning problems (which includes the popular neural tangent kernel (NTK) learning as a special case) and propose a novel multi-agent kernel approximation technique that allows the agents to distributedly estimate the full kernel function, and subsequently perform distributed learning, without directly exchanging any local data or parameters. The proposed framework is a significant departure from the classical consensus-based approaches, because the agents do not exchange problem parameters, and consensus is not required. We analyze the optimization and the generalization performance of the proposed framework for the ℓ2 loss. We show that with M agents and N total samples, when certain generalized inner-product (GIP) kernels (resp. the random features (RF) kernel) are used, each agent needs to communicate O(N²/M) bits (resp. O(N√N/M) real values) to achieve minimax optimal generalization performance. Further, we show that the proposed algorithms significantly reduce the communication complexity compared with state-of-the-art algorithms when distributedly training models on UCI benchmark datasets. Moreover, each agent needs to share only about 200N/M bits to closely match the performance of the centralized algorithms, and these numbers are independent of the parameter and feature dimensions.
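To make the random-features (RF) side of the abstract concrete, the sketch below shows a generic distributed kernel approximation via random Fourier features with a shared random seed, where each agent exchanges only small sufficient statistics instead of raw samples. This is a minimal illustrative sketch of the general RF idea, not the paper's actual protocol; the Gaussian kernel choice, the ridge-regression solve, and all function names (`rf_features`, `local_statistics`, `aggregate_and_solve`) are assumptions made for illustration.

```python
import numpy as np

def rf_features(X, W, b):
    """Random Fourier features approximating a Gaussian (RBF) kernel:
    k(x, x') ~= z(x)^T z(x') with z(x) = sqrt(2/D) * cos(W x + b)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def local_statistics(X_m, y_m, W, b):
    """Agent m compresses its local shard into sufficient statistics
    (A_m, c_m) of size D x D and D -- no raw samples are shared."""
    Z_m = rf_features(X_m, W, b)
    return Z_m.T @ Z_m, Z_m.T @ y_m

def aggregate_and_solve(stats, D, lam):
    """Sum the local statistics (e.g., at a server or via all-reduce) and
    solve the regularized least-squares problem in the RF space."""
    A = sum(A_m for A_m, _ in stats) + lam * np.eye(D)
    c = sum(c_m for _, c_m in stats)
    return np.linalg.solve(A, c)

# Toy run: M = 4 agents, N = 400 samples, D = 200 random features.
rng = np.random.default_rng(0)          # shared seed -> identical W, b at all agents
d, D, M, lam = 5, 200, 4, 1e-3
W = rng.normal(size=(D, d))             # frequencies of the RF map
b = rng.uniform(0, 2 * np.pi, size=D)   # phases of the RF map

X = rng.normal(size=(400, d))
y = np.sin(X.sum(axis=1)) + 0.1 * rng.normal(size=400)
shards = zip(np.array_split(X, M), np.array_split(y, M))

stats = [local_statistics(X_m, y_m, W, b) for X_m, y_m in shards]
theta = aggregate_and_solve(stats, D, lam)
print("train MSE:", np.mean((rf_features(X, W, b) @ theta - y) ** 2))
```

In this hypothetical setup each agent sends a D × D matrix and a D-vector rather than its data, which conveys the general intuition behind communication-efficient RF-based kernel learning; the actual per-agent communication bounds quoted above are specific to the algorithms developed in the paper.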
Original language | English (US) |
---|---|
State | Published - 2022 |
Event | 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online (Apr 25 2022 → Apr 29 2022) |
Conference
Conference | 10th International Conference on Learning Representations, ICLR 2022 |
---|---|
City | Virtual, Online |
Period | 4/25/22 → 4/29/22 |
Bibliographical note
Funding Information:We thank the anonymous reviewers for their valuable comments and suggestions. The work of Prashant Khanduri and Mingyi Hong is supported in part by NSF grant CMMI-1727757, AFOSR grant 19RT0424, ARO grant W911NF-19-1-0247 and Meta research award on “Mathematical modeling and optimization for large-scale distributed systems”. The work of Mingyi Hong is also supported by an IBM Faculty Research award. The work of Jia Liu is supported in part by NSF grants CAREER CNS-2110259, CNS-2112471, CNS-2102233, CCF-2110252, ECCS-2140277, and a Google Faculty Research Award. The work of Hoi-To Wai was supported by CUHK Direct Grant #4055113.
Publisher Copyright:
© 2022 ICLR 2022 - 10th International Conference on Learning Representations. All rights reserved.