TY - JOUR
T1 - Hybrid Federated Learning for Feature and Sample Heterogeneity
T2 - Algorithms and Implementation
AU - Zhang, Xinwei
AU - Yin, Wotao
AU - Chen, Tianyi
AU - Hong, Mingyi
N1 - Publisher Copyright: © 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
AB - Federated learning (FL) is a popular distributed machine learning paradigm dealing with distributed and private data sets. Based on the data partition pattern, FL is often categorized into horizontal, vertical, and hybrid settings. All three settings have many applications, but hybrid FL remains relatively less explored because it deals with the challenging situation where both the feature space and the data samples are heterogeneous. Hybrid FL combines the advantages of both horizontal and vertical FL, addressing some of their individual limitations, such as the same-features requirement of the former and the same-entities requirement of the latter. This work designs a novel mathematical model that allows clients to aggregate distributed data with heterogeneous and possibly overlapping features and samples. Our main idea is to partition each client’s model into a feature extractor part and a classifier part, where the former can be used to process the input data, while the latter is used to perform the learning from the extracted features. The heterogeneous feature aggregation is done by building a server model that assimilates local classifiers and feature extractors through a carefully designed matching mechanism. A communication-efficient algorithm is then designed to train both the client and server models. Finally, we conducted numerical experiments on multiple image classification data sets to validate the performance of the proposed algorithm. To our knowledge, this is the first formulation and algorithm developed for hybrid FL.
UR - http://www.scopus.com/inward/record.url?scp=85205854787&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205854787&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85205854787
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -