Many important cloud services require replicating massive amounts of data from one datacenter (DC) to multiple DCs. While the performance of pair-wise inter-DC data transfers has improved considerably, prior solutions are insufficient for optimizing bulk-data multicast: they fail to exploit the rich inter-DC overlay paths that exist among geo-distributed DCs, as well as the bandwidth left unused within the share reserved for online traffic under a fixed bandwidth separation scheme. To take advantage of these opportunities, we present BDS+, a near-optimal network system for large-scale inter-DC data replication. BDS+ is an application-level multicast overlay network with a fully centralized architecture, in which a central controller maintains an up-to-date global view of the data delivery status of intermediate servers in order to fully utilize the available overlay paths. Furthermore, on each overlay path, BDS+ leverages dynamic bandwidth separation to make use of the remaining available bandwidth reserved for online traffic. By continuously estimating online traffic demand and rescheduling bulk-data transfers accordingly, BDS+ can further speed up massive data multicast. Through a pilot deployment in one of the largest online service providers and large-scale real-trace simulations, we show that BDS+ can achieve a 3-5× speedup over the provider's existing system and several well-known overlay routing baselines that use static bandwidth separation. Moreover, dynamic bandwidth separation can further reduce the completion time of bulk data transfers by a factor of 1.2 to 1.3.
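The contrast between static and dynamic bandwidth separation can be illustrated with a small sketch. The abstract does not specify how BDS+ estimates online demand, so the exponentially weighted moving average (EWMA) predictor, the `headroom` safety margin, and all function names below are hypothetical choices for illustration, not the system's actual algorithm. Static separation always gives bulk traffic a fixed slice of the link; dynamic separation reserves only the predicted online demand (plus a margin) and hands the rest to bulk transfers, recomputing each time new online-traffic samples arrive.

```python
from typing import Sequence


def ewma(samples: Sequence[float], alpha: float = 0.5) -> float:
    """Hypothetical predictor: EWMA over recent online-traffic samples (Gbps)."""
    if not samples:
        raise ValueError("need at least one sample")
    estimate = samples[0]
    for sample in samples[1:]:
        # Newer samples get weight alpha; older history decays geometrically.
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def static_bulk_share(capacity: float, bulk_fraction: float) -> float:
    """Fixed separation: bulk traffic always gets the same slice of the link."""
    return capacity * bulk_fraction


def dynamic_bulk_share(capacity: float, online_samples: Sequence[float],
                       alpha: float = 0.5, headroom: float = 0.1) -> float:
    """Dynamic separation (illustrative): reserve the predicted online demand
    plus a safety margin, capped at link capacity, and give the rest to bulk."""
    predicted = ewma(online_samples, alpha)
    reserved = min(capacity, predicted * (1 + headroom))
    return capacity - reserved


if __name__ == "__main__":
    link = 10.0                # assumed link capacity in Gbps
    recent = [4.0, 3.0, 2.0]   # assumed online-traffic samples, demand falling
    print("static :", static_bulk_share(link, 0.5))
    print("dynamic:", round(dynamic_bulk_share(link, recent), 3))
```

With the assumed samples, the EWMA predicts 2.75 Gbps of online demand, so about 6.975 Gbps is left for bulk data instead of the fixed 5 Gbps under a 50/50 static split. If online demand spikes past capacity, the dynamic share drops to zero, which is why a real scheduler must re-estimate and reschedule frequently.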
Bibliographical note
Funding Information:
Manuscript received March 26, 2019; revised January 7, 2020; accepted January 12, 2021; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor N. Hegde. Date of publication February 10, 2021; date of current version April 16, 2021. The work of Yuchao Zhang was supported in part by the National Key Research and Development Program of China under Grant 2019YFB1802603, in part by the National Natural Science Foundation of China (NSFC) Youth Science Foundation under Grant 61802024, and in part by the Fundamental Research Funds for the Central Universities under Grant 2482020RC36. The work of Ke Xu was supported in part by the Science and Technology Innovation Project under Grant 2020KJ010501, in part by the China National Funds for Distinguished Young Scientists under Grant 61825204, and in part by the Beijing Outstanding Young Scientist Program under Grant BJJWZYJH01201910003011. (Yuchao Zhang and Xiaohui Nie contributed equally to this work.) (Corresponding author: Yuchao Zhang.) Yuchao Zhang is with the School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: firstname.lastname@example.org).
© 1993-2012 IEEE.
- Centralized control
- Data replication
- Dynamic bandwidth separation
- Overlay network