Abstract
In modern data centers, many flow-based and task-based schemes have been proposed to speed up data transmission and provide fast, reliable services for millions of users. However, existing flow-based schemes treat all flows in isolation, so they contribute little to user experience, or even hurt it, because of stalled flows. Other prevalent task-based approaches, such as centralized and decentralized scheduling, are either sophisticated or unable to share task information. In this work, we first reveal that relinquishing the bandwidth of leading flows to stalled ones effectively reduces the task completion time. We further present the design and implementation of a general supporting scheme that shares flow-tardiness information through receiver-driven coordination. Our scheme can be flexibly and widely integrated with state-of-the-art TCP protocols designed for data centers in both single-stage and multiple-stage scenarios, without any modification to switches. Through testbed experiments and simulations of typical data center applications, we show that in the single-stage scenario, our scheme reduces the task completion time by 70% and 50% compared with flow-based protocols (e.g., DCTCP, L²DCT) and task-based scheduling (e.g., Baraat), respectively. Moreover, our scheme outperforms other approaches by 18% to 25% in prevalent data center topologies. In the multiple-stage scenario, our scheme achieves up to 50% improvement over other schemes.
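The observation that a leading flow can give up bandwidth without delaying its own task can be illustrated with a toy fluid model. The sketch below is not the scheme from the paper: the single shared link, the flow names (`a1`, `a2`, `b1`), the sizes, and the fixed finish time of the sibling flow are all assumptions chosen for illustration. It compares equal sharing of the link with a schedule in which the leading flow `a1` yields to `b1`.

```python
# Toy fluid model (illustrative assumption, not the scheme from the paper).
# One link of capacity 1 unit/s is shared by flow a1 (task 1) and flow b1 (task 2).
# Task 1 also has a sibling flow a2 on another path, assumed to finish at t = 10,
# so a1 is a "leading" flow: finishing it early does not finish task 1 early.


def fair_share_finish(sizes, capacity):
    """Finish time of each flow when the link is split equally among active flows."""
    remaining = dict(sizes)
    finish, now = {}, 0.0
    while remaining:
        rate = capacity / len(remaining)
        first = min(remaining, key=remaining.get)  # next flow to drain
        dt = remaining[first] / rate
        now += dt
        for name in remaining:
            remaining[name] -= rate * dt
        finish[first] = now
        del remaining[first]
    return finish


def yielding_finish(sizes, capacity, order):
    """Finish time of each flow when flows use the full link one after another."""
    finish, now = {}, 0.0
    for name in order:
        now += sizes[name] / capacity
        finish[name] = now
    return finish


def task_completion_times(link_finish, a2_finish):
    """A task completes only when its slowest flow completes."""
    return {
        "task1": max(link_finish["a1"], a2_finish),
        "task2": link_finish["b1"],
    }


SIZES = {"a1": 2.0, "b1": 4.0}  # hypothetical flow sizes
A2_FINISH = 10.0                # hypothetical finish time of the stalled sibling

fair = task_completion_times(fair_share_finish(SIZES, 1.0), A2_FINISH)
yielding = task_completion_times(
    yielding_finish(SIZES, 1.0, ["b1", "a1"]), A2_FINISH
)
print("fair share:", fair)
print("a1 yields: ", yielding)
```

Under these assumed numbers, task 1 completes at the same time in both cases because it is bounded by its stalled sibling, while task 2 completes at 4 time units instead of 6. How tardiness is actually measured and shared among flows is left to the paper itself.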
Original language | English (US) |
---|---|
Article number | 8620341 |
Pages (from-to) | 389-404 |
Number of pages | 16 |
Journal | IEEE/ACM Transactions on Networking |
Volume | 27 |
Issue number | 1 |
State | Published - Feb 2019 |
Bibliographical note
Funding Information: Manuscript received September 20, 2017; revised July 30, 2018; accepted December 11, 2018; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor Y. Zhang. Date of publication January 21, 2019; date of current version February 14, 2019. This work was supported in part by the National Natural Science Foundation of China under Grant 61872387, Grant 61572530, Grant 61462007, Grant 61420106009, and Grant 61872403, in part by the Next Generation Internet Innovation Foundation under Grant NGII20160113, and in part by the China Scholarship Council under Grant 201706370143. (Corresponding author: Jiawei Huang.) S. Liu, J. Huang, Y. Zhou, and J. Wang are with the School of Information Science and Engineering, Central South University, Changsha 410083, China (e-mail: [email protected]).
Publisher Copyright:
© 2018 IEEE.
Keywords
- Data center
- TCP
- coflow
- congestion control
- task-aware