Large-scale real-time analytics services continuously collect and analyze data from end-user applications and devices distributed around the globe. Such analytics requires data to be transferred over the wide-area network (WAN) to data centers (DCs) capable of processing the data. Since WAN bandwidth is expensive and scarce, it is beneficial to reduce WAN traffic by partially aggregating the data closer to end-users. We propose aggregation networks for performing aggregation on a geo-distributed edge-cloud infrastructure consisting of edge servers, transit DCs, and destination DCs. We identify a rich set of research questions aimed at reducing the traffic costs in an aggregation network. We present an optimization formulation for solving these questions in a principled manner, and use insights from the optimization solutions to propose an efficient, near-optimal practical heuristic. We implement the heuristic in AggNet, built on top of Apache Flink. We evaluate our approach using a geo-distributed deployment on Amazon EC2 as well as a WAN-emulated local testbed. Our evaluation using real-world traces from Twitter and Akamai shows that our approach achieves a 47% to 83% reduction in traffic cost over existing baselines without any compromise in timeliness.
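As a rough illustration of why partial aggregation near end-users can reduce WAN traffic, the toy sketch below counts hashtag events at two hypothetical edge servers and compares the number of records that would cross the WAN with and without edge-side aggregation. It is not AggNet's implementation and does not use Apache Flink; the event data and the names `partial_aggregate` and `merge` are assumptions made for illustration. It also assumes the aggregate (a per-key count) is associative and commutative, so partial results can be merged at the destination.

```python
from collections import Counter


def partial_aggregate(events):
    """Combine raw (key, count) events into one partial count per key."""
    partial = Counter()
    for key, count in events:
        partial[key] += count
    return partial


def merge(partials):
    """Merge partial per-key counts into a final result at the destination."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts for matching keys
    return total


# Hypothetical raw events observed at two edge servers.
edge_a = [("#edge", 1), ("#cloud", 1), ("#edge", 1), ("#edge", 1)]
edge_b = [("#cloud", 1), ("#cloud", 1), ("#edge", 1)]

# Without edge aggregation, every raw event crosses the WAN.
raw_records = len(edge_a) + len(edge_b)

# With edge aggregation, each edge sends one record per distinct key.
partials = [partial_aggregate(edge_a), partial_aggregate(edge_b)]
wan_records = sum(len(p) for p in partials)

final = merge(partials)
print(raw_records, wan_records, dict(final))
```

In this toy input the final counts are identical either way; only the number of records sent over the WAN changes. Where to place such aggregation steps (edge servers, transit DCs, or destination DCs) is the question the abstract's optimization formulation and heuristic address, which this sketch does not attempt to model.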
|Original language||English (US)|
|Title of host publication||6th ACM/IEEE Symposium on Edge Computing, SEC 2021|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||15|
|State||Published - 2021|
|Event||6th ACM/IEEE Symposium on Edge Computing, SEC 2021 - San Jose, United States (Dec 14 2021 → Dec 17 2021)|
Bibliographical note
Funding Information:
The authors thank the anonymous reviewers for many constructive comments and suggestions that greatly improved the quality of this paper. This work was sponsored in part by NSF under Grants CNS-1717834 and CNS-1717179, as well as by DARPA contract HR001117C0049.
© 2021 ACM.
Keywords
- Geo-distributed systems
- Stream processing