Abstract
Wide-area data analytics has gained much attention in recent years due to the increasing need to analyze data that are geographically distributed. Many such queries require real-time analysis of data streams that are continuously generated across multiple locations. Yet analyzing these geo-distributed data streams in a timely manner is very challenging due to the highly heterogeneous and limited bandwidth of the wide-area network (WAN). This paper examines the opportunity of applying multi-query optimization in the context of wide-area streaming analytics, with the goal of utilizing WAN bandwidth efficiently while achieving high-throughput, low-latency execution. Our approach is based on the insight that many streaming analytics queries exhibit common executions, whether in consuming a common set of input data or performing the same data processing. In this work, we study different types of sharing opportunities and propose a practical online algorithm that allows streaming analytics queries to share their common executions incrementally. We further address the importance of WAN awareness in applying multi-query optimization: without it, sharing executions in a wide-area environment may lead to performance degradation. We have implemented our WAN-aware multi-query optimization in a prototype based on Apache Flink. Experimental evaluation using Twitter traces on a real wide-area deployment across geo-distributed EC2 data centers shows that our technique achieves 21% higher throughput while reducing WAN bandwidth consumption by 33% compared to a WAN-aware, sharing-agnostic system.
Original language | English (US) |
---|---|
Title of host publication | SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing |
Publisher | Association for Computing Machinery, Inc |
Pages | 412-425 |
Number of pages | 14 |
ISBN (Electronic) | 9781450360111 |
State | Published - Oct 11 2018 |
Event | 2018 ACM Symposium on Cloud Computing, SoCC 2018 - Carlsbad, United States; Duration: Oct 11 2018 → Oct 13 2018 |
Publication series
Name | SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing |
---|---|
Other
Other | 2018 ACM Symposium on Cloud Computing, SoCC 2018 |
---|---|
Country/Territory | United States |
City | Carlsbad |
Period | 10/11/18 → 10/13/18 |
Bibliographical note
Funding Information: The authors would like to thank the anonymous SoCC reviewers for their valuable comments and feedback. The work is supported by grants NSF CNS-1619254 and CNS-1717834.
Publisher Copyright: © 2018 Association for Computing Machinery.
Keywords
- Execution sharing
- Geo-distributed systems
- Multi-query optimization
- Stream processing systems