Trading timeliness and accuracy in geo-distributed streaming analytics

Benjamin Heintz, Abhishek Chandra, Ramesh K. Sitaraman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

15 Scopus citations

Abstract

Many applications must ingest rapid data streams and produce analytics results in near-real-time. It is increasingly common for inputs to such applications to originate from geographically distributed sources. The typical infrastructure for processing such geo-distributed streams follows a hub-and-spoke model, where several edge servers perform partial computation before forwarding results over a wide-area network (WAN) to a central location for final processing. Due to limited WAN bandwidth, it is not always possible to produce exact results. In such cases, applications must either sacrifice timeliness by allowing delayed (i.e., stale) results, or sacrifice accuracy by allowing some error in final results. In this paper, we focus on windowed grouped aggregation, an important and widely used primitive in streaming analytics, and we study the tradeoff between staleness and error. We present optimal offline algorithms for minimizing staleness under an error constraint and for minimizing error under a staleness constraint. Using these offline algorithms as references, we present practical online algorithms for effectively trading off timeliness and accuracy under bandwidth limitations. Using a workload derived from an analytics service offered by a large commercial CDN, we demonstrate the effectiveness of our techniques through both trace-driven simulation and experiments on an Apache Storm-based implementation deployed on PlanetLab. Our experiments show that our proposed algorithms reduce staleness by 81.8% to 96.6%, and error by 83.4% to 99.1%, compared to a practical random sampling/batching-based aggregation algorithm across a diverse set of aggregation functions.
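
As a rough illustration of the windowed grouped aggregation primitive and the hub-and-spoke flow described in the abstract, the following minimal sketch shows edge servers computing per-window, per-key partial sums that a central location then merges. It is not the paper's offline or online algorithm and does not model bandwidth limits, staleness, or error. The function names, the (timestamp, key, value) record layout, the tumbling-window choice, and the sum aggregate are all assumptions made for this example.

```python
from collections import defaultdict

def edge_partial_aggregate(records, window_size):
    """Hypothetical edge step: sum values per (window, key).

    records is an iterable of (timestamp, key, value) tuples; windows are
    assumed to be tumbling windows of length window_size.
    """
    partials = defaultdict(float)
    for timestamp, key, value in records:
        window = int(timestamp // window_size)  # index of the tumbling window
        partials[(window, key)] += value
    return dict(partials)

def center_merge(partials_from_edges):
    """Hypothetical center step: merge the partial sums sent by each edge."""
    final = defaultdict(float)
    for partials in partials_from_edges:
        for window_key, partial_sum in partials.items():
            final[window_key] += partial_sum
    return dict(final)

# Two edges observe records for the same groups in the same windows.
edge_a = [(0.5, "us", 2.0), (1.2, "us", 3.0), (10.1, "eu", 1.0)]
edge_b = [(3.0, "us", 4.0), (11.0, "eu", 5.0)]
window_size = 10

result = center_merge([
    edge_partial_aggregate(edge_a, window_size),
    edge_partial_aggregate(edge_b, window_size),
])
print(result)
```

In this sketch every edge forwards all of its partial aggregates, which yields exact results. The tradeoff studied in the paper arises when WAN bandwidth does not allow this, so that updates must either be delayed or approximated.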

Original language: English (US)
Title of host publication: Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC 2016
Editors: Yanlei Diao, Marcos K. Aguilera, Brian Cooper
Publisher: Association for Computing Machinery, Inc
Pages: 361-373
Number of pages: 13
ISBN (Electronic): 9781450345255
DOIs: https://doi.org/10.1145/2987550.2987580
State: Published - Oct 5 2016
Event: 7th ACM Symposium on Cloud Computing, SoCC 2016 - Santa Clara, United States
Duration: Oct 5 2016 to Oct 7 2016

Publication series

Name: Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC 2016

Other

Other: 7th ACM Symposium on Cloud Computing, SoCC 2016
Country: United States
City: Santa Clara
Period: 10/5/16 to 10/7/16

Keywords

  • Aggregation
  • Approximation
  • Geo-distributed systems
  • Stream processing

Cite this

Heintz, B., Chandra, A., & Sitaraman, R. K. (2016). Trading timeliness and accuracy in geo-distributed streaming analytics. In Y. Diao, M. K. Aguilera, & B. Cooper (Eds.), Proceedings of the 7th ACM Symposium on Cloud Computing, SoCC 2016 (pp. 361-373). Association for Computing Machinery, Inc. https://doi.org/10.1145/2987550.2987580