Queueing with redundant requests: exact analysis

Kristen Gardner, Samuel Zbarsky, Sherwin Doroudi, Mor Harchol-Balter, Esa Hyytiä, Alan Scheller-Wolf

Research output: Contribution to journal › Article › peer-review

44 Scopus citations


Recent computer systems research has proposed using redundant requests to reduce latency. The idea is to run a request on multiple servers and wait for the first completion (discarding all remaining copies of the request). However, there has been no exact analysis of systems with redundancy. This paper presents the first such analysis. We allow for any number of classes of redundant requests, any number of classes of non-redundant requests, any degree of redundancy, and any number of heterogeneous servers. In all cases we derive the limiting distribution of the state of the system. In small (two- or three-server) systems, we derive simple forms for the distribution of response time of both the redundant classes and non-redundant classes, and we quantify the “gain” to redundant classes and “pain” to non-redundant classes caused by redundancy. We find some surprising results. First, the response time of a fully redundant class follows a simple exponential distribution and that of the non-redundant class follows a generalized hyperexponential distribution. Second, fully redundant classes are “immune” to any pain caused by other classes becoming redundant. We also compare redundancy with other approaches for reducing latency, such as optimal probabilistic splitting of a class among servers (Opt-Split) and join-the-shortest-queue (JSQ) routing of a class. We find that, in many cases, redundancy outperforms JSQ and Opt-Split with respect to overall response time, making it an attractive solution.
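The core mechanism described in the abstract, running a request on several servers and keeping only the first completion, can be illustrated with a small sketch. It assumes that service times are exponential and independent across servers, and it ignores queueing entirely (every copy starts service immediately); these are simplifying assumptions for illustration, not the paper's full model. Under them, the time to first completion is the minimum of independent exponentials, whose mean is the reciprocal of the summed rates. The function names and the example rates are hypothetical.

```python
import random


def mean_first_completion(rates):
    """Mean of the minimum of independent exponentials with the given rates.

    Assumption: each copy's service time is exponential and independent,
    so the minimum is exponential with rate sum(rates).
    """
    return 1.0 / sum(rates)


def simulate_first_completion(rates, trials=100_000, seed=0):
    """Monte Carlo estimate of the mean time until the first copy finishes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # One redundant request: a copy runs on every server, and the
        # request completes as soon as the fastest copy does.
        total += min(rng.expovariate(rate) for rate in rates)
    return total / trials


# Hypothetical example: two heterogeneous servers with rates 1 and 2.
rates = [1.0, 2.0]
analytic = mean_first_completion(rates)
estimate = simulate_first_completion(rates)
print(f"analytic mean:  {analytic:.4f}")
print(f"simulated mean: {estimate:.4f}")
```

The sketch covers only the service-time side of redundancy. The response-time distributions reported in the abstract also account for queueing and for the load that redundant copies place on other classes, which this example does not model.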

Original language: English (US)
Pages (from-to): 227-259
Number of pages: 33
Journal: Queueing Systems
Issue number: 3-4
State: Published - Aug 1 2016
Externally published: Yes

Bibliographical note

Funding Information:
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1252522; was funded by NSF-CMMI-1334194, NSF-CSR-1116282, and NSF-CMMI-1538204, by the Intel Science and Technology Center for Cloud Computing, and by a Google Faculty Research Award 2015/16; and has been supported by the Academy of Finland in FQ4BD and TOP-Energy projects (Grant Nos. 296206 and 268992).

Publisher Copyright:
© 2016, Springer Science+Business Media New York.


  • Dispatching
  • Markov chain analysis
  • Redundancy

