Skip to main navigation Skip to search Skip to main content

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Consider massive matrix multiplication, a problem that underlies many data analytic applications, in a large-scale distributed system comprising a group of workers. We target the stragglers' delay performance bottleneck, which is due to the unpredictable latency in waiting for slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named entangled polynomial code, designing intermediate computations at the workers in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We prove the optimality of entangled polynomial code in several cases, and show that it provides order-wise improvement over the conventional schemes for straggler mitigation. Furthermore, we characterize the optimal recovery threshold among all linear coding strategies within a factor of 2 using bilinear complexity, by developing an improved version of the entangled polynomial code.

Original languageEnglish (US)
Title of host publication2018 IEEE International Symposium on Information Theory, ISIT 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2022-2026
Number of pages5
ISBN (Print)9781538647806
DOIs
StatePublished - Aug 15 2018
Externally publishedYes
Event2018 IEEE International Symposium on Information Theory, ISIT 2018 - Vail, United States
Duration: Jun 17 2018Jun 22 2018

Publication series

NameIEEE International Symposium on Information Theory - Proceedings
Volume2018-June
ISSN (Print)2157-8095

Other

Other2018 IEEE International Symposium on Information Theory, ISIT 2018
Country/TerritoryUnited States
CityVail
Period6/17/186/22/18

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

Fingerprint

Dive into the research topics of 'Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding'. Together they form a unique fingerprint.

Cite this