Coded TeraSort

Songze Li, Sucha Supittayapornpong, Mohammad Ali Maddah-Ali, Salman Avestimehr

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

We focus on sorting, which is a building block of many machine learning algorithms, and propose a novel distributed sorting algorithm, named CodedTeraSort, that substantially improves the execution time of the TeraSort benchmark in Hadoop MapReduce. The key idea of CodedTeraSort is to impose structured redundancy on the data in order to create in-network coding opportunities that overcome the data-shuffling bottleneck of TeraSort. We empirically evaluate the performance of the CodedTeraSort algorithm on Amazon EC2 clusters and demonstrate that it achieves a 1.97×–3.39× speedup over TeraSort for typical settings of interest.
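The key idea in the abstract, storing each part of the input on several nodes so that one coded multicast can serve several nodes at once, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the input is split into one batch per r-subset of the K nodes (each batch stored on exactly those r nodes), that every intermediate value is padded to the same length, that a multicast costs the same as a unicast, and that each node reduces one key range as in TeraSort. The function `coded_shuffle_demo` and its parameters are hypothetical. Within every (r+1)-subset of nodes, each member sends the XOR of r segments, and each receiver cancels the segments it can compute from its own stored batches.

```python
import itertools
import random
from math import comb


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def coded_shuffle_demo(K: int, r: int, seg_len: int = 3, seed: int = 0):
    """Simulate one coded shuffle (hypothetical model); return (coded_bytes, uncoded_bytes, ok)."""
    rng = random.Random(seed)
    L = r * seg_len  # assumed padded length of every intermediate value
    # Assumed structured redundancy: one batch per r-subset of nodes,
    # stored on exactly the r nodes of that subset.
    batches = list(itertools.combinations(range(K), r))
    # value[(T, k)]: map output of batch T destined for reducer k
    # (assumed TeraSort-like setting: records of T in node k's key range).
    value = {(T, k): bytes(rng.randrange(256) for _ in range(L))
             for T in batches for k in range(K)}

    def segment(T, k, j):
        """Slice of value[(T, k)] that node j (a member of T) is responsible for sending."""
        i = T.index(j)
        return value[(T, k)][i * seg_len:(i + 1) * seg_len]

    def without(S, k):
        return tuple(x for x in S if x != k)

    coded_bytes = 0
    decoded = {}  # (T, k) -> {sender j: recovered segment}
    for S in itertools.combinations(range(K), r + 1):
        for j in S:
            # Node j multicasts one XOR of r segments to the other r nodes in S.
            msg = bytes(seg_len)
            for k in S:
                if k != j:
                    msg = xor_bytes(msg, segment(without(S, k), k, j))
            coded_bytes += seg_len
            for k in S:
                if k == j:
                    continue
                # Receiver k cancels the segments meant for the other receivers;
                # in a real system it computes them from batch S \ {k2}, which it stores.
                rec = msg
                for k2 in S:
                    if k2 not in (j, k):
                        rec = xor_bytes(rec, segment(without(S, k2), k2, j))
                decoded.setdefault((without(S, k), k), {})[j] = rec

    # Every node must recover each value from a batch it does not store.
    ok = len(decoded) == K * comb(K - 1, r) and all(
        b"".join(parts[j] for j in T) == value[(T, k)]
        for (T, k), parts in decoded.items()
    )
    # Uncoded baseline: every missing value is unicast in full.
    uncoded_bytes = K * comb(K - 1, r) * L
    return coded_bytes, uncoded_bytes, ok


print(coded_shuffle_demo(4, 2))  # (36, 72, True)
```

With K = 4 nodes and each batch stored on r = 2 of them, the coded shuffle moves half as many bytes as the uncoded one while every node still recovers the data it needs. In this idealized model the reduction is always a factor of r; the model ignores the r-fold extra map work, the encoding and decoding cost, and real multicast overheads, all of which affect the end-to-end speedups reported in the abstract.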

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 389-398
Number of pages: 10
ISBN (Electronic): 9781538634080
DOIs
State: Published - Jun 30 2017
Externally published: Yes
Event: 31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017 - Orlando, United States
Duration: May 29 2017 to Jun 2 2017

Publication series

Name: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Conference

Conference: 31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
Country/Territory: United States
City: Orlando
Period: 5/29/17 to 6/2/17

Bibliographical note

Funding Information:
This work was supported in part by NSF grants CAREER 1408639 and NETS-1419632, ONR award N000141612189, an NSA award, and funds from Intel.

Publisher Copyright:
© 2017 IEEE.

Keywords

  • Coding
  • Data Shuffling
  • Distributed Computing
  • Machine Learning
  • MapReduce
  • Sorting
