BDMPI: Conquering BigData with small clusters using MPI

Dominique La Salle, George Karypis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Processing massive amounts of data on clusters with a finite amount of memory has become an important challenge for the parallel/distributed computing community. While MapReduce-style technologies provide an effective means of addressing problems that fit within the MapReduce paradigm, there are many classes of problems for which this paradigm is ill-suited. In this paper we present a runtime system for traditional MPI programs that enables the efficient and transparent disk-based execution of distributed-memory parallel programs. This system, called BDMPI, leverages the semantics of MPI's API to orchestrate the execution of a large number of MPI processes on far fewer compute nodes, so that the running processes maximize the amount of computation they perform with the data fetched from disk. BDMPI enables the development of efficient parallel, distributed-memory, disk-based codes without the high engineering and algorithmic complexity associated with multiple levels of blocking. BDMPI achieves significantly better performance than existing technologies on a single node (GraphChi) as well as on a small cluster (Hadoop).

Original language: English (US)
Title of host publication: Proceedings of DISCS 2013
Subtitle of host publication: The 2013 International Workshop on Data-Intensive Scalable Computing Systems, Held in conjunction with SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery, Inc
Pages: 19-24
Number of pages: 6
ISBN (Electronic): 9781450325066
State: Published - Nov 18 2013
Event: 2013 International Workshop on Data-Intensive Scalable Computing Systems, DISCS 2013 - Denver, United States
Duration: Nov 18 2013 → …

Publication series

Name: Proceedings of DISCS 2013: The 2013 International Workshop on Data-Intensive Scalable Computing Systems, Held in conjunction with SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis

Other

Other: 2013 International Workshop on Data-Intensive Scalable Computing Systems, DISCS 2013
Country/Territory: United States
City: Denver
Period: 11/18/13 → …

Keywords

  • Bigdata
  • MPI
  • Out-of-core
  • Parallel processing
