In this paper we explore the design space of creating a multi-threaded graph partitioner. We present and compare multiple approaches for parallelizing each of the three phases of multilevel graph partitioning: coarsening, initial partitioning, and uncoarsening. We also explore the differences in thread lifetimes and data ownership in this context. We show that despite the options for fine-grain synchronization and task decomposition offered by current threading technologies, the best performance is achieved by preserving data ownership and minimizing synchronization. We also present an unprotected approach to generating a vertex matching in parallel with little overhead. We use these findings to develop an OpenMP-based implementation of the Metis algorithms and compare it against MPI-based partitioners on three different multi-core architectures. Our multi-threaded implementation not only achieves greater than a factor of two speedup over the other partitioners, but also uses significantly less memory.
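The abstract does not spell out how the unprotected matching works, so the following is only a minimal sketch of one plausible scheme, not the paper's implementation. It assumes a CSR graph (`xadj`, `adj`) and a two-pass design: threads first write match entries with no locks, then a second pass resets any vertex whose partner does not point back. The names `unprotected_match` and `check_matching` are hypothetical. The plain `int` accesses are the "unprotected" part; strictly conforming C would use relaxed atomics for them.

```c
#include <assert.h>

/* Hypothetical sketch. CSR graph: the neighbors of vertex v are
   adj[xadj[v]] .. adj[xadj[v+1] - 1]. On return, match[v] is the
   partner of v, or v itself if v stayed unmatched. */
void unprotected_match(int n, const int *xadj, const int *adj, int *match)
{
    /* Mark every vertex as unmatched. */
    #pragma omp parallel for schedule(static)
    for (int v = 0; v < n; v++)
        match[v] = -1;

    /* Pass 1: match greedily with no locks. Two threads may claim
       the same neighbor, leaving some entries inconsistent. */
    #pragma omp parallel for schedule(static)
    for (int v = 0; v < n; v++) {
        if (match[v] != -1)
            continue;
        for (int e = xadj[v]; e < xadj[v + 1]; e++) {
            int u = adj[e];
            if (u != v && match[u] == -1) {
                match[v] = u;
                match[u] = v;
                break;
            }
        }
    }

    /* Pass 2: repair. A vertex whose partner does not point back
       (or that was never matched) is matched with itself. Pairs that
       are already symmetric are never written here. */
    #pragma omp parallel for schedule(static)
    for (int v = 0; v < n; v++) {
        int u = match[v];
        if (u == -1 || match[u] != v)
            match[v] = v;
    }
}

/* Debug helper: aborts unless match is a symmetric matching. */
void check_matching(int n, const int *match)
{
    for (int v = 0; v < n; v++) {
        assert(match[v] >= 0 && match[v] < n);
        assert(match[match[v]] == v);
    }
}
```

The design trade-off in this sketch is that a few vertices may lose their partner in the repair pass, in exchange for avoiding all locking during matching. Which pairs are formed can vary between runs when compiled with OpenMP enabled, but the result is always symmetric.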
|Original language||English (US)|
|Number of pages||12|
|State||Published - Oct 7 2013|
|Event||27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 - Boston, MA, United States|
Duration: May 20 2013 → May 24 2013