Graph partitioning is important in distributing workloads on parallel compute systems, computing sparse matrix re-orderings, and designing VLSI circuits. Refinement algorithms are used to improve existing partitionings, and are essential for obtaining high-quality partitionings. Existing parallel refinement algorithms either extract concurrency by sacrificing quality, or preserve quality by restricting concurrency. In this work we present a new shared-memory parallel algorithm for refining an existing k-way partitioning that can break out of local minima and produce high-quality partitionings. This allows our algorithm to scale well in terms of the number of processing cores and produce clusterings of quality equal to serial algorithms. Our algorithm achieves speedups of 5.7-16.7x using 24 cores, while exhibiting only 0.52% higher edgecuts than when run serially. This is 6.3x faster and 1.9% better quality than other parallel refinement algorithms that can break out of local minima.
|Original language||English (US)|
|Title of host publication||Proceedings - 45th International Conference on Parallel Processing, ICPP 2016|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||6|
|State||Published - Sep 21 2016|
|Event||45th International Conference on Parallel Processing, ICPP 2016 - Philadelphia, United States|
Duration: Aug 16 2016 → Aug 19 2016
|Name||Proceedings of the International Conference on Parallel Processing|
|Other||45th International Conference on Parallel Processing, ICPP 2016|
|Period||8/16/16 → 8/19/16|
Bibliographical note
Funding Information:
This work was supported in part by NSF (IIS-0905220, OCI-1048018, CNS-1162405, IIS-1247632, IIP-1414153, IIS-1447788), Army Research Office (W911NF-14-1-0316), Intel Software and Services Group, and the Digital Technology Center at the University of Minnesota.
- Graph partitioning
- Local minima