Sparse matrices are common in scientific computing and machine learning. By storing and processing only the non-zero elements of a matrix containing mostly zeros, sparse matrix algorithms often reduce the computation and storage requirements of an operation by an order of complexity. The order of the rows and columns of the matrix can have a significant impact on the efficiency of sparse direct methods. For example, in a Cholesky factorization, it is desirable to re-order the input matrix so as to reduce the number of non-zeros in the factors. One of the most effective methods for re-ordering is nested dissection, in which vertex separators are recursively found in the graph representation of the matrix and used to permute the rows and columns. In this work we investigate the creation of vertex separators on shared-memory parallel architectures and their use in nested dissection. We introduce a new effective scheme for refining a vertex separator in parallel, and a specialized parallel task-scheduling scheme for the nested dissection problem. These algorithms have been implemented in the mt-Metis framework. Our experiments show that mt-Metis is 1.5× faster than ParMetis while producing orderings with 3.7% fewer non-zeros and 14.0% fewer operations.
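The nested dissection recursion the abstract describes can be illustrated on the simplest possible case, a path graph, where the single middle vertex is itself a vertex separator. This is a hypothetical minimal sketch for intuition only; it is not the parallel, multilevel separator-refinement algorithm of mt-Metis:

```python
# Minimal sketch of nested dissection ordering on a path graph
# (vertices 0..n-1, edges between consecutive vertices). Illustration
# only -- mt-Metis computes separators on general graphs with
# multilevel refinement, in parallel.

def nested_dissection(vertices):
    """Return an elimination order: remove the middle vertex (the
    separator), recurse on the two disconnected halves, and place the
    separator last so fill-in cannot propagate between the halves."""
    if len(vertices) <= 2:
        return list(vertices)
    mid = len(vertices) // 2
    sep = [vertices[mid]]                      # trivial separator of a path
    left = nested_dissection(vertices[:mid])   # order left part first
    right = nested_dissection(vertices[mid + 1:])
    return left + right + sep                  # separator eliminated last

order = nested_dissection(list(range(7)))
print(order)  # a permutation of 0..6 with the top-level separator, 3, last
```

Ordering the separator vertices last is what limits fill in the Cholesky factors: no entry created while eliminating one half can appear in a row or column of the other half, because the two halves share no edges once the separator is removed.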
Original language: English (US)
Title of host publication: Euro-Par 2015
Subtitle of host publication: Parallel Processing - 21st International Conference on Parallel and Distributed Computing, Proceedings
Editors: Jesper Larsson Träff, Sascha Hunold, Francesco Versaci
Number of pages: 12
State: Published - 2015
Event: 21st International Conference on Parallel and Distributed Computing, Euro-Par 2015 - Vienna, Austria
Duration: Aug 24, 2015 → Aug 28, 2015
Publication series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Bibliographical note
Funding Information:
This work was supported in part by NSF (IIS-0905220, OCI-1048018, CNS-1162405, IIS-1247632, IIP-1414153, IIS-1447788), Army Research Office (W911NF-14-1-0316), Intel Software and Services Group, and the Digital Technology Center at the University of Minnesota. Access to research and computing facilities was provided by the Digital Technology Center and the Minnesota Supercomputing Institute.
© Springer-Verlag Berlin Heidelberg 2015.