Efficient mapping and implementation of matrix algorithms on a hypercube

Research output: Contribution to journal › Article › peer-review

9 Scopus citations


It is well known that parallelism by itself does not lead to higher speeds. This study shows how to put parallelism to best use, that is, how to find an optimal balance between communication and computation overheads for two parallel matrix algorithms. The problem graph for the matrix algorithms analyzed in this paper is a two-dimensional grid (toroidal mesh) that is mapped onto a hypercube topology. To perform matrix operations on a hypercube, a matrix is partitioned into several submatrices, which are stored and manipulated in the nodes. We seek an optimal matrix partitioning that minimizes overall execution time. The NCUBE parallel machine is used for experimental performance evaluation. For matrix multiplication, we derive an exact analytical model that determines the optimal partition size and verify it experimentally on the NCUBE parallel processor. For a parallel Gaussian elimination algorithm known as the balanced algorithm, we present performance measurements and an approximate analytical model for performance evaluation. Our analyses show that the optimal submatrix size is typically small and does not depend on the original matrix size.
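The abstract does not spell out how the toroidal mesh is placed on the hypercube. A common way to do this, assumed here rather than taken from the paper, is the binary-reflected Gray code embedding: when the torus has 2^a rows and 2^b columns, concatenating the Gray codes of the row and column indices gives an (a+b)-bit hypercube node label in which every torus neighbour, including the wraparound ones, is a direct hypercube neighbour. The minimal sketch below also shows how an element of a matrix split into equal square submatrices would be located under that mapping. The function names (`gray`, `torus_to_hypercube`, `block_owner`, `check_dilation_one`) and the row-major block placement are hypothetical illustrations; the sketch does not reproduce the paper's cost model or its choice of submatrix size.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code of i: consecutive values differ in one bit."""
    return i ^ (i >> 1)


def torus_to_hypercube(row: int, col: int, col_bits: int) -> int:
    """Hypercube node label for torus position (row, col).

    Assumption: the torus has 2**row_bits rows and 2**col_bits columns.
    The row's Gray code fills the high bits and the column's Gray code the
    low col_bits bits, so torus neighbours differ in exactly one bit.
    """
    return (gray(row) << col_bits) | gray(col)


def block_owner(i: int, j: int, block_size: int, col_bits: int) -> int:
    """Node holding matrix element (i, j) when the matrix is cut into
    block_size x block_size submatrices and submatrix (bi, bj) is placed
    at torus position (bi, bj) (a hypothetical placement)."""
    return torus_to_hypercube(i // block_size, j // block_size, col_bits)


def is_hypercube_edge(u: int, v: int) -> bool:
    """True if labels u and v differ in exactly one bit."""
    d = u ^ v
    return d != 0 and d & (d - 1) == 0


def check_dilation_one(row_bits: int, col_bits: int) -> bool:
    """Verify that every torus edge (with wraparound) maps to a hypercube edge."""
    rows, cols = 1 << row_bits, 1 << col_bits
    for r in range(rows):
        for c in range(cols):
            here = torus_to_hypercube(r, c, col_bits)
            for dr, dc in ((1, 0), (0, 1)):
                there = torus_to_hypercube((r + dr) % rows, (c + dc) % cols, col_bits)
                if here == there:
                    # A torus dimension of length 1 has no real edge to check.
                    continue
                if not is_hypercube_edge(here, there):
                    return False
    return True


if __name__ == "__main__":
    # A 64 x 64 matrix cut into 8 x 8 submatrices gives an 8 x 8 torus,
    # which fits a 6-dimensional hypercube (3 row bits + 3 column bits).
    print(check_dilation_one(3, 3))
    print(block_owner(10, 20, block_size=8, col_bits=3))
```

With these assumptions, element (10, 20) lies in submatrix (1, 2), whose Gray codes are 1 and 3, so it is stored on node (1 << 3) | 3 = 11. Because neighbouring submatrices sit on adjacent nodes, each shift of a block row or block column costs one hop; the trade-off the paper studies is how the submatrix size sets the ratio of that communication to the local computation.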

Original language: English (US)
Pages (from-to): 7-27
Number of pages: 21
Journal: The Journal of Supercomputing
Issue number: 1
State: Published - Sep 1 1988


