TY - JOUR

T1 - A scalable iterative dense linear system solver for multiple right-hand sides in data analytics

AU - Kalantzis, Vassilis

AU - Malossi, A. Cristiano I.

AU - Bekas, Costas

AU - Curioni, Alessandro

AU - Gallopoulos, Efstratios

AU - Saad, Yousef

N1 - Publisher Copyright: © 2018 Elsevier B.V.

PY - 2018/5

Y1 - 2018/5

AB - We describe Parallel-Projection Block Conjugate Gradient (PP-BCG), a distributed iterative solver for dense, symmetric positive definite linear systems with multiple right-hand sides. In particular, we focus on linear systems that arise in the stochastic estimation of the diagonal of the matrix inverse in Uncertainty Quantification. PP-BCG is based on the block Conjugate Gradient algorithm combined with Galerkin projections to accelerate convergence of the linear system solves. Numerical experiments on massively parallel architectures illustrate the performance of the proposed scheme in terms of efficiency and convergence rate, as well as its effectiveness relative to the (block) Conjugate Gradient and the Cholesky-based ScaLAPACK solver. In particular, on a 4-rack BG/Q system with up to 65,536 processor cores, using dense matrices of order as high as 524,288 and 800 right-hand sides, PP-BCG can be 2x to 3x faster than the aforementioned techniques.

KW - (Block) Conjugate Gradient

KW - Deflation

KW - Galerkin projections

KW - Massively parallel architectures

KW - Multiple right-hand sides

UR - http://www.scopus.com/inward/record.url?scp=85041039382&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85041039382&partnerID=8YFLogxK

U2 - 10.1016/j.parco.2017.12.005

DO - 10.1016/j.parco.2017.12.005

M3 - Article

AN - SCOPUS:85041039382

SN - 0167-8191

VL - 74

SP - 136

EP - 153

JO - Parallel Computing

JF - Parallel Computing

ER -