TY - JOUR
T1 - A scalable iterative dense linear system solver for multiple right-hand sides in data analytics
AU - Kalantzis, Vassilis
AU - Malossi, A. Cristiano I.
AU - Bekas, Costas
AU - Curioni, Alessandro
AU - Gallopoulos, Efstratios
AU - Saad, Yousef
N1 - Publisher Copyright: © 2018 Elsevier B.V.
PY - 2018/5
Y1 - 2018/5
AB - We describe Parallel-Projection Block Conjugate Gradient (PP-BCG), a distributed iterative solver for the solution of dense and symmetric positive definite linear systems with multiple right-hand sides. In particular, we focus on linear systems appearing in the context of stochastic estimation of the diagonal of the matrix inverse in Uncertainty Quantification. PP-BCG is based on the block Conjugate Gradient algorithm combined with Galerkin projections to accelerate the convergence rate of the solution process of the linear systems. Numerical experiments on massively parallel architectures illustrate the performance of the proposed scheme in terms of efficiency and convergence rate, as well as its effectiveness relative to the (block) Conjugate Gradient and the Cholesky-based ScaLAPACK solver. In particular, on a 4-rack BG/Q with up to 65,536 processor cores using dense matrices of order as high as 524,288 and 800 right-hand sides, PP-BCG can be 2x-3x faster than the aforementioned techniques.
KW - (Block) Conjugate Gradient
KW - Deflation
KW - Galerkin projections
KW - Massively parallel architectures
KW - Multiple right-hand sides
UR - http://www.scopus.com/inward/record.url?scp=85041039382&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041039382&partnerID=8YFLogxK
U2 - 10.1016/j.parco.2017.12.005
DO - 10.1016/j.parco.2017.12.005
M3 - Article
AN - SCOPUS:85041039382
SN - 0167-8191
VL - 74
SP - 136
EP - 153
JO - Parallel Computing
JF - Parallel Computing
ER -