TY - GEN
T1 - Accelerating Lattice Boltzmann fluid flow simulations using graphics processors
AU - Bailey, Peter
AU - Myre, Joe
AU - Walsh, Stuart D.C.
AU - Lilja, David J.
AU - Saar, Martin O.
PY - 2009
Y1 - 2009
N2 - Lattice Boltzmann Methods (LBM) are used for the computational simulation of Newtonian fluid dynamics. LBM-based simulations are readily parallelizable; they have been implemented on general-purpose processors [1][2][3], field-programmable gate arrays (FPGAs) [4], and graphics processing units (GPUs) [5][6][7]. Of the three methods, the GPU implementations achieved the highest simulation performance per chip. With memory bandwidth of up to 141 GB/s and a theoretical maximum floating-point performance of over 600 GFLOPS [8], CUDA-ready GPUs from NVIDIA provide an attractive platform for a wide range of scientific simulations, including LBM. This paper improves upon prior single-precision GPU LBM results for the D3Q19 model [7] by increasing GPU multiprocessor occupancy, resulting in an increase in maximum performance by 20%, and by introducing a space-efficient storage method which reduces GPU RAM requirements by 50% at a slight detriment to performance. Both GPU implementations are over 28 times faster than a single-precision quad-core CPU version utilizing OpenMP.
AB - Lattice Boltzmann Methods (LBM) are used for the computational simulation of Newtonian fluid dynamics. LBM-based simulations are readily parallelizable; they have been implemented on general-purpose processors [1][2][3], field-programmable gate arrays (FPGAs) [4], and graphics processing units (GPUs) [5][6][7]. Of the three methods, the GPU implementations achieved the highest simulation performance per chip. With memory bandwidth of up to 141 GB/s and a theoretical maximum floating-point performance of over 600 GFLOPS [8], CUDA-ready GPUs from NVIDIA provide an attractive platform for a wide range of scientific simulations, including LBM. This paper improves upon prior single-precision GPU LBM results for the D3Q19 model [7] by increasing GPU multiprocessor occupancy, resulting in an increase in maximum performance by 20%, and by introducing a space-efficient storage method which reduces GPU RAM requirements by 50% at a slight detriment to performance. Both GPU implementations are over 28 times faster than a single-precision quad-core CPU version utilizing OpenMP.
UR - http://www.scopus.com/inward/record.url?scp=77951435761&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951435761&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2009.38
DO - 10.1109/ICPP.2009.38
M3 - Conference contribution
AN - SCOPUS:77951435761
SN - 9780769538020
T3 - Proceedings of the International Conference on Parallel Processing
SP - 550
EP - 557
BT - ICPP-2009 - The 38th International Conference on Parallel Processing
T2 - 38th International Conference on Parallel Processing, ICPP-2009
Y2 - 22 September 2009 through 25 September 2009
ER -