Utilizing dynamic parallelism in CUDA to accelerate a 3D red-black successive over relaxation wind-field solver

Behnam Bozorgmehr, Pete Willemsen, Jeremy A. Gibbs, Rob Stoll, Jae Jin Kim, Eric R. Pardyjak

Research output: Contribution to journal › Article › peer-review

10 Scopus citations


QES-Winds is a fast-response wind modeling platform for simulating high-resolution mean wind fields for optimization and prediction. The code uses a variational analysis technique that solves a Poisson equation for Lagrange multipliers to obtain a mean wind field, and it uses GPU parallelization to accelerate the numerical solution of that equation. QES-Winds uses CUDA dynamic parallelism (launching kernels from the GPU) to speed up calculations by a factor of 128 compared to the serial solver for a domain with 145 million cells. Dynamic parallelism enables QES-Winds to calculate mean velocity fields for domains with sizes of 10 km² and horizontal resolutions of 1-3 m in under 1 min. As a result, QES-Winds is suitable for computing high-resolution wind fields on large domains in real time, and it can be used to model a wide range of real-world problems, including wildfires and urban air quality.
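The red-black ordering named in the title can be illustrated with a minimal serial sketch. This is not the QES-Winds CUDA code: it assumes a uniform grid with spacing `h`, a generic right-hand side `f`, a standard 7-point stencil, and zero Dirichlet boundaries. The function names `red_black_sor` and `laplacian` are hypothetical. Cells are colored by the parity of `i + j + k`; every neighbor of a red cell is black and vice versa, so all cells of one color can be updated independently, which is the property a GPU solver can exploit.

```python
def laplacian(phi, h):
    """7-point discrete Laplacian of phi on interior cells (boundary entries left at 0)."""
    nx, ny, nz = len(phi), len(phi[0]), len(phi[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                nb = (phi[i - 1][j][k] + phi[i + 1][j][k]
                      + phi[i][j - 1][k] + phi[i][j + 1][k]
                      + phi[i][j][k - 1] + phi[i][j][k + 1])
                out[i][j][k] = (nb - 6.0 * phi[i][j][k]) / (h * h)
    return out


def red_black_sor(f, h, omega=1.5, tol=1e-10, max_iter=10_000):
    """Solve laplacian(phi) = f with phi = 0 on the boundary (illustrative assumptions).

    Returns (phi, iterations_used). omega is the over-relaxation factor, 0 < omega < 2.
    """
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    phi = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for it in range(1, max_iter + 1):
        max_change = 0.0
        # Two half-sweeps: color 0 ("red") cells first, then color 1 ("black").
        for color in (0, 1):
            for i in range(1, nx - 1):
                for j in range(1, ny - 1):
                    # First interior k with (i + j + k) % 2 == color.
                    k_start = 1 + ((i + j + 1 + color) % 2)
                    for k in range(k_start, nz - 1, 2):
                        nb = (phi[i - 1][j][k] + phi[i + 1][j][k]
                              + phi[i][j - 1][k] + phi[i][j + 1][k]
                              + phi[i][j][k - 1] + phi[i][j][k + 1])
                        # Gauss-Seidel value for this cell, then over-relax toward it.
                        gs = (nb - h * h * f[i][j][k]) / 6.0
                        delta = omega * (gs - phi[i][j][k])
                        phi[i][j][k] += delta
                        max_change = max(max_change, abs(delta))
        if max_change < tol:
            return phi, it
    return phi, max_iter
```

One way to check the sketch is to pick a field that vanishes on the boundary, compute `f = laplacian(field, h)`, and confirm that `red_black_sor(f, h)` recovers the field. In a GPU version, each inner half-sweep would become a kernel over the cells of one color; how QES-Winds arranges those launches under dynamic parallelism is described in the paper, not here.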

Original language: English (US)
Article number: 104958
Journal: Environmental Modelling and Software
State: Published - Mar 1 2021

Bibliographical note

Funding Information:
This work was partly supported by a grant from the National Institute of Environment Research (NIER), funded by the Ministry of Environment (MOE) of the Republic of Korea (NIER-SP2019-312), the United States Department of Agriculture National Institute for Food and Agriculture Specialty Crop Research Initiative Award No. 2018-03375, and the United States Department of Agriculture Agricultural Research Service through Research Support Agreement 58-2072-0-036.

Publisher Copyright:
© 2021 Elsevier Ltd


  • Fast-response
  • Iterative method
  • Poisson equation
  • QES-Winds
  • Staggered grid
  • Wind modeling

