Hardware acceleration consists of offloading computational work to devices such as graphics processing units (GPUs) to achieve an overall speed-up. Algorithms and numerical methods must be constructed to suit the available hardware in order to realize that speed-up. In this work, a numerical method is presented which can effectively use hardware acceleration to simulate incompressible turbulent fluid flow. The method is an unstructured overset method, in which unstructured meshes are attached to individual bodies and connected throughout the flow domain by an overset assembly process to produce a single-domain solution. The unstructured overset method shown in earlier work by Horne and Mahesh was found capable of scaling to O(10^5) computational cores for O(10^5) moving bodies in turbulent flow fields while producing accurate flow results. This highly scalable method is modified and extended to effectively utilize on-node hardware acceleration. Overset assembly algorithms which use hardware acceleration are presented, based on successful accelerated algorithms in real-time ray tracing and computational geometry. Timing results for core overset assembly operations are presented, showing a maximum O(100x) speed-up when using hardware acceleration. A novel method for turbulent fluid flow is presented which utilizes over-decomposition of the flow domain to produce task parallelism, allowing asynchronous calculation of the different steps of the method while also providing overlap between data transfer and computation. A mixed-precision solver is utilized which balances performance against numerical accuracy. A cost-effective artificial compressibility pressure regularization is used, which has minimal memory complexity and minimizes computational cost while maintaining accuracy. A primal-dual Laplacian operator is introduced which produces accurate results on skewed meshes.
Results for canonical flow cases with overset meshes are shown, illustrating the method's accuracy and numerical properties. Substantial speed-up is demonstrated for the numerical method, which at high cell loadings runs upwards of 50 times as fast as the non-accelerated method.
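The overlap between data transfer and computation that over-decomposition makes possible can be illustrated with a minimal sketch. This is not the authors' implementation: it is a CPU-only analogy in Python, and every name in it (`split_domain`, `transfer`, `compute`, `run_pipelined`) is hypothetical. The "transfer" is a plain list copy standing in for a host-to-device copy, and the "compute" is a sum of squares standing in for a device kernel. The sketch only shows the scheduling pattern: the domain is split into more chunks than there are copy engines, so the copy of the next chunk can be in flight while the current chunk is being processed.

```python
from concurrent.futures import ThreadPoolExecutor


def split_domain(cells, n_chunks):
    """Over-decompose: split a list of cells into n_chunks nearly equal pieces.

    Assumes n_chunks >= 1 (hypothetical helper, not from the paper).
    """
    size, extra = divmod(len(cells), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(cells[start:stop])
        start = stop
    return chunks


def transfer(chunk):
    """Stand-in for a host-to-device copy (here: just a copy of the list)."""
    return list(chunk)


def compute(chunk):
    """Stand-in for a device kernel (here: sum of squares over the chunk)."""
    return sum(x * x for x in chunk)


def run_pipelined(cells, n_chunks):
    """Process chunks so that the next transfer is submitted before computing."""
    chunks = split_domain(cells, n_chunks)
    partial_results = []
    # A single worker thread plays the role of one copy engine.
    with ThreadPoolExecutor(max_workers=1) as copy_engine:
        pending = copy_engine.submit(transfer, chunks[0])
        for next_chunk in chunks[1:]:
            ready = pending.result()                      # wait for the current copy
            pending = copy_engine.submit(transfer, next_chunk)  # start the next copy
            partial_results.append(compute(ready))        # compute while it is in flight
        partial_results.append(compute(pending.result()))  # last chunk
    return sum(partial_results)


if __name__ == "__main__":
    print(run_pipelined(list(range(10)), n_chunks=4))
```

In standard Python the two stand-in stages do not truly run in parallel on the CPU, so this sketch shows the ordering of submissions rather than a measured speed-up. On a GPU the same pattern would typically be expressed with asynchronous copies and multiple streams or queues, and the benefit depends on the chunks being large enough to amortize launch and transfer overheads.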
Bibliographical note
Funding Information:
Computing resources from the Minnesota Supercomputing Institute (MSI) are gratefully acknowledged for all results shown in this work. This work was supported by the United States Office of Naval Research under Grants N00014-18-1-2356 and N00014-20-1-2717, monitored by Drs. Ki-Han Kim and Peter Chang, respectively.
© 2021 Elsevier Inc.
- GPU acceleration
- Incompressible fluid flow
- Real-time raytracing
- Turbulent flows
- Unstructured overset method