TY - JOUR
T1 - A GPU-based streaming algorithm for high-resolution cloth simulation
AU - Tang, Min
AU - Tong, Ruofeng
AU - Narain, Rahul
AU - Meng, Chang
AU - Manocha, Dinesh
PY - 2013/10
Y1 - 2013/10
N2 - We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm handles intra-object and inter-object collisions, contacts, and friction, and is able to accurately simulate folds and wrinkles. We describe the streaming pipeline and address many issues in obtaining high throughput on many-core GPUs. In practice, our algorithm can perform high-fidelity simulation on a cloth mesh with 2M triangles using 3 GB of GPU memory. We highlight the parallel performance of our algorithm on three different generations of GPUs. On a high-end NVIDIA Tesla K20c, we observe up to two orders of magnitude performance improvement compared to a single-threaded CPU-based algorithm, and about one order of magnitude improvement over a 16-core CPU-based parallel implementation.
AB - We present a GPU-based streaming algorithm to perform high-resolution and accurate cloth simulation. We map all components of the cloth simulation pipeline, including time integration, collision detection, collision response, and velocity updating, to GPU-based kernels and data structures. Our algorithm handles intra-object and inter-object collisions, contacts, and friction, and is able to accurately simulate folds and wrinkles. We describe the streaming pipeline and address many issues in obtaining high throughput on many-core GPUs. In practice, our algorithm can perform high-fidelity simulation on a cloth mesh with 2M triangles using 3 GB of GPU memory. We highlight the parallel performance of our algorithm on three different generations of GPUs. On a high-end NVIDIA Tesla K20c, we observe up to two orders of magnitude performance improvement compared to a single-threaded CPU-based algorithm, and about one order of magnitude improvement over a 16-core CPU-based parallel implementation.
UR - http://www.scopus.com/inward/record.url?scp=84888618587&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84888618587&partnerID=8YFLogxK
U2 - 10.1111/cgf.12208
DO - 10.1111/cgf.12208
M3 - Article
AN - SCOPUS:84888618587
SN - 0167-7055
VL - 32
SP - 21
EP - 30
JO - Computer Graphics Forum
JF - Computer Graphics Forum
IS - 7
ER -