Abstract
Performing training and inference for Graph Neural Networks (GNNs) under tight latency constraints has become increasingly difficult as real-world input graphs continue to grow. Compared to traditional DNNs, GNNs present unique computational challenges due to their massive, unstructured, and sparse input graphs. Prior works have applied irregular and structured model pruning techniques to reduce the complexity of GNNs and accelerate their performance. However, the irregular pruning techniques presented in the literature use floating-point operation counts to estimate GNN performance, which fails to capture the true performance impact of model sparsity: sparse matrix multiplication kernels suffer from diminished parallelism. This paper quantitatively shows that irregular sparsity in GNN models cannot be exploited to improve performance on parallel architectures that employ highly vectorized hardware. While structured pruning can overcome these issues, existing structured pruning work for GNNs introduces performance scalability challenges, as the low-dimensional mapping of the pruned model is unable to exploit the full parallelism potential of the GPU's vectorized hardware. We propose PruneGNN, an optimized algorithm-architecture framework for structured GNN pruning. At the algorithm level, a dimension-pruning-aware sparse training method is proposed that achieves high sparsity while maintaining accuracy. At the architecture level, novel SIMD-aware kernels are proposed that exploit matrix-operator-level parallelism and unlock performance gains with reduced-dimension GNN models. The efficacy of the proposed framework is evaluated for end-to-end inference as well as training performance using real-world dynamic and static graphs on representative GNN models. Experimental results using an NVIDIA A100 GPU show that PruneGNN achieves an average 2x speedup over prior structured pruning work for state-of-the-art GNN models.
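To make the distinction in the abstract concrete, below is a minimal NumPy sketch contrasting irregular (unstructured) pruning with structured dimension pruning. This is not the PruneGNN implementation; the layer shapes, the 90% magnitude threshold, and the 50% column-keep ratio are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GNN layer weight (in_dim x out_dim) and node features;
# all shapes here are illustrative, not taken from the paper.
W = rng.standard_normal((128, 64))
X = rng.standard_normal((1000, 128))

# Irregular pruning: zero the 90% smallest-magnitude weights. The
# matrix keeps its shape, so a dense GEMM still performs the same
# number of multiply-accumulates; the sparsity only pays off with
# sparse kernels, whose irregular access defeats SIMD hardware.
threshold = np.quantile(np.abs(W), 0.90)
W_irregular = np.where(np.abs(W) >= threshold, W, 0.0)
Y_irregular = X @ W_irregular          # same FLOPs as the dense layer

# Structured (dimension) pruning: drop entire output columns with the
# smallest L2 norm. The weight matrix physically shrinks, so what
# remains is a smaller dense, vectorization-friendly GEMM.
col_norms = np.linalg.norm(W, axis=0)
keep = np.argsort(col_norms)[len(col_norms) // 2:]  # keep top 50% of columns
W_structured = W[:, keep]              # shape (128, 32): truly smaller
Y_structured = X @ W_structured        # ~half the FLOPs, still dense

print(W_irregular.shape, W_structured.shape)  # (128, 64) (128, 32)
```

The abstract's argument is visible here: the irregular variant leaves the dense GEMM cost unchanged unless a sparse kernel is used, while the structured variant shrinks the GEMM itself, which is why the paper pairs dimension pruning with SIMD-aware kernels to recover the lost parallelism.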
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024 |
Publisher | IEEE Computer Society |
Pages | 108-123 |
Number of pages | 16 |
ISBN (Electronic) | 9798350393132 |
State | Published - 2024 |
Externally published | Yes |
Event | 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024 - Edinburgh, United Kingdom (Duration: Mar 2 2024 → Mar 6 2024)
Publication series
Name | Proceedings - International Symposium on High-Performance Computer Architecture |
---|---|
ISSN (Print) | 1530-0897 |
Conference
Conference | 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024 |
---|---|
Country/Territory | United Kingdom |
City | Edinburgh |
Period | 3/2/24 → 3/6/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.