PruneGNN: Algorithm-Architecture Pruning Framework for Graph Neural Network Acceleration

Deniz Gurevin, Mohsin Shan, Shaoyi Huang, Md Amit Hasan, Caiwen Ding, Omer Khan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Performing training and inference for Graph Neural Networks (GNNs) under tight latency constraints has become increasingly difficult as real-world input graphs continue to grow. Compared to traditional DNNs, GNNs present unique computational challenges due to their massive, unstructured, and sparse input graphs. Prior works have applied irregular and structured model pruning techniques to reduce the complexity of GNNs to accelerate GNN performance. However, irregular pruning techniques presented in the literature use floating point operations to estimate G NN performance, which does not reveal the true performance implications of model sparsity caused by the diminished parallelism of sparse matrix multiplication kernels. This paper quantitatively shows that irregular sparsity in G NN models is unable to be exploited to improve performance in parallel architectures that employ highly vectorized hardware. While structured pruning can overcome these issues, the existing structured pruning work for GNNs introduces performance scalability challenges as low-dimensional mapping of the pruned model is unable to exploit the full parallelism potential of the GPU's vectorized hardware. We propose PruneGNN, an optimized algorithm-Architecture framework for structured GNN pruning. At the algorithm level, a dimension-pruning-Aware sparse training method is proposed that achieves high sparsity while maintaining accuracy. At the architecture level, novel SIMD-Aware kernels are proposed that exploit matrix-operator-level parallelism and unlock performance gains with reduced-dimension GNN models. The efficacy of the proposed framework is evaluated for end-To-end inference as well as training performance using real-world dynamic and static graphs on representative GNN models. Experimental results using an NVIDIA A100 GPU show that PruneGNN achieves an average of 2 x speedup over the prior structured pruning work for state-of-The-Art GNN models.

Original languageEnglish (US)
Title of host publicationProceedings - 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
PublisherIEEE Computer Society
Pages108-123
Number of pages16
ISBN (Electronic)9798350393132
StatePublished - 2024
Externally publishedYes
Event30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024 - Edinburgh, United Kingdom
Duration: Mar 2 2024Mar 6 2024

Publication series

NameProceedings - International Symposium on High-Performance Computer Architecture
ISSN (Print)1530-0897

Conference

Conference30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
Country/TerritoryUnited Kingdom
CityEdinburgh
Period3/2/243/6/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Fingerprint

Dive into the research topics of 'PruneGNN: Algorithm-Architecture Pruning Framework for Graph Neural Network Acceleration'. Together they form a unique fingerprint.

Cite this