DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training

Hongkuan Zhou, Da Zheng, Xiang Song, George Karypis, Viktor Prasanna

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Memory-based Temporal Graph Neural Networks are powerful tools in dynamic graph representation learning and have demonstrated superior performance in many real-world applications. However, their node memory favors smaller batch sizes to capture more dependencies in graph events and needs to be maintained synchronously across all trainers. As a result, existing frameworks suffer from accuracy loss when scaling to multiple GPUs. Even worse, the tremendous overhead of synchronizing the node memory makes it impractical to deploy the solution in GPU clusters. In this work, we propose DistTGL - an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters. DistTGL has three improvements over existing solutions: an enhanced TGNN model, a novel training algorithm, and an optimized system. In experiments, DistTGL achieves near-linear convergence speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
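To illustrate the mechanism the abstract refers to, the sketch below shows a toy, TGN-style per-node memory that is updated as timestamped graph events arrive. This is not DistTGL's implementation; all class and parameter names are hypothetical, and the GRU-like cell uses random, untrained weights purely to show the data flow. The comment in `update` notes why such memory favors small batches, as the abstract states.

```python
import numpy as np

# Hypothetical sketch of a memory-based TGNN's node memory (TGN-style).
# Not DistTGL's actual model; names and weights are illustrative only.

DIM = 4
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class NodeMemory:
    def __init__(self, num_nodes, dim=DIM):
        # One memory vector per node, updated as graph events arrive.
        self.mem = np.zeros((num_nodes, dim))
        # Toy GRU-style parameters (random, untrained).
        self.Wz = rng.standard_normal((dim, 2 * dim)) * 0.1
        self.Wh = rng.standard_normal((dim, 2 * dim)) * 0.1

    def update(self, events):
        # events: list of (src, dst, message) tuples in time order.
        # Updating after every event captures the most temporal
        # dependencies; a large batch reuses stale memory within the
        # batch, which is why node memory favors small batch sizes --
        # and why every trainer must see a synchronized copy of `mem`.
        for src, dst, msg in events:
            for node in (src, dst):
                h = self.mem[node]
                x = np.concatenate([h, msg])
                z = sigmoid(self.Wz @ x)      # update gate
                h_new = np.tanh(self.Wh @ x)  # candidate memory
                self.mem[node] = (1 - z) * h + z * h_new

mem = NodeMemory(num_nodes=4)
mem.update([(0, 1, rng.standard_normal(DIM)),
            (1, 2, rng.standard_normal(DIM))])
# Node 1 participated in both events, so its memory reflects both;
# node 3 saw no events and keeps its zero-initialized memory.
print(mem.mem[1])
```

In a multi-GPU setting, every trainer would need the latest `mem` rows for the nodes in its batch, which is the synchronization overhead the paper targets.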

Original language: English (US)
Title of host publication: SC 2023 - International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9798400701092
DOIs
State: Published - 2023
Externally published: Yes
Event: 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023 - Denver, United States
Duration: Nov 12 2023 - Nov 17 2023

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
Country/Territory: United States
City: Denver
Period: 11/12/23 - 11/17/23

Bibliographical note

Publisher Copyright:
© 2023 ACM.

Keywords

  • Distributed algorithms
  • Neural networks

