Graph analytics are widely used in real-world applications, and GPUs are major accelerators for such applications. However, when graph sizes become significantly larger than the capacity of GPU memory, performance can degrade severely because of the heavy overhead of moving large amounts of graph data between CPU main memory and GPU memory. Some existing approaches have tried to exploit data locality and address the issue of memory oversubscription on GPUs. However, these approaches have yet to take advantage of data reuse across iterations, because of the large data sizes in most large-graph analytics. In our studies, we have found that in most graph applications the graph traversals exhibit a roughly sequential scan over the graph data with an extremely large memory footprint. Based on this observation, we propose a novel framework, called Ascetic, to exploit temporal locality with very long reuse distances. In Ascetic, GPU memory is divided into a Static Region and an On-demand Region. The Static Region exploits data reuse across iterations. The On-demand Region loads the data that are requested in the current iteration of the graph traversal but not found in the Static Region. We have implemented a prototype of the Ascetic framework and conducted a series of performance-evaluation experiments. The experimental results show that Ascetic can significantly reduce data transfer overhead and allow more overlapped execution between the GPU and CPU, which leads to an average speedup of 2.0x over a state-of-the-art approach.
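Why a fixed Static Region can help when a plain recency-based cache does not can be illustrated with a toy transfer-count model. This is a minimal sketch under stated assumptions, not Ascetic's implementation: graph data are split into equal-sized chunks, every iteration scans all chunks strictly in order, GPU memory holds `capacity` chunks, and the Static Region simply pins the first `static_chunks` chunks (the pinning choice and both function names are hypothetical). Under an LRU policy, a cyclic scan larger than memory evicts each chunk just before it is needed again, so every access misses; pinning part of the data guarantees those chunks are reused on every iteration.

```python
from collections import OrderedDict


def lru_transfers(num_chunks, capacity, iterations):
    """Count host-to-GPU chunk transfers for an LRU-managed GPU memory
    under repeated sequential scans (toy model, hypothetical helper)."""
    cache = OrderedDict()
    transfers = 0
    for _ in range(iterations):
        for chunk in range(num_chunks):
            if chunk in cache:
                cache.move_to_end(chunk)  # hit: mark as most recently used
            else:
                transfers += 1  # miss: chunk must be copied to the GPU
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used chunk
                cache[chunk] = True
    return transfers


def static_ondemand_transfers(num_chunks, capacity, static_chunks, iterations):
    """Count transfers when `static_chunks` chunks stay resident (static
    region) and the rest stream through an on-demand buffer (toy model,
    hypothetical helper). Assumes the graph oversubscribes GPU memory."""
    assert num_chunks > capacity, "model assumes memory oversubscription"
    assert 0 < static_chunks < capacity, "leave room for the on-demand buffer"
    static = set(range(static_chunks))  # hypothetical policy: pin the first chunks
    transfers = len(static)  # static region is loaded once and reused
    for _ in range(iterations):
        for chunk in range(num_chunks):
            if chunk not in static:
                # Non-static data exceed the on-demand buffer, so under a
                # strict sequential scan each such chunk is transferred again.
                transfers += 1
    return transfers


print(lru_transfers(10, 6, 3))                 # 30
print(static_ondemand_transfers(10, 6, 4, 3))  # 22
```

With 10 chunks, room for 6, and 3 iterations, LRU transfers all 10 chunks every iteration, while pinning 4 chunks reduces the count to 4 + 3 × 6. The real framework also has to decide which data to place in the Static Region and overlaps transfers with computation; this model ignores both.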
| Field | Value |
|---|---|
| Original language | English (US) |
| Title of host publication | 50th International Conference on Parallel Processing, ICPP 2021 - Main Conference Proceedings |
| Publisher | Association for Computing Machinery |
| State | Published - Aug 9 2021 |
| Event | 50th International Conference on Parallel Processing, ICPP 2021 - Virtual, Online, United States |
| Duration | Aug 9 2021 → Aug 12 2021 |
| Series name | ACM International Conference Proceeding Series |
| Conference | 50th International Conference on Parallel Processing, ICPP 2021 |
| Period | 8/9/21 → 8/12/21 |
Bibliographical note
Funding Information:
This work is partially supported by the National Key Research and Development Program of China (2018YFB1003405).
© 2021 ACM.
- Data Reuse
- GPU memory oversubscription
- Graph Computing
- Partition-based method