TY - JOUR
T1 - Liberator
T2 - A Data Reuse Framework for Out-of-Memory Graph Computing on GPUs
AU - Li, Shiyang
AU - Tang, Ruiqi
AU - Zhu, Jingyu
AU - Zhao, Ziyi
AU - Gong, Xiaoli
AU - Wang, Wenwen
AU - Zhang, Jin
AU - Yew, Pen-Chung
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2023/6/1
Y1 - 2023/6/1
AB - Graph analytics is widely used in recommender systems, scientific computing, and data mining, and GPUs have become the major accelerators for such applications. However, graph sizes are growing rapidly and often exceed GPU memory capacity, incurring severe performance degradation due to frequent data transfers between main memory and the GPU. To alleviate this problem, we focus on utilizing data already resident in GPU memory by exploiting data reuse across iterations. In our studies, we analyze the memory access patterns of graph applications at different granularities. We find that the memory footprint is accessed in a roughly sequential scan without hotspots, which implies an extremely long reuse distance. Based on this observation, we propose a novel framework, called Liberator, to exploit data reuse within GPU memory. In Liberator, GPU memory is reserved for data potentially accessed across iterations, avoiding excessive data transfers between main memory and the GPU. For data not resident in GPU memory, a Merged and Aligned memory access scheme is employed to improve transfer efficiency. We further optimize the framework by processing data in GPU memory and data in main memory in parallel. We have implemented a prototype of the Liberator framework and conducted a series of performance experiments. The results show that Liberator significantly reduces data transfer overhead, achieving an average speedup of 2.7x over a state-of-the-art approach.
KW - Data reuse
KW - GPU memory oversubscription
KW - graph computing
KW - partition-based method
KW - zero-copy
UR - http://www.scopus.com/inward/record.url?scp=85159716582&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159716582&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2023.3268662
DO - 10.1109/TPDS.2023.3268662
M3 - Article
AN - SCOPUS:85159716582
SN - 1045-9219
VL - 34
SP - 1954
EP - 1967
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 6
ER -