Subway: Minimizing Data Transfer during Out-of-GPU-Memory Graph Processing
Amir Hossein Nodehi Sabet, Zhijia Zhao, Rajiv Gupta
Computer Science and Engineering, UC Riverside
Background and Motivation
• GPUs enable massive parallelism for graph processing
  - CuSha [1]
  - Gunrock [2]
  - Tigr [3]
  - …
• Graphs can be large and tend to grow over time
  - Web graphs
  - Social networks
• But GPU memory is limited!
  - Out-of-GPU-Memory Graph Processing
[1] Khorasani, Farzad, et al. "CuSha: vertex-centric graph processing on GPUs." HPDC '14
[2] Wang, Yangzihao, et al. "Gunrock: A high-performance graph processing library on the GPU." PPoPP '16
[3] Nodehi Sabet, Amir Hossein, Junqiao Qiu, and Zhijia Zhao. "Tigr: Transforming irregular graphs for GPU-friendly graph processing." ASPLOS '18
Partition-based Graph Processing
[Figure: main memory and GPU memory, with transfer and computation phases]
A Key Observation
The ratio of active vertices (edges) is low in most iterations.
Average ratio of active edges across iterations:
Algo.   friendster   uk-2007
SSSP    9.1%         5.1%
BFS     4.1%         0.6%
CC      9.8%         3.2%
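To make the measured quantity concrete, here is a minimal sketch of how the active-edge ratio of a single iteration could be computed. It assumes the graph is stored in CSR form and that an "active edge" is an outgoing edge of an active vertex; the function name `active_edge_ratio` and the tiny example graph are hypothetical, not taken from the slides.

```python
def active_edge_ratio(offsets, active):
    """Fraction of edges whose source vertex is active in one iteration.

    offsets: CSR offset array of length num_vertices + 1 (assumed layout)
    active:  one boolean per vertex, True if the vertex is active
    """
    total_edges = offsets[-1]
    if total_edges == 0:
        return 0.0
    # Sum the out-degrees of the active vertices only.
    active_edges = sum(
        offsets[v + 1] - offsets[v]
        for v, is_active in enumerate(active)
        if is_active
    )
    return active_edges / total_edges


# Hypothetical 4-vertex graph with out-degrees 2, 1, 3, 0 (6 edges in total).
offsets = [0, 2, 3, 6, 6]
# Only vertex 1 is active, so 1 of the 6 edges is active.
ratio = active_edge_ratio(offsets, [False, True, False, False])
```

Averaging this value over all iterations of an algorithm would give a figure comparable to the table entries above.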
Only Load Active Edges to GPU?
[Figure: main memory and GPU memory]
Too expensive to generate?!
Efficient Subgraph Generation
Subway:
• a concise subgraph representation, called SubCSR
• a highly parallel algorithm for subgraph generation
• an efficient GPU-accelerated implementation
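The slides do not spell out the SubCSR layout, so the following is only a sketch under stated assumptions: SubCSR is taken to hold three arrays (the IDs of the active vertices, their offsets into a compacted edge array, and the edges themselves), and the subgraph is built from a CSR graph plus a per-vertex active flag. The function `build_subcsr` is a hypothetical name, and this sequential Python version stands in for the parallel GPU implementation, where the selection and offset steps would typically be prefix sums.

```python
from itertools import accumulate


def build_subcsr(offsets, edges, active):
    """Build a compact subgraph containing only the edges of active vertices.

    Returns (sub_vertices, sub_offsets, sub_edges); this three-array layout
    is an assumption made for illustration.
    """
    # Step 1: collect the IDs of active vertices
    # (on a GPU: prefix sum over the flags, then a scatter).
    sub_vertices = [v for v, is_active in enumerate(active) if is_active]

    # Step 2: prefix sum over the active vertices' degrees gives each
    # vertex's starting position in the compacted edge array.
    degrees = [offsets[v + 1] - offsets[v] for v in sub_vertices]
    sub_offsets = [0] + list(accumulate(degrees))

    # Step 3: copy each active vertex's edge list into its slot.
    sub_edges = []
    for v in sub_vertices:
        sub_edges.extend(edges[offsets[v]:offsets[v + 1]])

    return sub_vertices, sub_offsets, sub_edges


# Hypothetical graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {0, 1, 3}, 3 -> {}
offsets = [0, 2, 3, 6, 6]
edges = [1, 2, 2, 0, 1, 3]
sub_vertices, sub_offsets, sub_edges = build_subcsr(
    offsets, edges, [False, True, True, False]
)
```

In this example only the four edges of vertices 1 and 2 would need to be transferred, rather than all six edges of the graph.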
SubCSR Generation Cost
[Figure: bar chart of relative cost (y-axis, 0 to 1) for PT (Transfer) and Subway-sync (SubCSR + Transfer), shown for SSSP, BFS, and CC on FS and UK; bars annotated 17% and 3%]
Costs: Partitioning-based vs. Subway (subgraph generation)
Takeaway
Is it too expensive to dynamically generate subgraphs? With Subway, no: performance improves by up to 28X!
Thank you
Amir Nodehi: anode001@ucr.edu or on Slack
The source code (to be posted soon): https://github.com/AutomataLab/Subway