Out-of-GPU-Memory Graph Processing Amir Hossein Nodehi - PowerPoint PPT Presentation

Sep 19, 2023

Subway: Minimizing Data Transfer during Out-of-GPU-Memory Graph Processing
Amir Hossein Nodehi Sabet, Zhijia Zhao, Rajiv Gupta
Computer Science and Engineering, UC Riverside

Background and Motivation
• GPUs enable massive parallelism for graph processing
  - CuSha [1]
  - Gunrock [2]
  - Tigr [3]
  - …
• Graphs can be large and tend to grow over time
  - Web graphs
  - Social networks
• But GPU memory is limited!
  - Out-of-GPU-Memory Graph Processing

[1] Khorasani, Farzad, et al. "CuSha: Vertex-centric graph processing on GPUs." HPDC '14
[2] Wang, Yangzihao, et al. "Gunrock: A high-performance graph processing library on the GPU." PPoPP '16
[3] Nodehi Sabet, Amir Hossein, Junqiao Qiu, and Zhijia Zhao. "Tigr: Transforming irregular graphs for GPU-friendly graph processing." ASPLOS '18

Partition-based Graph Processing
[Figure: graph partitions held in main memory and GPU memory; phases labeled Transferring and Computation]

A Key Observation
The ratio of active vertices (edges) is low in most iterations.

Average Ratio of Active Edges across Iterations
Algo.   friendster   UK-2007
SSSP    9.1%         5.1%
BFS     4.1%         0.6%
CC      9.8%         3.2%

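To make the metric concrete, here is a minimal sketch of how the ratio of active edges per iteration could be measured for BFS. The toy graph, the CSR arrays, and the function names are assumptions made for illustration; none of them come from the slides.

```python
def active_edge_ratio(offsets, active_vertices):
    """Fraction of all edges whose source vertex is currently active."""
    total_edges = offsets[-1]
    active_edges = sum(offsets[v + 1] - offsets[v] for v in active_vertices)
    return active_edges / total_edges if total_edges else 0.0


def bfs_active_ratios(offsets, edges, source):
    """Run BFS and record the active-edge ratio of every iteration."""
    depth = {source: 0}
    frontier = [source]  # active vertices of the current iteration
    ratios = []
    while frontier:
        ratios.append(active_edge_ratio(offsets, frontier))
        next_frontier = []
        for v in frontier:
            for e in range(offsets[v], offsets[v + 1]):
                dst = edges[e]
                if dst not in depth:  # first visit activates dst
                    depth[dst] = depth[v] + 1
                    next_frontier.append(dst)
        frontier = next_frontier
    return ratios


# Hypothetical toy graph in CSR form (5 vertices, 6 edges):
# 0->1, 0->2, 1->3, 2->3, 3->4, 4->0
offsets = [0, 2, 3, 4, 5, 6]
edges = [1, 2, 3, 3, 4, 0]

ratios = bfs_active_ratios(offsets, edges, source=0)
average_ratio = sum(ratios) / len(ratios)
```

On real graphs such as those in the table, most iterations touch only a small part of the edge array, which is what motivates loading only the active part.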
Only Load Active Edges to GPU?
[Figure: only the active edges are copied from main memory to GPU memory]
Too expensive to generate?!

Efficient Subgraph Generation
Subway:
• a concise subgraph representation, called SubCSR
• a highly parallel algorithm for subgraph generation
• an efficient GPU-accelerated implementation

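The slide does not spell out the SubCSR layout or the generation algorithm, so the following is only a sketch of one plausible approach: compact the CSR rows of the active vertices using prefix sums. The three-array layout and all names (`build_subgraph_csr`, `sub_vertices`, `sub_offsets`, `sub_edges`) are assumptions, and the code is sequential Python rather than the authors' GPU implementation.

```python
from itertools import accumulate


def build_subgraph_csr(offsets, edges, is_active):
    """Compact the CSR rows of active vertices into a smaller CSR (assumed layout)."""
    # Step 1: collect the original ids of active vertices.
    # On a GPU this compaction is typically done with a prefix sum over the flags.
    sub_vertices = [v for v, active in enumerate(is_active) if active]

    # Step 2: an exclusive prefix sum over the active vertices' out-degrees
    # gives each one its starting position in the compacted edge array.
    degrees = [offsets[v + 1] - offsets[v] for v in sub_vertices]
    sub_offsets = [0] + list(accumulate(degrees))

    # Step 3: copy each active vertex's edges into its slot.
    # Slots are disjoint, so the copies are independent and could run in parallel.
    sub_edges = [0] * sub_offsets[-1]
    for i, v in enumerate(sub_vertices):
        start = sub_offsets[i]
        sub_edges[start:start + degrees[i]] = edges[offsets[v]:offsets[v + 1]]

    return sub_vertices, sub_offsets, sub_edges


# Hypothetical toy graph in CSR form: 0->1, 0->2, 1->3, 2->3, 3->4, 4->0
offsets = [0, 2, 3, 4, 5, 6]
edges = [1, 2, 3, 3, 4, 0]
is_active = [True, False, False, False, True]  # vertices 0 and 4 are active

sub_vertices, sub_offsets, sub_edges = build_subgraph_csr(offsets, edges, is_active)
```

Only `sub_vertices`, `sub_offsets`, and `sub_edges` would then need to be transferred, and their size scales with the number of active edges rather than with the whole graph.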
SubCSR Generation Cost
[Figure: bar chart of Relative Cost (0 to 1) comparing PT (Transfer) with Subway-sync (SubCSR + Transfer) for SSSP, BFS, and CC on the FS and UK graphs; annotated values: 17% and 3%]
Costs: Partitioning-based vs. Subway (subgraph generation)

Takeaway
• Too expensive to dynamically generate subgraphs!
• Subway improves performance by up to 28X!

Thank you
Amir Nodehi: anode001@ucr.edu or on Slack
The source code (to be posted soon): https://github.com/AutomataLab/Subway
