cuSTINGER – A Sparse Dynamic Graph and Matrix Data Structure Oded Green
Going to talk about 2 things
• A scalable and dynamic data structure for graph algorithms and linear algebra based problems
• A framework for static and dynamic analytics
NVIDIA GTC cuSTINGER Oded Green, 2017

Upfront Summary of Results
• Can support up to 90 million updates per second
• Low overhead in comparison with CSR
  – Initialization is also relatively inexpensive: 20% to 200%
  – Equal performance
• Great performance for static graph algorithms
• Simple to use

Big Data problems need Graph Analysis
Communication networks:
• Worldwide connectivity
• High-velocity changes
• Different types of extracted data:
  – Physical communication network.
  – Person-to-person communication network.
Financial networks:
• Transactions between players.
• Different transaction types (property graph)
Health care networks:
• Various players.
• Pattern matching and epidemic monitoring.
• Problem sizes have doubled in last 5 years.
Graphs are a unifying motif for data analytics.

STINGER
• STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation
• Enables algorithm designers to implement dynamic & streaming graph algorithms with ease.
• Portable semantics for various platforms
  – Linked list of edge blocks not ideal for the GPU
• Good performance for all types of graph problems and algorithms, static and dynamic.
• Assumes globally addressable memory access

STINGER and cuSTINGER Properties
✓ A simple programming model
✓ Millions of updates per second to graph
✓ Updates are not bottlenecks for analytics.
✓ Advanced memory manager
✓ Transfers data between host and device automatically
✓ Reduces initialization time
✓ Allows for simple update processes
STINGER papers: [Bader et al.; 2007; Tech Report], [Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]
cuSTINGER paper: [Green & Bader; HPEC; 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs

Definitions
• Dynamic graphs (matrices)
  – Graph can change over time.
  – Changes can be to topology, edges, or vertices.
    • For example, new edges between two vertices.
  – Changes to edge or vertex weights
• Streaming graphs:
  – Graphs changing at high rates.
  – Hundreds of thousands of updates per second.

Dynamic graph example
• Only a subset of the entire graph…
• Dynamic:
  – At time t:
    • v and w become friends.
    • insert_edge(v, w)
  – At time t̂:
    • u and v are no longer friends.
    • delete_edge(u, v)
• Additional operations include vertex insertions & deletions
[Figure: graph fragment with vertices u, v, and w]

“Separation of powers”
• Dynamic graph data structure and dynamic graph algorithms are in two different repositories
  – Easy to integrate with external library
  – Can also be used with matrices
• First part of today’s talk will be on the dynamic data structure

Part 1 – Data Structure
cuSTINGER Version 2.0
• Improved initialization time
  – 100s of times faster than Version 1.0
• New memory manager
  – Reduces fragmentation
  – Enables memory reclamation
  – Offers good memory bounds
• Scalable data structure
  – Can easily grow 1000X its initial size without needing to be reinitialized
• Faster updates
Coming soon… (probably late May)

Restrictions of existing static graph representations
Name                        Pros                              Cons
Adjacency Matrix            Flexible                          Limited utilization for sparse data
Linked lists                Flexible                          Poor locality; allocation time is costly
COO (Edge list), unsorted   Has some flexibility;             Poor locality; stores both the source
                            updates are simple                and destination
CSR                         Uses exact amount of memory;      Inflexible
                            good locality

Compressed Sparse Row (CSR)
Pros:
• Uses precise storage requirements
• Great locality
  – Good for GPUs
• Handful of arrays
  – Simple to use and manage
Cons:
• Inflexible.
• Network growth unsupported
• Topology changes unsupported
• Property graphs not supported
Src/Row      0 1 2 3 4 5 6 7
Offset       0 2 4 7 9 11 13 14 14
Dest./Col.   1 2 0 5 0 3 4 2 6 2 5 1 4 3
Value        2 5 2 7 4 1 4 1 2 4 1 7 1 2

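To make the "inflexible" point concrete, here is a minimal Python sketch built on the CSR arrays from the slide. The helper names `neighbors` and `insert_edge` are hypothetical and chosen only for illustration; they are not part of any library. Reading an adjacency list is a cheap slice, while a single insertion has to shift the edge arrays and touch every later offset.

```python
# CSR arrays copied from the slide (8 vertices, 14 edges).
offset = [0, 2, 4, 7, 9, 11, 13, 14, 14]
dest = [1, 2, 0, 5, 0, 3, 4, 2, 6, 2, 5, 1, 4, 3]
value = [2, 5, 2, 7, 4, 1, 4, 1, 2, 4, 1, 7, 1, 2]


def neighbors(v):
    """Return the (destination, value) pairs of vertex v: one contiguous slice."""
    start, end = offset[v], offset[v + 1]
    return list(zip(dest[start:end], value[start:end]))


def insert_edge(src, dst, val):
    """Hypothetical helper: insert one edge into CSR.

    The edge arrays are shifted and every later offset is incremented,
    so a single update costs time proportional to the whole graph.
    """
    pos = offset[src + 1]
    dest.insert(pos, dst)
    value.insert(pos, val)
    for v in range(src + 1, len(offset)):
        offset[v] += 1
```

For example, `neighbors(2)` reads three entries, while `insert_edge(0, 7, 9)` rewrites eight offsets and shifts twelve edges.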
Part 1: cuSTINGER – A High Level View
USER INTERFACE
Vertex Id    0 1 2 3 4 5 6 7
Used         2 2 3 2 2 2 1 0
BSize        2 2 4 2 2 2 1 0
Pointer      (one pointer per vertex into its edge block; blocks may include over-allocated space)
Dest./Col.   3 1 2 0 5 2 6 2 5 1 4 0 3 4
Value        2 2 5 2 7 1 2 4 1 7 1 4 1 4
• Supports updates
  – Supports edge insertion and deletion.
  – Supports vertex insertion and deletion.

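The Used / BSize / Pointer layout can be sketched on the host in a few lines of Python. This is an illustration under assumptions, not the cuSTINGER implementation: the class name `VertexBlocks` is hypothetical, a Python list stands in for the device pointer, and the doubling policy is assumed from the power-of-two BSize values in the figure.

```python
class VertexBlocks:
    """Per-vertex edge blocks with over-allocated space (illustrative only)."""

    def __init__(self, num_vertices):
        self.used = [0] * num_vertices    # edges currently stored per vertex
        self.bsize = [0] * num_vertices   # capacity of each vertex's block
        self.block = [[] for _ in range(num_vertices)]  # stands in for the pointer

    def insert_edge(self, src, dst, val):
        if self.used[src] == self.bsize[src]:
            # Block is full: move to a block twice as large (assumed policy).
            new_size = max(1, 2 * self.bsize[src])
            new_block = [None] * new_size
            new_block[:self.used[src]] = self.block[src][:self.used[src]]
            self.block[src] = new_block
            self.bsize[src] = new_size
        self.block[src][self.used[src]] = (dst, val)
        self.used[src] += 1

    def delete_edge(self, src, dst):
        blk = self.block[src]
        for i in range(self.used[src]):
            if blk[i][0] == dst:
                last = self.used[src] - 1
                blk[i] = blk[last]        # fill the hole with the last edge
                blk[last] = None
                self.used[src] = last
                return True
        return False

    def neighbors(self, src):
        return self.block[src][:self.used[src]]
```

Inserting three edges at vertex 2 leaves Used = 3 and BSize = 4, as in the figure. Most insertions land in the over-allocated space and touch no other vertex.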
cuSTINGER – Property Graph Support
USER INTERFACE
Vertex Id    0 1 2 3 4 5 6 7
Used         2 2 3 2 2 2 1 0
BSize        2 2 4 2 2 2 1 0
Pointer      (one pointer per vertex into its edge block)
Dest./Col.   3 1 2 0 5 2 6 2 5 1 4 0 3 4
Weight       2 2 5 2 7 1 2 4 1 7 1 4 1 4
Additional per-edge fields: Type, Time 1, User 1, User 2, ….
• These are optional fields

Challenges
• Memory allocations are costly.
• Seems like we are suggesting that we need O(V) allocations
  – Absolutely not.
  – Our first implementation did this… ouch…

Memory Manager
Made up of three parts:
1. Vectorized Bit Trees
2. BlockArrays
3. B+Trees of BlockArrays
• THIS IS AN INTERNAL REPRESENTATION (HIDDEN FROM USERS)

BlockArrays
• Definition: an array made up of equal-size blocks.
• Each block can contain an equal number of edges
[Figure: BlockArray (with 4 blocks), each Block (with 2 edges)]
Dest./Col.   1 2 | 0 5 | 2 6 | 2 5
Value        2 5 | 2 7 | 1 2 | 4 1

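A BlockArray can be pictured as one allocation carved into fixed-size blocks, so that locating a block is plain index arithmetic. The Python sketch below is a minimal illustration; the class and method names are hypothetical, and the sample data follows the figure (4 blocks of 2 edges, read as a destination row and a value row).

```python
class BlockArray:
    """One allocation split into equal-size edge blocks (illustrative only)."""

    def __init__(self, num_blocks, edges_per_block):
        self.num_blocks = num_blocks
        self.edges_per_block = edges_per_block
        total = num_blocks * edges_per_block
        self.dest = [None] * total    # destination of each edge slot
        self.value = [None] * total   # value of each edge slot

    def block_range(self, block_id):
        """Return the [start, end) slot range owned by a block."""
        if not 0 <= block_id < self.num_blocks:
            raise IndexError("block_id out of range")
        start = block_id * self.edges_per_block
        return start, start + self.edges_per_block

    def write_block(self, block_id, edges):
        """Store up to edges_per_block (dest, value) pairs in a block."""
        start, end = self.block_range(block_id)
        if len(edges) > end - start:
            raise ValueError("too many edges for one block")
        for slot, (d, v) in enumerate(edges, start):
            self.dest[slot] = d
            self.value[slot] = v

    def read_block(self, block_id):
        start, end = self.block_range(block_id)
        return list(zip(self.dest[start:end], self.value[start:end]))


# Data as read from the figure: 4 blocks with 2 edges each.
ba = BlockArray(num_blocks=4, edges_per_block=2)
ba.write_block(0, [(1, 2), (2, 5)])
ba.write_block(1, [(0, 2), (5, 7)])
ba.write_block(2, [(2, 1), (6, 2)])
ba.write_block(3, [(2, 4), (5, 1)])
```

Because every block in a BlockArray has the same size, one allocation serves many vertices, which is how the O(V) allocations are avoided.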
cuSTINGER – BlockArray allocations
USER INTERFACE
Vertex Id    0 1 2 3 4 5 6 7
Used         2 2 3 2 2 2 1 0
BSize        2 2 4 2 2 2 1 0
Pointer      (one pointer per vertex into its edge block; blocks may include over-allocated space)
Dest./Col.   3 1 2 0 5 2 6 2 5 1 4 0 3 4
Value        2 2 5 2 7 1 2 4 1 7 1 4 1 4
INTERNAL REPRESENTATION (HIDDEN FROM USERS)
[Figure: the edge blocks are stored in BlockArrays BA_{0,1}, BA_{1,1}, BA_{1,2}, and BA_{2,1}, some with available space]
• Relatively small number of BlockArrays are needed
  – Exact number not known at compile time (or even at runtime given updates)

Memory Manager
Made up of three parts:
1. Vectorized Bit Trees
   – Help determine which blocks are empty
   – Key component for efficient memory reclamation
2. BlockArrays
3. B+Trees of BlockArrays

Vec Trees
• Each block is either full (0) or empty (1).
[Figure: two panels, (a) Full BlockArray and (b) Partially Empty BlockArray, both for BA_{1,1}. Each panel shows the blocks, the Vec Tree representation, the Vec Tree implementation packed into a machine word, and the next available position.]

Vec Trees Complexity
• Given a BlockArray with BA blocks
• Storage complexity is O(BA) bits. In practice this is close to 2·BA bits
  – Relatively small overhead.
• Vec Tree updates require O(log BA) operations

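The bounds above are what a binary bit tree gives: BA leaf bits plus BA − 1 internal bits, with updates that walk one leaf-to-root path. The Python sketch below is one possible reading under stated assumptions, not the talk's implementation. It assumes each internal bit is the OR of its children (1 means some block below is empty), that the number of blocks is a power of two, and it uses a plain list where the real structure packs bits into machine words. The name `VecTree` and its methods are hypothetical.

```python
class VecTree:
    """Bit tree over the blocks of one BlockArray: 1 = empty, 0 = full."""

    def __init__(self, num_blocks):
        # Assumption for this sketch: num_blocks is a power of two.
        assert num_blocks > 0 and num_blocks & (num_blocks - 1) == 0
        self.n = num_blocks
        # Heap layout: index 0 unused, root at 1, leaves at n .. 2n-1.
        # About 2 * n bits in total; all blocks start empty.
        self.bits = [1] * (2 * num_blocks)

    def _set(self, block_id, bit):
        i = self.n + block_id
        self.bits[i] = bit
        i //= 2
        while i >= 1:                     # O(log n) walk to the root
            self.bits[i] = self.bits[2 * i] | self.bits[2 * i + 1]
            i //= 2

    def find_empty(self):
        """Return the lowest-numbered empty block, or None if all are full."""
        if not self.bits[1]:
            return None
        i = 1
        while i < self.n:                 # O(log n) walk to a leaf
            i = 2 * i if self.bits[2 * i] else 2 * i + 1
        return i - self.n

    def allocate(self):
        block_id = self.find_empty()
        if block_id is not None:
            self._set(block_id, 0)        # mark the block as full
        return block_id

    def release(self, block_id):
        self._set(block_id, 1)            # mark the block as empty again
```

The root bit alone tells whether the BlockArray has any free block, and `release` is what makes memory reclamation cheap.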
Memory Manager
Made up of three parts:
1. Vectorized Bit Trees
2. BlockArrays
3. B+Trees of BlockArrays
   – Responsible for deciding when more memory needs to be allocated

B+Trees of BlockArrays
• Each block size has a different tree.
• The KEY of the B+Trees is the number of empty blocks
[Figure: a B+Tree array indexed by the log of the block size (0 to 31). Entry 0 is the B+Tree for BlockArrays with 1 edge in a block, entry 1 for 2 edges in a block, entry 2 for 4 edges in a block. A B+ node points to BA_{2,2}, which has 1 available block.]

B+Trees of BlockArrays
• Currently supports adjacency lists with up to 2^31 edges
• Can easily support up to 2^63 edge blocks!!!
[Figure: a B+Tree array indexed by the log of the block size (0 to 31). Entry 0 is the B+Tree for BlockArrays with 1 edge in a block, entry 1 for 2 edges in a block, entry 2 for 4 edges in a block. The B+ nodes point to BA_{2,2} (1 available block) and BA_{2,1} (0 available blocks).]

B+Trees Properties
• A new BlockArray is allocated when all existing BlockArrays are full.
• Great for memory utilization.

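The allocation rule can be sketched as follows. This is a simplified illustration, not the talk's implementation: a Python dict of lists stands in for the array of B+Trees, each BlockArray is reduced to its count of empty blocks (the key named on the earlier slide), and the choice of the fullest non-full BlockArray is an assumed policy. The names `BlockPool`, `allocate_block`, and `release_block` are hypothetical.

```python
class BlockPool:
    """Tracks BlockArrays per block size by their number of empty blocks."""

    def __init__(self, blocks_per_array=4):
        self.blocks_per_array = blocks_per_array
        # log2(block size) -> list of empty-block counts, one per BlockArray.
        self.arrays = {}

    def allocate_block(self, num_edges):
        """Reserve a block that fits num_edges; return (log_size, array index)."""
        if num_edges < 1:
            raise ValueError("num_edges must be at least 1")
        log_size = (num_edges - 1).bit_length()   # ceil(log2(num_edges))
        counts = self.arrays.setdefault(log_size, [])
        candidates = [i for i, free in enumerate(counts) if free > 0]
        if not candidates:
            # All existing BlockArrays of this size are full: allocate a new one.
            counts.append(self.blocks_per_array)
            index = len(counts) - 1
        else:
            # Assumed policy: prefer the BlockArray with the fewest empty blocks.
            index = min(candidates, key=lambda i: counts[i])
        counts[index] -= 1
        return log_size, index

    def release_block(self, log_size, index):
        """Return a block to its BlockArray so that it can be reused."""
        self.arrays[log_size][index] += 1
```

With `blocks_per_array=2`, two requests for 3 edges share BlockArray 0 of the 4-edge size class, and only the third request triggers a new BlockArray.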