
cuSTINGER: A Sparse Dynamic Graph and Matrix Data Structure (Oded Green)


  1. cuSTINGER – A Sparse Dynamic Graph and Matrix Data Structure Oded Green

  2. Going to talk about 2 things
     • A scalable and dynamic data structure for graph algorithms and linear algebra based problems
     • A framework for static and dynamic analytics
     NVIDIA GTC - cuSTINGER - Oded Green, 2017

  3. Upfront Summary of Results
     • Can support up to 90 million updates per second
     • Low overhead in comparison with CSR
       – Initialization is also relatively inexpensive: 20%-200%
       – Equal performance
     • Great performance for static graph algorithms
     • Simple to use

  4. Big Data problems need Graph Analysis
     Communication networks:
     • Worldwide connectivity
     • High velocity changes
     • Different types of extracted data:
       – Physical communication network.
       – Person-to-person communication network.
     Financial networks:
     • Transactions between players.
     • Different transaction types (property graph)
     Health-care networks:
     • Various players.
     • Pattern matching and epidemic monitoring.
     • Problem sizes have doubled in the last 5 years.
     Graphs are a unifying motif for data analytics.

  5. STINGER
     • STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation
     • Enables algorithm designers to implement dynamic & streaming graph algorithms with ease.
     • Portable semantics for various platforms
       – Linked list of edge blocks not ideal for the GPU
     • Good performance for all types of graph problems and algorithms, static and dynamic.
     • Assumes globally addressable memory access

  6. STINGER and cuSTINGER Properties
     ✓ A simple programming model
     ✓ Millions of updates per second to graph
     ✓ Updates are not bottlenecks for analytics.
     ✓ Advanced memory manager
     ✓ Transfers data between host and device automatically
     ✓ Reduces initialization time
     ✓ Allows for simple update processes
     STINGER papers: [Bader et al.; 2007; Tech Report], [Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]
     cuSTINGER paper: [Green & Bader; HPEC; 2016], "cuSTINGER: Supporting dynamic graph algorithms for GPUs"

  7. Definitions
     • Dynamic graphs (matrices)
       – Graph can change over time.
       – Changes can be to topology, edges, or vertices.
         • For example, new edges between two vertices.
       – Changes to edge or vertex weights
     • Streaming graphs:
       – Graphs changing at high rates.
       – 100s of thousands of updates per second.

  8. Dynamic graph example
     • Only a subset of the entire graph…
     • Dynamic:
       – At time $t$:
         • $v$ and $w$ become friends.
         • $insert\_edge(v, w)$
       – At time $\hat{t}$:
         • $u$ and $v$ are no longer friends.
         • $delete\_edge(u, v)$
     • Additional operations include vertex insertions & deletions
     [Figure: small graph with vertices $u$, $v$, $w$]
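The example above (one edge insertion followed by one edge deletion) can be sketched on the host with adjacency sets. This is only an illustration: `DynGraph`, `insert_edge`, `delete_edge`, and `replay_example` are hypothetical names rather than the cuSTINGER API, the friendship graph is assumed to be undirected, and the three vertices are numbered 0, 1, 2 for convenience.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical host-side sketch of a dynamic, undirected graph.
struct DynGraph {
    std::vector<std::set<int>> adj;
    explicit DynGraph(int n) : adj(n) {}

    void insert_edge(int a, int b) { adj[a].insert(b); adj[b].insert(a); }
    void delete_edge(int a, int b) { adj[a].erase(b); adj[b].erase(a); }
    bool has_edge(int a, int b) const { return adj[a].count(b) > 0; }
};

// Replays the example with u = 0, v = 1, w = 2.
inline DynGraph replay_example() {
    DynGraph g(3);
    g.insert_edge(0, 1);  // initial state: u and v are friends
    g.insert_edge(1, 2);  // first update: v and w become friends
    g.delete_edge(0, 1);  // later update: u and v are no longer friends
    return g;
}
```

A streaming workload is then just a long sequence of such calls; the rest of the talk is about making them cheap on the GPU.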

  9. “Separation of powers”
     • Dynamic graph data structure and dynamic graph algorithms are in two different repositories
       – Easy to integrate with external library
       – Can also be used with matrices
     • First part of today’s talk will be on the dynamic data structure

  10. Part 1 – Data Structure: cuSTINGER Version 2.0
      • Improved initialization time
        – 100s of times faster than Version 1.0
      • New memory manager
        – Reduces fragmentation
        – Enables memory reclamation
        – Offers good memory bounds
      • Scalable data structure
        – Can easily grow 1000X its initial size without needing to be re-initialized
      • Faster updates
      Coming soon… (probably late May)

  11. Restrictions of existing static graph representations
      Name                      | Pros                                       | Cons
      Adjacency Matrix          | Flexible                                   | Limited utilization for sparse data
      Linked lists              | Flexible                                   | Poor locality; allocation time is costly
      COO (Edge list), unsorted | Has some flexibility; updates are simple   | Poor locality; stores both the source and destination
      CSR                       | Uses exact amount of memory; good locality | Inflexible

  12. Compressed Sparse Row (CSR)
      Pros:
      • Uses precise storage requirements
      • Great locality
        – Good for GPUs
      • Handful of arrays
        – Simple to use and manage
      Cons:
      • Inflexible.
      • Network growth unsupported
      • Topology changes unsupported
      • Property graphs not supported
      Example arrays:
        Src/Row:    0 1 2 3 4 5 6 7
        Offset:     0 2 4 7 9 11 13 14 14
        Dest./Col.: 1 2 0 5 0 3 4 2 6 2 5 1 4 3
        Value:      2 5 2 7 4 1 4 1 2 4 1 7 1 2
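To make the trade-off concrete, here is a small host-side sketch that stores the example arrays from this slide and shows both the cheap neighbor scan and the costly insertion. The struct `Csr` and its member names are illustrative assumptions, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR arrays copied from the slide (8 vertices, 14 edges).
struct Csr {
    std::vector<int> offset{0, 2, 4, 7, 9, 11, 13, 14, 14};
    std::vector<int> dest{1, 2, 0, 5, 0, 3, 4, 2, 6, 2, 5, 1, 4, 3};
    std::vector<int> value{2, 5, 2, 7, 4, 1, 4, 1, 2, 4, 1, 7, 1, 2};

    int degree(int v) const { return offset[v + 1] - offset[v]; }

    // Sum of the edge values leaving v; the edges are contiguous in memory.
    int weight_sum(int v) const {
        int sum = 0;
        for (int e = offset[v]; e < offset[v + 1]; ++e) sum += value[e];
        return sum;
    }

    // Inserting one edge shifts every later entry and bumps every later
    // offset, which is why CSR is a poor fit for dynamic graphs.
    void insert_edge(int src, int dst, int val) {
        int pos = offset[src + 1];
        dest.insert(dest.begin() + pos, dst);
        value.insert(value.begin() + pos, val);
        for (std::size_t v = static_cast<std::size_t>(src) + 1; v < offset.size(); ++v) {
            ++offset[v];
        }
    }
};
```

The neighbor scan touches one contiguous range, while a single insertion is linear in the number of edges and vertices that follow the source.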

  13. Part 1: cuSTINGER – A High Level View
      USER-INTERFACE (per-vertex arrays):
        Vertex Id: 0 1 2 3 4 5 6 7
        Used:      2 2 3 2 2 2 1 0
        BSize:     2 2 4 2 2 2 1 0
        Pointer:   one per vertex, to that vertex's edge block
      Edge blocks (in the order drawn; BSize larger than Used marks over-allocated space):
        Dest./Col.: 3 1 2 0 5 2 6 2 5 1 4 0 3 4
        Value:      2 2 5 2 7 1 2 4 1 7 1 4 1 4
      • Supports updates
        – Supports edge insertion and deletion.
        – Supports vertex insertion and deletion.
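The per-vertex Used, BSize, and Pointer fields suggest how an edge insertion can proceed: write into spare capacity when there is some, otherwise move the adjacency list to a larger block. The sketch below is a host-side illustration under stated assumptions: `VertexAdj` and `AdjGraph` are hypothetical names, the doubling growth policy is an assumption (the slide only shows block sizes 1, 2, and 4), and plain heap allocation stands in for the memory manager.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical host-side sketch of the per-vertex layout on the slide.
struct VertexAdj {
    int used = 0;                 // "Used": number of stored edges
    int bsize = 0;                // "BSize": capacity of the edge block
    std::unique_ptr<int[]> dest;  // "Pointer": this vertex's edge block
};

struct AdjGraph {
    std::vector<VertexAdj> verts;
    explicit AdjGraph(int n) : verts(n) {}

    void insert_edge(int src, int dst) {
        VertexAdj& v = verts[src];
        if (v.used == v.bsize) {
            // Block is full: move to a block twice as large (assumed policy).
            int new_size = v.bsize == 0 ? 1 : 2 * v.bsize;
            std::unique_ptr<int[]> bigger(new int[new_size]);
            for (int i = 0; i < v.used; ++i) bigger[i] = v.dest[i];
            v.dest = std::move(bigger);
            v.bsize = new_size;
        }
        v.dest[v.used++] = dst;  // write into the next free slot of the block
    }

    // Removes dst from src's list by overwriting it with the last edge.
    bool delete_edge(int src, int dst) {
        VertexAdj& v = verts[src];
        for (int i = 0; i < v.used; ++i) {
            if (v.dest[i] == dst) {
                v.dest[i] = v.dest[--v.used];
                return true;
            }
        }
        return false;
    }
};
```

Inserting the three neighbors of vertex 2 in this sketch ends with Used = 3 and BSize = 4, the same pair the slide shows for that vertex.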

  14. cuSTINGER – Property Graph Support
      USER-INTERFACE (per-vertex arrays):
        Vertex Id: 0 1 2 3 4 5 6 7
        Used:      2 2 3 2 2 2 1 0
        BSize:     2 2 4 2 2 2 1 0
        Pointer:   one per vertex, to that vertex's edge block
      Per-edge fields:
        Dest./Col.: 3 1 2 0 5 2 6 2 5 1 4 0 3 4
        Weight:     2 2 5 2 7 1 2 4 1 7 1 4 1 4
        Type, Time 1, User 1, User 2, …
      • These are optional fields

  15. Challenges
      • Memory allocations are costly.
      • Seems like we are suggesting that we need $O(V)$ allocations
        – Absolutely not.
        – Our first implementation did this… ouch…

  16. Memory Manager
      Made up of three parts:
      1. Vectorized Bit Trees
      2. BlockArrays
      3. B+ Trees of BlockArrays
      • THIS IS AN INTERNAL REPRESENTATION (HIDDEN FROM USERS)

  17. BlockArrays
      • Definition: an array made up of equal-size blocks.
      • Each block can contain an equal number of edges
      [Figure: a BlockArray with 4 blocks, each block holding 2 edges. Dest./Col.: 1 2 0 5 2 6 2 5. Value: 2 5 2 7 1 2 4 1.]
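The definition above can be sketched as one contiguous allocation that is carved into equal-size blocks. The class name `BlockArray` follows the slide, but its interface here is a hypothetical illustration, and a host-side `std::vector` stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: one contiguous allocation split into equal-size blocks.
class BlockArray {
public:
    BlockArray(int num_blocks, int edges_per_block)
        : num_blocks_(num_blocks),
          edges_per_block_(edges_per_block),
          storage_(static_cast<std::size_t>(num_blocks) * edges_per_block) {}

    // Pointer to the first edge slot of block b.
    int* block(int b) {
        return storage_.data() + static_cast<std::size_t>(b) * edges_per_block_;
    }

    int num_blocks() const { return num_blocks_; }
    int edges_per_block() const { return edges_per_block_; }

private:
    int num_blocks_;
    int edges_per_block_;
    std::vector<int> storage_;
};
```

With `BlockArray ba(4, 2)`, matching the figure, handing a block to a vertex is just pointer arithmetic, so many adjacency lists share a single allocation.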

  18. cuSTINGER – BlockArray allocations
      [Figure: the per-vertex arrays (Vertex Id, Used, BSize, Pointer), labeled USER-INTERFACE, point into edge blocks stored in the BlockArrays $BA_{1,2}$, $BA_{2,2}$, $BA_{2,3}$, and $BA_{3,2}$, which also hold over-allocated and available space. The BlockArrays are labeled INTERNAL REPRESENTATION, HIDDEN FROM USERS.]
      • A relatively small number of BlockArrays is needed
        – Exact number not known at compile time (or even at runtime given updates)

  19. Memory Manager
      Made up of three parts:
      1. Vectorized Bit Trees
         – Help determine which blocks are empty
         – Key component for efficient memory reclamation
      2. BlockArrays
      3. B+ Trees of BlockArrays

  20. Vec-Trees
      • Each block is either full (0) or empty (1).
      [Figure: (a) a full BlockArray and (b) a partially empty BlockArray, each shown with its Vec-Tree representation and the Vec-Tree implementation packed into a machine word. Panel (b) marks the next available position.]
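One way to read the figure is as a binary tree of bits over the blocks, where a leaf is 1 when its block is empty and an inner node is 1 when any block below it is empty. The sketch below follows that reading; it is an assumption about the structure rather than the actual cuSTINGER code, `VecTree` and its methods are hypothetical names, the block count is assumed to be a power of two, and each bit is stored as a full byte for readability instead of being packed into machine words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Vec-Tree sketch. Bit convention follows the slide:
// 0 = full, 1 = empty. An inner node is 1 if any block below it is empty.
class VecTree {
public:
    // num_blocks must be a power of two; all blocks start empty.
    explicit VecTree(int num_blocks)
        : n_(num_blocks), bits_(2 * static_cast<std::size_t>(num_blocks), 1) {}

    bool any_empty() const { return bits_[1] == 1; }

    // Index of an empty block, or -1 if the BlockArray is full.
    int find_empty() const {
        if (!any_empty()) return -1;
        int node = 1;        // root
        while (node < n_) {  // descend toward a leaf, following a 1 bit
            node = bits_[2 * node] ? 2 * node : 2 * node + 1;
        }
        return node - n_;
    }

    // Marks a block empty (true) or full (false) and fixes its ancestors.
    void set(int block, bool empty) {
        int node = n_ + block;
        bits_[node] = empty ? 1 : 0;
        for (node /= 2; node >= 1; node /= 2) {  // one step per tree level
            bits_[node] = bits_[2 * node] | bits_[2 * node + 1];
        }
    }

private:
    int n_;
    std::vector<std::uint8_t> bits_;  // heap layout; index 0 is unused
};
```

Both `find_empty` and `set` walk a single root-to-leaf path, so reclaiming or handing out a block never scans the whole BlockArray.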

  21. Vec-Trees Complexity
      • Given a BlockArray with $BA$ blocks:
      • Storage complexity is $O(BA)$ bits. In practice this is close to $2 \cdot BA$ bits
        – Relatively small overhead.
      • Vec-Tree updates require $O(\log BA)$ operations
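The storage figure is consistent with a complete binary tree that keeps one bit per node. Writing $BA$ for the number of blocks in the BlockArray and assuming, for simplicity, that $BA$ is a power of two, the levels sum as a geometric series, and the tree height gives the update cost:

```latex
\underbrace{BA}_{\text{leaves}} + \frac{BA}{2} + \frac{BA}{4} + \dots + 1
  = \sum_{i=0}^{\log_2 BA} \frac{BA}{2^{i}}
  = 2 \cdot BA - 1 \ \text{bits},
\qquad
\text{height} = \log_2 BA \;\Rightarrow\; O(\log BA) \ \text{operations per update}.
```

For example, a BlockArray with 1024 blocks would need 2047 bits under this assumption, which is about 256 bytes.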

  22. Memory Manager
      Made up of three parts:
      1. Vectorized Bit Trees
      2. BlockArrays
      3. B+ Trees of BlockArrays
         – Responsible for deciding when more memory needs to be allocated

  23. B+ Trees of BlockArrays
      • Each block size has a different tree.
      • The KEY of the B+ Trees is the number of empty blocks
      [Figure: a B+ Tree array indexed by the log of the block size (0, 1, 2, …, 31). Entry 0 is the B+ Tree for BlockArrays with 1 edge in a block, entry 1 for 2 edges in a block, entry 2 for 4 edges in a block. A B+ node points to $BA_{2,2}$, which has 1 available block.]

  24. B+ Trees of BlockArrays
      • Currently supports adjacency lists with up to $2^{31}$ edges
      • Can easily support up to $2^{63}$ edge blocks!!!
      [Figure: the same B+ Tree array, indexed by the log of the block size (0, 1, 2, …, 31), now with two B+ nodes. They point to $BA_{2,2}$ (1 available block) and $BA_{2,1}$ (0 available blocks).]

  25. B+ Trees Properties
      • A new BlockArray is allocated when all existing BlockArrays are full.
      • Great for memory utilization.
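The allocation rule above (a new BlockArray only when every existing one of that block size is full) can be sketched with one ordered container per block size, keyed by the number of empty blocks. This is a simplified illustration under several assumptions: `std::multimap` stands in for a B+ Tree, `MemoryManager`, `BlockArrayInfo`, and `allocate_block` are hypothetical names, requests are rounded up to a power-of-two block size, and the fullest BlockArray that still has room is chosen, which is one plausible policy and not necessarily the one cuSTINGER uses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical bookkeeping record for one BlockArray.
struct BlockArrayInfo {
    int id;            // which BlockArray this is
    int empty_blocks;  // blocks still available in it
};

class MemoryManager {
public:
    explicit MemoryManager(int blocks_per_array)
        : blocks_per_array_(blocks_per_array), trees_(32) {}

    // Reserves one block able to hold `edges` edges; returns the BlockArray id.
    int allocate_block(long long edges) {
        int level = 0;  // log2 of the block size
        while ((1LL << level) < edges) ++level;
        auto& tree = trees_[level];

        // Smallest key >= 1: the fullest BlockArray that still has room
        // (assumed policy, chosen here to limit fragmentation).
        auto it = tree.lower_bound(1);
        BlockArrayInfo info;
        if (it == tree.end()) {
            // Every existing BlockArray of this block size is full.
            info = {next_id_++, blocks_per_array_};
        } else {
            info = it->second;
            tree.erase(it);
        }
        --info.empty_blocks;
        tree.emplace(info.empty_blocks, info);  // re-insert under the new key
        return info.id;
    }

    std::size_t num_arrays(int level) const { return trees_[level].size(); }

private:
    int blocks_per_array_;
    int next_id_ = 0;
    // One ordered map per log2(block size); key = number of empty blocks.
    std::vector<std::multimap<int, BlockArrayInfo>> trees_;
};
```

Keying by the number of empty blocks lets the manager see in one lookup whether any BlockArray of the right size has space, so large allocations happen only when they are unavoidable.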
