
A Sparse Dynamic Graph and Matrix Data Layout (Oded Green)



  1. A Sparse Dynamic Graph and Matrix Data Layout
     Oded Green

  2. Going to talk about 2 things
     • Hornet
       – A scalable and dynamic data structure for graph algorithms and linear algebra based problems
       – Formerly known as cuSTINGER
     • HornetsNest
       – A framework for static and dynamic analytics
     Oded Green, GTC-DC-17

  3. Hornet – Upfront Summary
     • Can support over 250 million updates per second
     • Low overhead in comparison with CSR
       – Initialization is also relatively inexpensive, usually less than 3X slower
       – Equal performance
     • Currently implemented for CUDA
       – We are porting Hornet to the CPU
     • Really easy to use

  4. Graph Primitives – Upfront Summary
     • Great performance for static and dynamic graph algorithms
     • Scalable
     • Simple to use

  5. Big Data problems need Graph Analysis
     Communication networks:
     • World-wide connectivity
     • High velocity changes
     • Different types of extracted data:
       – Physical communication network.
       – Person-to-person communication network.
     Financial networks:
     • Transactions between players.
     • Different transaction types (property graph)
     Health-care networks:
     • Various players.
     • Pattern matching and epidemic monitoring.
     • Problem sizes have doubled in the last 5 years.
     Graphs are a unifying motif for data analytics.

  6. STINGER
     • STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation
     • Enables algorithm designers to implement dynamic & streaming graph algorithms with ease.
     • Portable semantics for various platforms
       – Linked list of edge blocks not ideal for the GPU
     • Good performance for all types of graph problems and algorithms, static and dynamic.
     • Assumes globally addressable memory access

  7. STINGER and cuSTINGER Properties
     • A simple programming model
     • Millions of updates per second to graph
     • Updates are not bottlenecks for analytics.
     • Advanced memory manager
     • Transfers data between host and device automatically
     • Reduces initialization time
     • Allows for simple update processes
     STINGER papers: [Bader et al.; 2007; Tech Report], [Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]
     cuSTINGER paper: [Green & Bader; HPEC; 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs

  8. cuSTINGER is now HORNET

  9. Definitions
     • Dynamic graphs (matrices)
       – Graph can change over time.
       – Changes can be to topology, edges, or vertices.
         • For example, new edges between two vertices.
       – Changes to edge or vertex weights
     • Streaming graphs:
       – Graphs changing at high rates.
       – Hundreds of thousands of updates per second.

  10. Dynamic graph example
      • Only a subset of the entire graph…
      • Dynamic:
        – At time t:
          • v and w become friends.
          • insert_edge(v, w)
        – At time t̂:
          • u and v are no longer friends.
          • delete_edge(u, v)
      • Additional operations include vertex insertions & deletions
      [Figure: graph fragment with vertices u, v, and w]
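
The insertion and deletion on this slide can be sketched with a plain adjacency-set graph. This is a minimal Python illustration, not Hornet code: the function names follow the slide, while the `graph` dictionary, the undirected treatment of "friends", and the assumed starting edge between u and v are choices made only for this example.

```python
from collections import defaultdict

# Undirected graph stored as vertex -> set of neighbors (illustrative only).
graph = defaultdict(set)

def insert_edge(g, a, b):
    """Add the undirected edge (a, b)."""
    g[a].add(b)
    g[b].add(a)

def delete_edge(g, a, b):
    """Remove the undirected edge (a, b) if it is present."""
    g[a].discard(b)
    g[b].discard(a)

# Assumed starting state: u and v are friends.
insert_edge(graph, "u", "v")

# Time t: v and w become friends.
insert_edge(graph, "v", "w")

# Time t-hat: u and v are no longer friends.
delete_edge(graph, "u", "v")

print(sorted(graph["v"]))
```

After both updates, v is connected only to w. A streaming workload applies batches of such updates continuously, which is why the update cost of the underlying layout matters.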

  11. “Separation of powers”
      • Dynamic graph data structure and dynamic graph algorithms are in two different repositories
        – Easy to integrate with external library
        – Can also be used with matrices
      • First part of today’s talk will be on the dynamic data structure

  12. Part 1 – Data Structure
      cuSTINGER Version 2.0
      • Improved initialization time
        – 100X faster than Version 1.0
      • New memory manager
        – Reduces fragmentation
        – Enables memory reclamation
        – Offers good memory bounds
      • Scalable data structure
        – Can easily grow to 1000X its initial size without needing to be re-initialized
      • Faster updates

  13. So what else do we need?
      • We need a dynamic graph data structure
      • These data structures don’t cut it

      Name                      | Pros                                       | Cons
      Dense Adjacency Matrix    | Flexible                                   | Limited utilization for sparse data
      Linked lists              | Flexible                                   | Poor locality; allocation time is costly
      COO (Edge list), unsorted | Has some flexibility; updates are simple   | Poor locality; stores both the source and destination
      CSR                       | Uses exact amount of memory; good locality | Inflexible

  14. Compressed Sparse Row (CSR)
      Pros:
      • Uses precise storage requirements
      • Great locality
        – Good for GPUs
      • Handful of arrays
        – Simple to use and manage
      Cons:
      • Inflexible.
      • Network growth unsupported
      • Topology changes unsupported
      • Property graphs not supported

      Src/Row:    0 1 2 3 4 5 6 7
      Offset:     0 2 4 7 9 11 13 14 14
      Dest./Col.: 1 2 0 5 0 3 4 2 6 2 5 1 4 3
      Value:      2 5 2 7 4 1 4 1 2 4 1 7 1 2
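
To make the trade-off concrete, here is a small Python sketch that loads the example arrays from this slide (as read from the flattened figure, so the split between destinations and values is an assumption), reads one vertex's neighbors, and then inserts an edge. The helper names `neighbors` and `insert_edge` are hypothetical and exist only for this illustration.

```python
# CSR arrays as read from the slide's example (8 vertices, 14 stored edges).
offset = [0, 2, 4, 7, 9, 11, 13, 14, 14]
dest = [1, 2, 0, 5, 0, 3, 4, 2, 6, 2, 5, 1, 4, 3]
value = [2, 5, 2, 7, 4, 1, 4, 1, 2, 4, 1, 7, 1, 2]

def neighbors(v):
    """Return (destination, value) pairs for vertex v: one contiguous slice."""
    start, end = offset[v], offset[v + 1]
    return list(zip(dest[start:end], value[start:end]))

def insert_edge(v, d, w):
    """Insert edge v -> d with value w.

    Every entry after v's slice shifts by one and every later offset is
    incremented, so a single insertion costs O(|V| + |E|).
    """
    pos = offset[v + 1]
    dest.insert(pos, d)
    value.insert(pos, w)
    for i in range(v + 1, len(offset)):
        offset[i] += 1

before = neighbors(2)      # reading is cheap and cache friendly
insert_edge(0, 7, 3)       # writing touches almost the whole structure
after = neighbors(2)       # same neighbors, but stored one slot later
```

Lookups are a single slice, which is the locality advantage listed under Pros. The insertion moves nearly every array element, which is the inflexibility listed under Cons.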

  15. Hornet – A High Level View
      User-Interface:
        Vertex Id: 0 1 2 3 4 5 6 7
        Used:      2 2 3 2 2 2 1 0
        Pointer
      Edge data (with over-allocated space):
        Dest./Col.: 3 1 2 0 5 2 6 2 5 1 4 0 3 4
        Value:      2 2 5 2 7 1 2 4 1 7 1 4 1 4
      • Supports updates
        – Supports edge insertion and deletion.
        – Supports vertex insertion and deletion.

  16. Hornet – Property Graph Support
      User-Interface:
        Vertex Id: 0 1 2 3 4 5 6 7
        Used:      2 2 3 2 2 2 1 0
        Pointer
      Edge data:
        Dest./Col.: 3 1 2 0 5 2 6 2 5 1 4 0 3 4
        Weight:     2 2 5 2 7 1 2 4 1 7 1 4 1 4
        Further fields: Type, Time 1, User 1, User 2, ….

  17. Hornet in Detail
      User-Interface:
        Vertex Id:             0 1 2 3 4 5 6 7 (plus over-allocated space for vertex insertions)
        Used (#Neighbors/nnz): 2 2 3 2 2 2 1 0
        Pointer
      Memory Manager (over-allocated space for power-of-two rule):
        Dest./Col.: 3 1 2 0 5 2 6 2 5 1 4 0 3 4
        Weight:     2 5 2 5 7 1 2 4 1 7 1 2 1 4
        Block arrays: BA 1,2 (bsize = 2), BA 2,2 (bsize = 2), BA 2,3 (bsize = 4), BA 3,2 (bsize = 1)
        Vec-Tree bit status: 0 0 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0
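
The power-of-two rule named on this slide can be illustrated with a toy host-side model in Python. This is a sketch under stated assumptions, not Hornet's implementation: the class `PowerOfTwoAdjacency`, the helper `block_size`, and the choice to give an empty vertex no block are all hypothetical, and the real memory manager's block arrays and bit-status tracking are not modeled. The adjacency lists are the ones implied by the CSR example earlier in the deck.

```python
def block_size(degree):
    """Smallest power of two that holds `degree` edges (0 for an empty vertex)."""
    if degree == 0:
        return 0
    return 1 << (degree - 1).bit_length()

class PowerOfTwoAdjacency:
    """Toy model: each vertex owns one block whose capacity is a power of two."""

    def __init__(self, adjacency):
        self.used = [len(nbrs) for nbrs in adjacency]
        # Pad each block with None up to its power-of-two capacity.
        self.blocks = [
            list(nbrs) + [None] * (block_size(len(nbrs)) - len(nbrs))
            for nbrs in adjacency
        ]

    def insert_edge(self, src, dst):
        """Append dst to src's block, moving to a larger block only when full."""
        if self.used[src] == len(self.blocks[src]):
            new_capacity = block_size(self.used[src] + 1)
            grown = self.blocks[src][: self.used[src]]
            grown += [None] * (new_capacity - self.used[src])
            self.blocks[src] = grown
        self.blocks[src][self.used[src]] = dst
        self.used[src] += 1

    def neighbors(self, v):
        """Return only the occupied slots of v's block."""
        return self.blocks[v][: self.used[v]]

# Adjacency lists matching the "Used" row on the slide: 2 2 3 2 2 2 1 0.
adjacency = [[1, 2], [0, 5], [0, 3, 4], [2, 6], [2, 5], [1, 4], [3], []]
g = PowerOfTwoAdjacency(adjacency)
capacities = [len(b) for b in g.blocks]

g.insert_edge(2, 7)   # fits in vertex 2's spare slot, no reallocation
g.insert_edge(6, 5)   # vertex 6 is full (1 of 1), so its block grows to 2
```

In this model only the vertex being updated is touched, and a block is copied only when its degree crosses a power of two. The computed capacities (1, 2, and 4) line up with the bsize values shown on the slide.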

  18. Hornet Performance Analysis
      • Memory Utilization
      • Initialization Overhead
      • Update rate
        – Number of sustainable updates per second

  19. Experimental Setup

      GPU  | μArch  | SMs | SPs  | Memory (GB) | Memory Type
      K40  | Kepler | 15  | 2880 | 12          | GDDR5
      P100 | Pascal | 56  | 3584 | 16          | HBM2

      • Unless noted otherwise, all performance analysis is for the P100

  20. Input Graphs
      • DIMACS 10 Graph Implementation Challenge
      • SNAP – Stanford Network Analysis Project
      • Florida Matrix Collection
      The following is only a subset of these graphs:

      Name          | Type          | Source | |V|    | |E|*
      coAuthorsDBLP | Collaboration | DIMACS | 299k   | 1.95M
      as-skitter    | Trace route   | SNAP   | 1.69M  | 11.1M
      kron_21       | Random        | DIMACS | 2M     | 201M
      cit-patents   | Citation      | SNAP   | 3.77M  | 16.5M
      cage15        | Matrix        | DIMACS | 5.15M  | 94M
      uk-2002       | Webcrawl      | DIMACS | 18.52M | 523M

  21. Memory Utilization - Overall
      [Figure: space efficiency (0% to 100%) for Hornet, COO, and cuSTINGER]
      • 70% average utilization of CSR
      • Better utilization in comparison to: COO, cuSTINGER, AIMS

  22. Insertion Rates
      • Supports over 250M updates per second
      • Hornet
        – 4X to 10X faster than cuSTINGER
        – Does not have a performance dip like cuSTINGER
      • Scalable growth in update rate
      [Figure: two panels, cuSTINGER and Hornet; y-axis: update rate (edges per second), from 10^3 to 10^9; graphs: in-2004, soc-LiveJournal1, cage15, kron_g500-logn21]

  23. Part 2: HornetsNest
      • Algorithm framework for Hornet data structure
        – We support CSR as well
      • All algorithms are implemented using a small set of operations
        – We show that these operators are efficient for static graph algorithms and can be used for dynamic graph algorithms
      • Uses features from C++11 and C++14

  24. Algorithmic Graph Primitives
      • All algorithms are implemented through this API
      • Simple primitives
        – ForAllVerticesInG(G, f(v ∈ V), struct)
        – ForAllEdgesInG(G, f(src ∈ V, dest ∈ V), struct)
        – InsertEdges(G, E_new)
        – RemoveEdges(G, E_rem)
        – InsertVertices(G, V_new)
        – RemoveVertices(G, V_rem)
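
The shape of these primitives can be sketched in a few lines of Python. This is a hypothetical sequential analogue written for illustration, not the HornetsNest C++/CUDA API: the `Graph` class, the snake_case function names, and the use of a plain dictionary as the `struct` argument are all assumptions.

```python
class Graph:
    """Toy directed graph: vertex -> set of destination vertices."""

    def __init__(self):
        self.adj = {}

def for_all_vertices(g, f, state):
    """Apply f(v, state) to every vertex."""
    for v in list(g.adj):
        f(v, state)

def for_all_edges(g, f, state):
    """Apply f(src, dst, state) to every edge."""
    for src in list(g.adj):
        for dst in list(g.adj[src]):
            f(src, dst, state)

def insert_vertices(g, v_new):
    for v in v_new:
        g.adj.setdefault(v, set())

def remove_vertices(g, v_rem):
    removed = set(v_rem)
    for v in removed:
        g.adj.pop(v, None)
    # Also drop edges that pointed at a removed vertex.
    for nbrs in g.adj.values():
        nbrs -= removed

def insert_edges(g, e_new):
    for src, dst in e_new:
        insert_vertices(g, (src, dst))
        g.adj[src].add(dst)

def remove_edges(g, e_rem):
    for src, dst in e_rem:
        g.adj.get(src, set()).discard(dst)

# Usage: an out-degree count expressed only through the primitives.
g = Graph()
insert_edges(g, [(0, 1), (0, 2), (1, 2)])

def count_out_degree(src, dst, state):
    state[src] = state.get(src, 0) + 1

degrees = {}
for_all_edges(g, count_out_degree, degrees)

remove_edges(g, [(0, 2)])
remove_vertices(g, [1])
```

The algorithm body (`count_out_degree`) never touches the storage layout directly, so the same operator could in principle run over a different backing structure, which is the separation between data structure and algorithms the talk describes.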
