Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices
Oded Green
Hornet
• A scalable and dynamic data structure for:
  – Sparse data
  – Graph algorithms
  – Linear-algebra-based problems
• Formerly known as cuSTINGER
  – Hornet initialization is hundreds of times faster than cuSTINGER's
  – Hornet updates are 4X-10X faster
  – The Hornet data structure is more robust and scalable than cuSTINGER
• Essentially a dynamic CSR data structure
• Easy to use
Oded Green, GTC-18
“Separation of powers”
• The dynamic graph data structure and the dynamic graph algorithms live in two different repositories
  – Easy to integrate with external libraries
  – Can also be used with matrices
• This talk focuses on the data structure
Graph Primitives – Upfront Summary
• Great performance for static and dynamic graph algorithms
• Scalable
• Simple to use
• The algorithm framework will be discussed later today
  – 1:00pm
  – Same room as this talk
Hornet – Upfront Summary
• Can support over 150 million updates per second
• Can easily scale to graphs with billions of vertices
• CSR comparison
  – Initialization is relatively inexpensive: usually less than 3X slower
  – Hornet requires 30% more storage
  – Identical performance
• COO (edge-list) comparison
  – Hornet requires 20% less storage
  – Hornet has better locality
Big Data problems need Graph Analysis
Communication networks:
• World-wide connectivity
• High-velocity changes
• Different types of extracted data:
  – Physical communication network
  – Person-to-person communication network
Financial networks:
• Transactions between players
• Different transaction types (property graph)
Health-care networks:
• Various players
• Pattern matching and epidemic monitoring
• Problem sizes have doubled in the last 5 years
Hornet Properties
✓ A simple programming model
✓ Enables algorithm designers to implement dynamic & streaming graph algorithms with ease
✓ Can easily grow to 1000X its initial size (no restart needed)
✓ Millions of updates per second to the graph
✓ Updates are not a bottleneck for analytics
✓ Automated data management
✓ Transfers data between host and device automatically
✓ Reduces fragmentation
✓ Supports memory reclamation
• Scalable data structure
cuSTINGER paper [Green & Bader, HPEC 2016]: "cuSTINGER: Supporting dynamic graph algorithms for GPUs"
Definitions
• Dynamic graphs
  – The graph can change over time
  – Changes can be to the topology (edges or vertices)
    • For example, a new edge between two vertices
  – Changes to edge or vertex weights
• Streaming graphs
  – Graphs changing at high rates
  – Hundreds of thousands of updates per second
• Dynamic matrices
  – Adding a perturbation to the matrix
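One way to write the "dynamic matrices" case as an equation (the notation is not on the original slide: A_t is the sparse matrix at time t, e_i is the i-th unit vector, and ω_k is the weight of the k-th change in a batch) is as a sparse perturbation:

```latex
A_{t+1} = A_t + \Delta A_t,
\qquad
\Delta A_t = \sum_{k} \omega_k \, e_{i_k} e_{j_k}^{\top}
```

Inserting edge (i, j) with weight ω is a term with ω_k = ω; changing an existing weight from ω to ω' is ω_k = ω' - ω; deleting edge (i, j) is ω_k = -(A_t)_{ij}, which zeroes the value (removing the entry from the sparsity structure is a separate data-structure operation).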
Dynamic graph example
• Only a subset of the entire graph is shown
• Dynamic:
  – At time t:
    • v and w become friends
    • insert_edge(v, w)
  – At time t̂:
    • u and v are no longer friends
    • delete_edge(u, v)
• Additional operations include vertex insertions and deletions
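The update sequence above can be sketched on the host with a toy adjacency structure. This is a minimal illustration, not Hornet's API: the class `ToyDynamicGraph` and its methods are hypothetical, and adjacency is kept in per-vertex ordered sets purely for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy dynamic graph (NOT Hornet's API) used only to show the
// insert_edge / delete_edge / vertex-insertion operations from the slide.
class ToyDynamicGraph {
public:
    explicit ToyDynamicGraph(std::size_t num_vertices) : adj_(num_vertices) {}

    // Adds the directed edge (src, dst); returns false if it already exists.
    bool insert_edge(std::uint32_t src, std::uint32_t dst) {
        assert(src < adj_.size() && dst < adj_.size());
        return adj_[src].insert(dst).second;
    }

    // Removes the directed edge (src, dst); returns false if it was absent.
    bool delete_edge(std::uint32_t src, std::uint32_t dst) {
        assert(src < adj_.size() && dst < adj_.size());
        return adj_[src].erase(dst) == 1;
    }

    bool has_edge(std::uint32_t src, std::uint32_t dst) const {
        return adj_[src].count(dst) == 1;
    }

    // Vertex insertion: appends an isolated vertex and returns its id.
    std::uint32_t insert_vertex() {
        adj_.emplace_back();
        return static_cast<std::uint32_t>(adj_.size() - 1);
    }

private:
    std::vector<std::set<std::uint32_t>> adj_;  // one neighbor set per vertex
};
```

With u = 0, v = 1, w = 2, the slide's timeline is `insert_edge(v, w)` at time t followed by `delete_edge(u, v)` at time t̂. Friendship is symmetric, so an undirected graph would apply each update in both directions.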
Widely used graph data structures
• Dense adjacency matrix
  – Pros: supports updates
  – Cons: poor locality; massive storage requirements
• Linked lists
  – Pros: flexible
  – Cons: poor locality; limited parallelism; allocation time is costly
• COO (edge list), unsorted
  – Pros: has some flexibility; updates are simple; lots of parallelism
  – Cons: poor locality; stores both the source and the destination
• CSR
  – Pros: uses the exact amount of memory; good locality; lots of parallelism
  – Cons: inflexible
These data structures don’t cut it.
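The storage trade-offs in this comparison can be made concrete with a back-of-the-envelope model. It assumes 32-bit vertex ids and offsets, no edge values, and one byte per dense-matrix entry; these sizes are illustrative assumptions, not figures from the talk.

```cpp
#include <cassert>
#include <cstdint>

// Assumed element size: 32-bit vertex ids and offsets.
constexpr std::uint64_t kIdBytes = sizeof(std::uint32_t);

// CSR: |V|+1 offsets plus one destination per edge.
constexpr std::uint64_t csr_bytes(std::uint64_t num_vertices, std::uint64_t num_edges) {
    return (num_vertices + 1) * kIdBytes + num_edges * kIdBytes;
}

// Unsorted COO: a (source, destination) pair per edge.
constexpr std::uint64_t coo_bytes(std::uint64_t num_edges) {
    return 2 * num_edges * kIdBytes;
}

// Dense adjacency matrix: one byte per possible edge (assumed), i.e. |V|^2.
constexpr std::uint64_t dense_bytes(std::uint64_t num_vertices) {
    return num_vertices * num_vertices;
}
```

For the 8-vertex, 14-edge CSR example in this deck the model gives 92 bytes for CSR and 112 bytes for COO; COO pays for storing the source of every edge. A dense matrix for a million vertices needs 10^12 entries regardless of how many edges exist.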
Compressed Sparse Row (CSR)
Pros:
• Uses exactly the required storage
• Great locality: good for GPUs
• Handful of arrays: simple to use and manage
Cons:
• Inflexible
• Network growth unsupported
• Topology changes unsupported
• Property graphs not supported
Example (8 vertices, 14 edges):
Src/Row:    0 1 2 3 4 5 6 7
Offset:     0 2 4 7 9 11 13 14 14
Dest./Col.: 1 2 0 5 0 3 4 2 6 2 5 1 4 3
Value:      2 5 2 7 4 1 4 1 2 4 1 7 1 2
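The Offset, Dest./Col. and Value rows of the CSR example can be built from an unsorted edge list with a counting sort: histogram the out-degrees, take an exclusive prefix sum, then scatter. This is a minimal sequential sketch; the `Csr` and `Edge` types are hypothetical and not part of Hornet.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CSR container: row r spans [offsets[r], offsets[r+1]).
struct Csr {
    std::vector<std::uint32_t> offsets;  // size |V|+1
    std::vector<std::uint32_t> dest;     // destination (column) per edge
    std::vector<int> value;              // value per edge
};

struct Edge {
    std::uint32_t src, dst;
    int value;
};

// Counting-sort construction. Edges of the same source keep their input order.
Csr build_csr(std::uint32_t num_vertices, const std::vector<Edge>& edges) {
    Csr csr;
    csr.offsets.assign(num_vertices + 1, 0);
    for (const Edge& e : edges) {
        assert(e.src < num_vertices && e.dst < num_vertices);
        ++csr.offsets[e.src + 1];              // out-degree histogram, shifted by one
    }
    for (std::uint32_t v = 0; v < num_vertices; ++v)
        csr.offsets[v + 1] += csr.offsets[v];  // prefix sum -> row start positions
    csr.dest.resize(edges.size());
    csr.value.resize(edges.size());
    // Next free slot in each row, initialized to the row starts.
    std::vector<std::uint32_t> cursor(csr.offsets.begin(), csr.offsets.end() - 1);
    for (const Edge& e : edges) {
        std::uint32_t slot = cursor[e.src]++;
        csr.dest[slot] = e.dst;
        csr.value[slot] = e.value;
    }
    return csr;
}
```

The resulting arrays are exactly sized, which is CSR's strength. The same tightness is why inserting even one edge into row 0 would require shifting every later entry and offset, which is the inflexibility listed under Cons.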
Hornet – A High-Level View
User interface:
Vertex Id: 0 1 2 3 4 5 6 7
Used:      2 2 3 2 2 2 1 0
Pointer:   one pointer per vertex to its edge block
Edge blocks as laid out in memory (over-allocated space at the end of each block not shown):
Dest./Col.: 3 1 2 0 5 2 6 2 5 1 4 0 3 4
Value:      2 2 5 2 7 1 2 4 1 7 1 4 1 4
• Every vertex points at its own array
• Many edge arrays (blocks)
• Block size is determined by the number of neighbors (always a power of 2)
• Extra space is left at the end of each block
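The power-of-two rule for one vertex can be sketched as follows. This is a simplified host-side model under stated assumptions: `block_size_for` and `VertexBlock` are hypothetical names, and in this sketch a full block is replaced by one twice as large. Real Hornet blocks live in GPU BlockArrays managed by its memory manager, which this code does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest power of two >= n; 0 for a vertex with no neighbors (no block).
std::uint32_t block_size_for(std::uint32_t n) {
    assert(n <= (1u << 31));  // avoid overflow of the shift below
    if (n == 0) return 0;
    std::uint32_t size = 1;
    while (size < n) size <<= 1;
    return size;
}

// Hypothetical per-vertex adjacency: `used` entries are filled, the rest of
// the power-of-two block is over-allocated space for future insertions.
struct VertexBlock {
    std::vector<std::uint32_t> block;  // capacity == block.size()
    std::uint32_t used = 0;

    void insert(std::uint32_t dst) {
        if (used == block.size()) {
            // Block is full: move the edges into a block twice as large
            // (or of size 1 when the vertex had no block yet).
            std::vector<std::uint32_t> bigger(block_size_for(used + 1));
            for (std::uint32_t i = 0; i < used; ++i) bigger[i] = block[i];
            block.swap(bigger);
        }
        block[used++] = dst;
    }
};
```

Applied to the Used row on the slide (2 2 3 2 2 2 1 0), the rule gives block sizes 2 2 4 2 2 2 1 0, so only vertex 2 has a free slot. Most insertions then cost a single write, and copying happens only when a block overflows.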
Hornet – Property Graph Support
[Figure: the same user-interface view (Vertex Id, Used, Pointer, Dest./Col.) with additional per-edge fields: Weight, Type, Time 1, User 1, User 2, …]
• Programmers can add fields per edge
• Easy to manage for static graph data structures
• Hornet manages the data movement
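One common way to hold extra per-edge fields is a structure of arrays (SoA): each field is its own contiguous array, so a kernel that reads only destinations and weights never touches the others. The sketch below assumes that layout. The field set mirrors the slide, but `EdgeBlockSoA` is a hypothetical type, not Hornet's API. It also shows why data movement matters: every update must move all fields in lockstep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical property-graph edge block, structure-of-arrays layout.
struct EdgeBlockSoA {
    std::vector<std::uint32_t> dest;
    std::vector<float> weight;
    std::vector<std::uint8_t> type;
    std::vector<std::uint64_t> time;

    std::size_t size() const { return dest.size(); }

    // All field arrays grow together so index i always refers to one edge.
    void push_back(std::uint32_t d, float w, std::uint8_t t, std::uint64_t ts) {
        dest.push_back(d);
        weight.push_back(w);
        type.push_back(t);
        time.push_back(ts);
    }

    // Deletes edge i by moving the last edge into its slot; every field moves.
    void erase_unordered(std::size_t i) {
        assert(i < size());
        dest[i] = dest.back();     dest.pop_back();
        weight[i] = weight.back(); weight.pop_back();
        type[i] = type.back();     type.pop_back();
        time[i] = time.back();     time.pop_back();
    }
};
```

For a static graph this bookkeeping is written once. For a dynamic graph it recurs on every insertion, deletion and block move, which is the work the slide says Hornet takes over.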
Hornet in Detail
[Figure: Hornet internals. User interface: Vertex Id (with over-allocated space for vertex insertions), Used (#Neighbors/nnz), and Pointer arrays. Memory manager: BlockArrays with block sizes (bsize) of 1, 2, and 4, each holding Dest./Col. and Weight fields plus a bit-status array, tracked by a Vec-Tree; blocks include over-allocated space for the power-of-two rule.]
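A single BlockArray with a bit-status array can be modeled as a pool of equal-size blocks with one status bit per block. This is a simplified host-side sketch with hypothetical names. Hornet's actual memory manager, including the Vec-Tree that tracks BlockArrays, is not reproduced here, and the linear scan for a free bit is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pool of `num_blocks` blocks, each `block_size` edge slots,
// with one status bit per block (true = in use).
class BlockArray {
public:
    BlockArray(std::uint32_t block_size, std::uint32_t num_blocks)
        : block_size_(block_size),
          storage_(std::size_t(block_size) * num_blocks),
          in_use_(num_blocks, false) {}

    // Returns the index of a free block (marking it used), or -1 if full.
    std::int64_t allocate() {
        for (std::size_t b = 0; b < in_use_.size(); ++b) {
            if (!in_use_[b]) {
                in_use_[b] = true;
                return static_cast<std::int64_t>(b);
            }
        }
        return -1;
    }

    // Clears the status bit so the block can be reused (memory reclamation).
    void release(std::int64_t b) {
        assert(b >= 0);
        std::size_t idx = static_cast<std::size_t>(b);
        assert(idx < in_use_.size() && in_use_[idx]);
        in_use_[idx] = false;
    }

    // Pointer to the first edge slot of block b.
    std::uint32_t* block(std::int64_t b) { return storage_.data() + b * block_size_; }

    std::uint32_t block_size() const { return block_size_; }

private:
    std::uint32_t block_size_;
    std::vector<std::uint32_t> storage_;  // contiguous edge slots for all blocks
    std::vector<bool> in_use_;            // the "bit status" array
};
```

In a model like this, a vertex that outgrows its block would take a block from the pool with twice the block size, copy its edges, and release the old block. Released blocks are reused instead of leaving holes, which is one way to reduce fragmentation.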
Hornet Performance
• Memory utilization
  – Independent of the GPU being used
• Initialization overhead
• Update rate
Hornet Performance Analysis
• All performance results are for the NVIDIA P100
  – 56 SMs
  – 3584 SPs
  – 16GB HBM2 memory
Input Graphs
• DIMACS 10 Graph Implementation Challenge
• SNAP: Stanford Network Analysis Project
• Florida Matrix Collection
The following is only a subset of these graphs:
Type            Source   Name            |V|       |E|
Collaboration   DIMACS   coAuthorsDBLP   299K      1.95M
Trace route     SNAP     as-skitter      1.69M     11.1M
Random          DIMACS   kron_21         2M        201M
Citation        SNAP     cit-Patents     3.77M     16.5M
Matrix          DIMACS   cage15          5.15M     94M
Webcrawl        DIMACS   uk-2002         18.52M    523M
Memory Utilization – Overall
[Figure: space efficiency relative to CSR (0%–100%) for Hornet (BlockArrays of size 2^16), COO, cuSTINGER, and AIM]
• BlockArrays of size 2^16
• 70% average utilization of CSR
• Better utilization than COO, cuSTINGER, and AIM
  – AIM allocates all GPU memory
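Part of the gap to CSR comes from rounding each adjacency up to a power of two. The rough model below is an assumption for intuition only, not the measurement behind the plot. It counts stored edges divided by allocated edge slots and ignores BlockArray granularity, vertex arrays, and per-edge fields.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Fraction of allocated edge slots that hold edges when each vertex of
// degree d gets a block of the smallest power of two >= d (0 if d == 0).
double power_of_two_efficiency(const std::vector<std::uint32_t>& degrees) {
    std::uint64_t used = 0, allocated = 0;
    for (std::uint32_t d : degrees) {
        std::uint64_t cap = 0;
        if (d > 0) {
            cap = 1;
            while (cap < d) cap <<= 1;
        }
        used += d;
        allocated += cap;
    }
    // An empty graph wastes nothing.
    return allocated == 0 ? 1.0 : static_cast<double>(used) / allocated;
}
```

For the 8-vertex example in this deck (degrees 2 2 3 2 2 2 1 0) the model gives 14/15, or about 93%. The worst case is a degree just above a power of two: degree 5 gets a block of 8, or 62.5%.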
Initialization Overhead
[Figure: initialization slowdown versus CSR (log scale, 1X to 1,000X) for Hornet and cuSTINGER]
• Time to initialize the data structure in comparison to CSR
• In most cases 2X-3X slower
  – One-time penalty
• Much faster than cuSTINGER
Insertion Rates
• Supports over 150M updates per second
• Hornet
  – 4X-10X faster than cuSTINGER
  – Does not have a performance dip like cuSTINGER
• Scalable growth in update rate
[Figure: two panels, cuSTINGER and Hornet; update rate in edges per second (log scale, 10^3 to 10^9) for in-2004, soc-LiveJournal1, cage15, and kron_g500-logn21]
Takeaways
• Anything you can do with CSR you can also do with Hornet (the reverse is not true)
• Supports high update rates
• Scalable in both data size and performance
• Simple, high-level programming model
  – See you at 1:00pm
• Also, look for James Fox’s talk on a cool algorithm for finding the maximal K-Truss in a graph
  – Uses dynamic triangle counting and Hornet’s deletion…
Hornet Team (Current & Alumni)
Thank you
• Email: ogreen@gatech.edu
• Hornet:
  – https://github.com/hornet-gt/hornet
• HornetsNest:
  – https://github.com/hornet-gt/hornetsnest
Backup slides
Memory Utilization – Overall
[Figure: space efficiency relative to CSR (0%–100%) for Hornet with BlockArray sizes 2^16, 2^18, and 2^22, compared with COO, cuSTINGER, and AIM]
• 70% average utilization of CSR
• Better utilization than COO, cuSTINGER, and AIM
Part 2: HornetsNest
• Algorithm framework for the Hornet data structure
  – We support CSR as well
• All algorithms are implemented using a small set of operators
  – We show that these operators are efficient for static graph algorithms and can be used for dynamic graph algorithms
• Uses features from C++11 and C++14
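To illustrate what "algorithms built from a small set of operators" can look like, the sketch below defines one hypothetical operator, `for_all_edges`, that applies a lambda to every edge of a CSR graph. It then expresses in-degree computation with it, using a C++14 generic lambda. HornetsNest's actual operators are not shown on this slide. The names and types here are assumptions, and the loop runs sequentially on the host where a GPU version would map vertices or edges to threads.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal CSR view (not HornetsNest's types).
struct CsrView {
    std::vector<std::uint32_t> offsets;  // size |V|+1
    std::vector<std::uint32_t> dest;
};

// Hypothetical operator: call op(src, dst) for every edge.
template <typename Op>
void for_all_edges(const CsrView& g, Op op) {
    for (std::uint32_t v = 0; v + 1 < g.offsets.size(); ++v)
        for (std::uint32_t i = g.offsets[v]; i < g.offsets[v + 1]; ++i)
            op(v, g.dest[i]);
}

// An "algorithm" written only in terms of the operator: in-degree per vertex.
std::vector<std::uint32_t> in_degrees(const CsrView& g) {
    std::vector<std::uint32_t> deg(g.offsets.empty() ? 0 : g.offsets.size() - 1, 0);
    // C++14 generic lambda: parameter types are deduced from the call.
    for_all_edges(g, [&deg](auto /*src*/, auto dst) { ++deg[dst]; });
    return deg;
}
```

Because the algorithm only talks to the operator, the same `in_degrees` body could run over a different graph representation as long as that representation provides its own `for_all_edges`. That is the kind of separation that lets a framework support both Hornet and CSR.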