A Sparse Dynamic Graph and Matrix Data Layout Oded Green
Going to talk about 2 things
• Hornet
  – A scalable and dynamic data structure for graph algorithms and linear algebra based problems
  – Formerly known as cuSTINGER
• HornetsNest
  – A framework for static and dynamic analytics
Oded Green, GTC-DC-17

Hornet – Upfront Summary
• Can support over 250 million updates per second
• Low overhead in comparison with CSR
  – Initialization is also relatively inexpensive, usually less than 3X slower
  – Equal performance
• Currently implemented for CUDA
  – We are porting Hornet to the CPU
• Really easy to use

Graph Primitives – Upfront Summary
• Great performance for static and dynamic graph algorithms
• Scalable
• Simple to use

Big Data problems need Graph Analysis
Communication networks:
• World-wide connectivity
• High velocity changes
• Different types of extracted data:
  – Physical communication network.
  – Person-to-person communication network.
Financial networks:
• Transactions between players.
• Different transaction types (property graph)
Health-Care networks:
• Various players.
• Pattern matching and epidemic monitoring.
• Problem sizes have doubled in last 5 years.
Graphs are a unifying motif for data analytics.

STINGER
• STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation
• Enables algorithm designers to implement dynamic & streaming graph algorithms with ease.
• Portable semantics for various platforms
  – Linked list of edge blocks not ideal for the GPU
• Good performance for all types of graph problems and algorithms, static and dynamic.
• Assumes globally addressable memory access

STINGER and cuSTINGER Properties
• A simple programming model
• Millions of updates per second to graph
• Updates are not bottlenecks for analytics.
• Advanced memory manager
• Transfers data between host and device automatically
• Reduces initialization time
• Allows for simple update processes
STINGER papers: [Bader et al.; 2007; Tech Report], [Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]
cuSTINGER paper: [Green & Bader; HPEC; 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs

cuSTINGER is now HORNET

Definitions
• Dynamic graphs (matrices)
  – Graph can change over time.
  – Changes can be to topology, edges, or vertices.
    • For example, new edges between two vertices.
  – Changes to edge or vertex weights
• Streaming graphs:
  – Graphs changing at high rates.
  – 100s of thousands of updates per second.

Dynamic graph example
• Only a subset of the entire graph…
• Dynamic:
  – At time t:
    • v and w become friends.
    • insert_edge(v, w)
  – At time t̂:
    • u and v are no longer friends.
    • delete_edge(u, v)
• Additional operations include vertex insertions & deletions

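The two updates in this example can be sketched on the host with a toy adjacency-list graph. This is a minimal illustration, not Hornet code: `ToyDynamicGraph` and its member names are hypothetical, edges are treated as undirected friendships, and vertex ids are plain integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy graph: one unsorted neighbor list per vertex.
struct ToyDynamicGraph {
    std::vector<std::vector<int>> adj;

    explicit ToyDynamicGraph(std::size_t num_vertices) : adj(num_vertices) {}

    bool valid(int x) const {
        return x >= 0 && static_cast<std::size_t>(x) < adj.size();
    }

    // True if dst appears in the neighbor list of src.
    bool has_edge(int src, int dst) const {
        assert(valid(src) && valid(dst));
        const std::vector<int>& n = adj[src];
        return std::find(n.begin(), n.end(), dst) != n.end();
    }

    // Undirected insert: stores both directions and ignores duplicates.
    void insert_edge(int a, int b) {
        if (!has_edge(a, b)) adj[a].push_back(b);
        if (!has_edge(b, a)) adj[b].push_back(a);
    }

    // Undirected delete: removes both directions if present.
    void delete_edge(int a, int b) {
        assert(valid(a) && valid(b));
        auto erase_value = [](std::vector<int>& n, int x) {
            n.erase(std::remove(n.begin(), n.end(), x), n.end());
        };
        erase_value(adj[a], b);
        erase_value(adj[b], a);
    }
};
```

With u = 0, v = 1, w = 2, calling `insert_edge(v, w)` and later `delete_edge(u, v)` mirrors the two time steps on the slide.
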
“Separation of powers”
• Dynamic graph data structure and dynamic graph algorithms are in two different repositories
  – Easy to integrate with external libraries
  – Can also be used with matrices
• First part of today’s talk will be on the dynamic data structure

Part 1 – Data Structure
cuSTINGER Version 2.0
• Improved initialization time
  – 100X faster than Version 1.0
• New memory manager
  – Reduces fragmentation
  – Enables memory reclamation
  – Offers good memory bounds
• Scalable data structure
  – Can easily grow 1000X its initial size without needing to be re-initialized
• Faster updates

So what else do we need?
• We need a dynamic graph data structure
• These data structures don’t cut it

Name                        Pros                                         Cons
Dense adjacency matrix      Flexible                                     Limited utilization for sparse data
Linked lists                Flexible                                     Poor locality; allocation time is costly
COO (edge list), unsorted   Has some flexibility; updates are simple     Poor locality; stores both the source and destination
CSR                         Uses exact amount of memory; good locality   Inflexible

Compressed Sparse Row (CSR)
Pros:
• Uses precise storage requirements
• Great locality
  – Good for GPUs
• Handful of arrays
  – Simple to use and manage
Cons:
• Inflexible.
• Network growth unsupported
• Topology changes unsupported
• Property graphs not supported

Src/Row:     0  1  2  3  4  5  6  7
Offset:      0  2  4  7  9 11 13 14 14
Dest./Col.:  1  2  0  5  0  3  4  2  6  2  5  1  4  3
Value:       2  5  2  7  4  1  4  1  2  4  1  7  1  2

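A small host-side sketch of the CSR arrays on this slide shows both sides of the trade-off: neighbor scans are contiguous, while a single edge insertion has to shift the tail of every array. The `Csr` struct and function names are hypothetical, and the 28 numbers after "Dest./Col." on the slide are assumed to be 14 destinations followed by 14 values, which matches the final offset of 14.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain CSR container.
struct Csr {
    std::vector<int> offset;  // size = number of vertices + 1
    std::vector<int> dest;    // destination (column) per stored edge
    std::vector<int> value;   // weight per stored edge
};

// Arrays as read from the slide: 8 vertices, 14 stored edges.
inline Csr slide_example() {
    return Csr{
        {0, 2, 4, 7, 9, 11, 13, 14, 14},
        {1, 2, 0, 5, 0, 3, 4, 2, 6, 2, 5, 1, 4, 3},
        {2, 5, 2, 7, 4, 1, 4, 1, 2, 4, 1, 7, 1, 2}};
}

inline bool valid_vertex(const Csr& g, int v) {
    return v >= 0 && static_cast<std::size_t>(v) + 1 < g.offset.size();
}

// Degree is the difference of two adjacent offsets.
inline int degree(const Csr& g, int v) {
    assert(valid_vertex(g, v));
    return g.offset[v + 1] - g.offset[v];
}

// Contiguous scan over the edges of v: the locality CSR is known for.
inline long weighted_degree(const Csr& g, int v) {
    assert(valid_vertex(g, v));
    long sum = 0;
    for (int e = g.offset[v]; e < g.offset[v + 1]; ++e) sum += g.value[e];
    return sum;
}

// Inserting one edge shifts every later edge and bumps every later
// offset, so the cost grows with the size of the whole graph.
inline void insert_edge(Csr& g, int src, int dst, int val) {
    assert(valid_vertex(g, src));
    const int pos = g.offset[src + 1];
    g.dest.insert(g.dest.begin() + pos, dst);
    g.value.insert(g.value.begin() + pos, val);
    for (std::size_t i = static_cast<std::size_t>(src) + 1; i < g.offset.size(); ++i) {
        ++g.offset[i];
    }
}
```

The loop at the end of `insert_edge` is the "inflexible" part: even an update to vertex 0 touches the offset of every other vertex.
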
Hornet – A High Level View
USER-INTERFACE
Vertex Id:   0 1 2 3 4 5 6 7   (followed by over-allocated space)
Used:        2 2 3 2 2 2 1 0
Pointer:     (drawn as arrows on the slide)
Dest./Col.:  3 1 2 0 5 2 6 2 5 1 4 0 3 4
Value:       2 2 5 2 7 1 2 4 1 7 1 4 1 4
• Supports updates
  – Supports edge insertion and deletion.
  – Supports vertex insertion and deletion.

Hornet – Property Graph Support
USER-INTERFACE
Vertex Id:   0 1 2 3 4 5 6 7
Used:        2 2 3 2 2 2 1 0
Pointer:     (drawn as arrows on the slide)
Dest./Col.:  3 1 2 0 5 2 6 2 5 1 4 0 3 4
Weight:      2 2 5 2 7 1 2 4 1 7 1 4 1 4
Field rows listed on the slide: Weight, Type, Time 1, User 1, User 2, …

Hornet in Detail
USER-INTERFACE
Vertex Id:              0 1 2 3 4 5 6 7   (followed by over-allocated space for vertex insertions)
Used (#Neighbors/nnz):  2 2 3 2 2 2 1 0
Pointer:                (drawn as arrows on the slide)
MEMORY MANAGER
Block arrays shown: BA 0,1; BA 1,1; BA 1,2; BA 2,1
Block sizes shown (bsize): 2, 2, 4, 1
Over-allocated space for power-of-two rule
Dest./Col.:           3 1 2 0 5 2 6 2 5 1 4 0 3 4
Weight:               2 5 2 5 7 1 2 4 1 7 1 2 1 4
Vec-Tree bit status:  0 0 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0

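The power-of-two rule can be sketched for a single vertex as follows. This is a simplified host-side illustration under stated assumptions, not Hornet's memory manager: `ToyAdjacency` and `block_size_for` are hypothetical names, a `std::vector` stands in for a block inside a block array, and a full block is assumed to be replaced by one of twice the size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two >= n (returns 1 for n <= 1).
inline std::size_t block_size_for(std::size_t n) {
    std::size_t b = 1;
    while (b < n) b <<= 1;
    return b;
}

// Hypothetical adjacency storage for one vertex.
struct ToyAdjacency {
    std::size_t used = 0;    // the "Used" counter from the slide
    std::vector<int> block;  // block.size() is the block capacity

    explicit ToyAdjacency(std::size_t initial_degree = 0)
        : block(block_size_for(initial_degree)) {}

    std::size_t capacity() const { return block.size(); }

    // Returns true if the insert forced a move to a block twice as large.
    bool insert(int dst) {
        bool moved = false;
        if (used == capacity()) {
            std::vector<int> bigger(capacity() * 2);
            std::copy(block.begin(), block.end(), bigger.begin());
            block.swap(bigger);
            moved = true;
        }
        block[used++] = dst;
        assert(used <= capacity());
        return moved;
    }

    // Deletes by overwriting with the last used entry; order is not kept.
    bool remove(int dst) {
        int* first = block.data();
        int* last = first + used;
        int* it = std::find(first, last, dst);
        if (it == last) return false;
        *it = *(last - 1);
        --used;
        return true;
    }
};
```

Because capacity only doubles, most insertions write into already allocated space and only an occasional one pays for a copy. The price is the unused tail of each block.
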
Hornet Performance Analysis
• Memory Utilization
• Initialization Overhead
• Update rate
  – Number of sustainable updates per second

Experimental Setup

GPU    μArch    SMs   SPs    Memory (GB)   Memory Type
K40    Kepler   15    2880   12            GDDR5
P100   Pascal   56    3584   16            HBM2

• Unless noted otherwise, all performance analysis is for the P100

Input Graphs
• DIMACS 10 Graph Implementation Challenge
• SNAP – Stanford Network Analysis Project
• Florida Matrix Collection
The following is only a subset of these graphs:

Name            Type            Source   |V|      |E| *
coAuthorsDBLP   Collaboration   DIMACS   299k     1.95M
as-skitter      Trace route     SNAP     1.69M    11.1M
kron_21         Random          DIMACS   2M       201M
cit-patents     Citation        SNAP     3.77M    16.5M
cage15          Matrix          DIMACS   5.15M    94M
uk-2002         Webcrawl        DIMACS   18.52M   523M

Memory Utilization - Overall
[Figure: bar chart of space efficiency (0% to 100%) for Hornet, COO, and cuSTINGER]
• 70% average utilization of CSR
• Better utilization in comparison to: COO, cuSTINGER, AIMS

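One way to get a feel for where unused space comes from, assuming the power-of-two block rule from the "Hornet in Detail" slide, is to divide used edge slots by allocated edge slots. The sketch below is a back-of-the-envelope model only: the function names are hypothetical, zero-degree vertices are assumed to own no block, and it is not the measurement behind the chart.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two >= n, for n >= 1.
inline std::size_t round_up_pow2(std::size_t n) {
    assert(n >= 1);
    std::size_t b = 1;
    while (b < n) b <<= 1;
    return b;
}

// Used edge slots divided by allocated edge slots.
// Assumption: a vertex with no neighbors owns no block.
inline double space_efficiency(const std::vector<std::size_t>& degrees) {
    std::size_t used = 0;
    std::size_t allocated = 0;
    for (std::size_t d : degrees) {
        if (d == 0) continue;
        used += d;
        allocated += round_up_pow2(d);
    }
    if (allocated == 0) return 1.0;  // nothing stored, nothing wasted
    return static_cast<double>(used) / static_cast<double>(allocated);
}
```

For the "Used" row of the running example (2 2 3 2 2 2 1 0), this gives 14 used slots out of 15 allocated. Rounding up to a power of two less than doubles a block, so under this model each non-empty vertex keeps more than half of its block in use.
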
Insertion Rates
• Supports over 250M updates per second
• Hornet
  – 4X to 10X faster than cuSTINGER
  – Does not have a performance dip like cuSTINGER
• Scalable growth in update rate
[Figure: two panels, cuSTINGER and Hornet; y-axis: update rate (edges per second), log scale from 10^3 to 10^9; inputs: in-2004, soc-LiveJournal1, cage15, kron_g500-logn21]

Part 2: HornetsNest
• Algorithm framework for Hornet data structure
  – We support CSR as well
• All algorithms are implemented using a small set of operations
  – We show that these operators are efficient for static graph algorithms and can be used for dynamic graph algorithms
• Uses features from C++11 and C++14

Algorithmic Graph Primitives
• All algorithms are implemented through this API
• Simple primitives
  – ForAllVerticesInG(G, f(v ∈ V), struct)
  – ForAllEdgesInG(G, f(src ∈ V, dest ∈ V), struct)
  – InsertEdges(G, E_new)
  – RemoveEdges(G, E_rem)
  – InsertVertices(G, V_new)
  – RemoveVertices(G, V_rem)

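The shape of these primitives can be sketched sequentially on the host with C++11 lambdas. Everything below is hypothetical and is not the HornetsNest API: the snake_case names, the `ToyGraph` type, and the directed edge-pair batches are assumptions made for illustration, and lambda captures stand in for the `struct` argument that carries algorithm state on the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (src, dest)

// Hypothetical toy graph: directed adjacency lists.
struct ToyGraph {
    std::vector<std::vector<int>> adj;
};

// Applies f(v) to every vertex id.
template <typename F>
void for_all_vertices(const ToyGraph& g, F f) {
    for (std::size_t v = 0; v < g.adj.size(); ++v) f(static_cast<int>(v));
}

// Applies f(src, dest) to every stored edge.
template <typename F>
void for_all_edges(const ToyGraph& g, F f) {
    for (std::size_t s = 0; s < g.adj.size(); ++s) {
        for (int d : g.adj[s]) f(static_cast<int>(s), d);
    }
}

// Appends `count` new vertices with empty neighbor lists.
inline void insert_vertices(ToyGraph& g, std::size_t count) {
    g.adj.resize(g.adj.size() + count);
}

inline bool valid_vertex(const ToyGraph& g, int v) {
    return v >= 0 && static_cast<std::size_t>(v) < g.adj.size();
}

// Applies a batch of edge insertions.
inline void insert_edges(ToyGraph& g, const std::vector<Edge>& batch) {
    for (const Edge& e : batch) {
        assert(valid_vertex(g, e.first) && valid_vertex(g, e.second));
        g.adj[e.first].push_back(e.second);
    }
}

// Applies a batch of edge removals; edges that are absent are skipped.
inline void remove_edges(ToyGraph& g, const std::vector<Edge>& batch) {
    for (const Edge& e : batch) {
        assert(valid_vertex(g, e.first));
        std::vector<int>& n = g.adj[e.first];
        auto it = std::find(n.begin(), n.end(), e.second);
        if (it != n.end()) n.erase(it);
    }
}
```

An out-degree count, for example, becomes a call to `for_all_edges` with a lambda that increments a counter for `src`. A GPU version would run the operator in parallel, so shared state like that counter would need atomic updates.
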