

  1. Fast and Scalable Subgraph Isomorphism using Dynamic Graph Techniques James Fox

  2. Collaborators • Oded Green, Research Scientist (GT) • Euna Kim, PhD student (GT) • Federico Busato, PhD student (Università di Verona) • Dr. Nicola Bombieri (Università di Verona) • Kartik Lakhotia, PhD student (USC) • Shijie Zhou, PhD student (USC) • Shreyas Singapura, PhD student (USC) • Hanqing Zeng, PhD student (USC) • Dr. Rajgopal Kannan (USC) • Prof. Viktor Prasanna (USC) • Prof. David Bader (GT) Quickly Finding a Truss in Haystack 2

  3. Outline • K-Truss – Introduction – Sequential Approaches • Our new algorithm – Dynamic Triangle Counting – Hornet: data structure for dynamic graphs • Performance Analysis

  4. K-Truss • Definition: for a given k, the k-truss is a subgraph in which each edge closes at least k − 2 triangles, i.e., each edge has a “support” of at least k − 2 • A well-connected subgraph – “Relaxation of k-clique, stricter than k-core” [Cohen; 2008] – Computationally efficient to find • Maximal k-truss (the non-empty k-truss with the largest k): focus of our work
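A minimal Python sketch of the definition, assuming an undirected simple graph given as a list of (u, v) pairs with each edge listed once; the helper names `edge_supports` and `is_k_truss` are hypothetical. The support of an edge (u, v) is the number of triangles it closes, |adj(u) ∩ adj(v)|, and an edge set is a k-truss when every support is at least k − 2.

```python
from itertools import combinations


def edge_supports(edges):
    """Map each undirected edge to its support |adj(u) & adj(v)|."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {frozenset((u, v)): len(adj[u] & adj[v]) for u, v in edges}


def is_k_truss(edges, k):
    """True if every edge closes at least k - 2 triangles."""
    return all(s >= k - 2 for s in edge_supports(edges).values())


# In a 4-clique every edge closes exactly 2 triangles.
k4 = list(combinations(range(4), 2))
print(is_k_truss(k4, 4))  # True
print(is_k_truss(k4, 5))  # False
```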

  5. Example [Figure: example graph with per-edge triangle counts, showing its 3-truss (K=3) and its 4-truss (K=4)]

  6. Over 1000x Faster • Graph Challenge Innovation Award (HPEC’17) • Three main factors: – Algorithmic optimization: 1. uses a dynamic graph data structure; 2. a novel algorithm for dynamically updating triangle counts – Parallelization – Programming model: vertex-centric is more efficient than linear algebra

  7. Simple Vertex Centric
       k ← 3
       while E ≠ ∅
         repeat until no more changes
           for e = (u, v) ∈ E
             if |adj(u) ∩ adj(v)| < k − 2
               delete e from E
         k ← k + 1
       k ← k − 1
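A sequential Python sketch of this simple algorithm, under the same assumptions (simple undirected graph, each edge listed once, hypothetical function name). Deletions take effect immediately instead of in parallel; because the k-truss is the unique maximal edge set satisfying the support condition, the deletion order does not change the result. The sketch records the last k whose truss was non-empty rather than adjusting k after the loop.

```python
from itertools import combinations


def max_truss_simple(edges):
    """Return (k, edges) for the maximal non-empty k-truss, recomputing
    |adj(u) & adj(v)| from scratch for every edge on every pass."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    E = {frozenset(e) for e in edges}
    best_k, best_edges = 2, set(E)  # any edge set is trivially a 2-truss
    k = 3
    while E:
        changed = True
        while changed:  # "repeat until no more changes"
            changed = False
            for e in list(E):
                u, v = tuple(e)
                if len(adj[u] & adj[v]) < k - 2:
                    E.discard(e)  # delete e from E
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
        if E:  # the k-truss is non-empty
            best_k, best_edges = k, set(E)
        k += 1
    return best_k, best_edges


# K4 plus a triangle hanging off vertex 3: the maximal truss is the 4-truss K4.
g = list(combinations(range(4), 2)) + [(3, 4), (4, 5), (3, 5)]
k, truss = max_truss_simple(g)
print(k, len(truss))  # 4 6
```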

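  8. Linear Algebra Formulation • Given k • Bold letters refer to vectors and matrices ($\mathbf{E}$: edge-by-vertex incidence matrix, $\mathbf{A}$: adjacency matrix)
       $\mathbf{R} = \mathbf{E}\mathbf{A}$
       $\mathbf{x} = \mathrm{find}\left((\mathbf{R} == 2) \cdot \mathbf{1} < k - 2\right)$
       while $\mathbf{x} \neq \emptyset$
         $\mathbf{E}_x = \mathbf{E}(\mathbf{x}, :)$
         $\mathbf{E} = \mathbf{E}(\mathbf{x}^c, :)$
         $\mathbf{R} = \mathbf{R}(\mathbf{x}^c, :)$
         $\mathbf{R} = \mathbf{R} - \mathbf{E}\left[\mathbf{E}_x^{T}\mathbf{E}_x - \mathrm{diag}(\mathbf{E}_x^{T}\mathbf{E}_x)\right]$
         $\mathbf{x} = \mathrm{find}\left((\mathbf{R} == 2) \cdot \mathbf{1} < k - 2\right)$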
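The incidence-matrix formulation can be traced with a small pure-Python sketch, assuming vertices numbered 0..n−1 and dense list-of-lists matrices (a real implementation would use sparse matrices); all names are hypothetical. Here R[e][w] counts how many endpoints of edge e are adjacent to w, so R[e][w] == 2 exactly when w closes a triangle with e, and removing the rows x updates R by subtracting E times the adjacency of the removed edges.

```python
from itertools import combinations


def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def zero_diag(M):
    """M - diag(M)."""
    return [[0 if i == j else x for j, x in enumerate(row)] for i, row in enumerate(M)]


def ktruss_linear_algebra(edges, n, k):
    """k-truss via the incidence-matrix formulation; vertices are 0..n-1."""
    E = []
    for u, v in edges:  # one row per edge
        row = [0] * n
        row[u] = row[v] = 1
        E.append(row)
    A = zero_diag(matmul(transpose(E), E))  # A = E^T E - diag(E^T E)
    R = matmul(E, A)  # R = E A

    def find_low(R):  # x = find((R == 2) . 1 < k - 2)
        return [i for i, row in enumerate(R) if row.count(2) < k - 2]

    x = find_low(R)
    while x:
        drop = set(x)
        Ex = [E[i] for i in x]  # E_x = E(x, :)
        E = [row for i, row in enumerate(E) if i not in drop]  # E = E(x^c, :)
        R = [row for i, row in enumerate(R) if i not in drop]  # R = R(x^c, :)
        Ax = zero_diag(matmul(transpose(Ex), Ex))
        delta = matmul(E, Ax)  # E [E_x^T E_x - diag(E_x^T E_x)]
        R = [[r - d for r, d in zip(rr, dr)] for rr, dr in zip(R, delta)]
        x = find_low(R)
    return [tuple(j for j, val in enumerate(row) if val) for row in E]


g = list(combinations(range(4), 2)) + [(3, 4), (4, 5), (3, 5)]
print(ktruss_linear_algebra(g, 6, 4))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```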

  9. New Algorithm for Finding the Maximal Truss
       for e = (u, v) ∈ E                            ✓ parallel
         w_e ← |adj(u) ∩ adj(v)|
       k ← 3
       while E ≠ ∅
         repeat until no more changes
           list ← ∅
           for e = (u, v) ∈ E                        ✓ parallel
             if |adj(u) ∩ adj(v)| < k − 2
               append(list, e)
           G_RST ← CreateGraph(list)                 ✓ parallel
           removeEdges(G, G_RST)                     ✓ parallel
           UpdateTriangleCount(G, G_RST)             ✓ parallel
         k ← k + 1
       k ← k − 1
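The new algorithm can be sketched sequentially in Python, assuming integer vertex IDs (needed for the `u < v` test), a simple undirected graph, and hypothetical function names. Python sets stand in for Hornet and for sorted intersections, and nothing runs in parallel. Supports are computed once and afterwards only decremented by `update_triangle_count`, which covers the three triangle types of slides 12 to 15; the sketch records the last k with a non-empty truss explicitly.

```python
from itertools import combinations


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def update_triangle_count(adj, dadj, support):
    """Decrement supports of remaining edges whose triangles lost edges.
    adj is the remaining graph G, dadj the deleted-edge graph G_RST."""
    two_removed = {}
    for u in dadj:
        for v in dadj[u]:  # each deleted edge, seen in both orientations
            if u < v:  # one edge removed: use each deleted edge once
                for w in adj[u] & adj[v]:
                    support[frozenset((u, w))] -= 1
                    support[frozenset((v, w))] -= 1
            # two edges removed: (u, v), (v, w) deleted and (u, w) remains
            for w in adj[u] & dadj[v]:
                e = frozenset((u, w))
                two_removed[e] = two_removed.get(e, 0) + 1
    for e, hits in two_removed.items():
        support[e] -= hits // 2  # every such triangle is found twice
    # three edges removed: no remaining edge to update


def max_truss_delta(edges):
    adj = build_adj(edges)
    support = {frozenset((u, v)): len(adj[u] & adj[v]) for u, v in edges}
    best_k, best_edges = 2, set(support)
    k = 3
    while support:
        while True:  # repeat until no more changes
            removed = [e for e, s in support.items() if s < k - 2]
            if not removed:
                break
            dadj = build_adj(tuple(e) for e in removed)  # CreateGraph(list)
            for e in removed:  # removeEdges(G, G_RST)
                u, v = tuple(e)
                adj[u].discard(v)
                adj[v].discard(u)
                del support[e]
            update_triangle_count(adj, dadj, support)
        if support:
            best_k, best_edges = k, set(support)
        k += 1
    return best_k, best_edges


k4 = list(combinations(range(4), 2))
extra = [(0, 4), (1, 4), (4, 5), (1, 5), (4, 6), (1, 6)]
print(max_truss_delta(k4 + extra)[0])  # 4
```

The extra edges exercise both the one-edge and two-edge cases: after the first removal round at k = 4, the support of (1, 4) drops from 3 to 0 and that of (0, 1) from 3 to 2.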

  10. G_RST ← CreateGraph(list) • We will create a graph from all the deleted edges • Adjacencies will be sorted [Figure: the deleted-edge graph G_RST next to the remaining graph G, with per-edge triangle counts]
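CreateGraph can be illustrated with a sequential Python sketch that builds a CSR-style graph (an offsets array plus a neighbors array) from the deleted edges. The CSR layout and the name `create_graph` are assumptions for illustration: the slides store graphs in Hornet, and a GPU version would build this in parallel. Sorting each adjacency slice is what makes merge-based intersections possible later.

```python
from itertools import accumulate


def create_graph(deleted_edges, n):
    """Build a CSR graph (offsets, neighbors) from the deleted edges.
    Each edge is stored in both directions; every adjacency slice is sorted.
    Vertex IDs are assumed to be 0..n-1."""
    counts = [0] * (n + 1)
    for u, v in deleted_edges:
        counts[u + 1] += 1
        counts[v + 1] += 1
    offsets = list(accumulate(counts))  # u's neighbors: offsets[u]:offsets[u + 1]
    next_slot = offsets[:-1]  # copy: next free position per vertex
    neighbors = [0] * offsets[-1]
    for u, v in deleted_edges:
        neighbors[next_slot[u]] = v
        next_slot[u] += 1
        neighbors[next_slot[v]] = u
        next_slot[v] += 1
    for u in range(n):  # sort each adjacency for merge-based intersection
        lo, hi = offsets[u], offsets[u + 1]
        neighbors[lo:hi] = sorted(neighbors[lo:hi])
    return offsets, neighbors


print(create_graph([(3, 1), (1, 2), (0, 3)], 4))
# ([0, 1, 3, 4, 6], [3, 2, 3, 1, 0, 1])
```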

  11. UpdateTriangleCount(G, G_RST) • Must update the triangle counts of non-removed edges • Don’t want to recompute them globally [Figure: triangle counts after deletion (incorrect) and the updated triangle counts]

  12. Three “types” of affected triangles 1. One edge removed 2. Two edges removed 3. All three edges removed [Figure: triangle (u, v, w) with one, two, and three edges removed] [Makkar; HiPC’17]
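To make the three types concrete, the Python sketch below (hypothetical helper, integer vertex IDs assumed) enumerates the original graph's triangles and counts how many fall into each type for a given set of deleted edges. It only illustrates the classification; the slides that follow find these triangles through intersections rooted at the deleted edges instead of enumerating every triangle.

```python
from itertools import combinations


def classify_affected_triangles(edges, deleted):
    """Count triangles of the original graph by how many of their three
    edges are in `deleted` (triangles with no deleted edge are skipped)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    gone = {frozenset(e) for e in deleted}
    counts = {1: 0, 2: 0, 3: 0}
    for u, v in edges:
        for w in adj[u] & adj[v]:
            if w > max(u, v):  # count each triangle once, from its two smallest vertices
                n = sum(frozenset(p) in gone for p in ((u, v), (u, w), (v, w)))
                if n:
                    counts[n] += 1
    return counts


k4 = list(combinations(range(4), 2))
print(classify_affected_triangles(k4, [(0, 1), (0, 2)]))  # {1: 2, 2: 1, 3: 0}
```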

  13. One edge removed • (u, v) deleted • By intersecting the adjacency list of u with the adjacency list of v, we can find all common neighbors w – Decrement the support of (u, w) and (v, w) by 1 • For all e = (u, v) ∈ G_RST – Intersect(u, G, v, G)
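A sequential Python sketch of this case, assuming the remaining graph G is a dict of sorted neighbor lists and that each deleted edge appears once in `deleted`; `intersect_sorted` and `one_edge_removed` are hypothetical names. The merge-based intersection walks both sorted lists once, which is why the adjacencies are kept sorted.

```python
def intersect_sorted(a, b):
    """Merge-based intersection of two sorted adjacency lists."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def one_edge_removed(G, deleted, support):
    """For each deleted (u, v), every common neighbor w in the remaining graph G
    closed a triangle with (u, v): decrement the supports of (u, w) and (v, w)."""
    for u, v in deleted:  # each deleted edge listed once
        for w in intersect_sorted(G[u], G[v]):
            support[frozenset((u, w))] -= 1
            support[frozenset((v, w))] -= 1


# K4 after deleting (0, 1); all remaining supports were 2 before the deletion.
G = {0: [2, 3], 1: [2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
support = {frozenset(e): 2 for e in [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]}
one_edge_removed(G, [(0, 1)], support)
print(support[frozenset((0, 2))], support[frozenset((2, 3))])  # 1 2
```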

  14. Two edges removed • (u, v) and (u, w) deleted • Intersecting the adjacencies as before won’t work • Instead, we intersect adjacencies from the two graphs, G and G_RST • For all e = (u, v) ∈ G_RST – Intersect(u, G, v, G_RST) • Can handle double counting (each such triangle is found once from each of its two deleted edges)
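A sequential Python sketch of this case, with hypothetical names and neighbor sets instead of sorted lists for brevity. For every deleted edge (u, v) it intersects u's neighbors in G with v's neighbors in G_RST; each hit w is a triangle whose edge (u, w) survived while (u, v) and (v, w) were deleted. Since the same triangle is reached from both of its deleted edges, this sketch tallies hits and subtracts half; the slides do not say how the parallel GPU version resolves the double counting.

```python
def two_edges_removed(G, D, support):
    """Triangles that lost exactly two edges. G is the remaining graph and D the
    deleted-edge graph (G_RST); both map a vertex to its neighbor set, and every
    edge is stored in both directions."""
    hits = {}
    for v in D:
        for u in D[v]:  # deleted edge (u, v), visited from both endpoints
            # Intersect(u, G, v, G_RST): w with (u, w) remaining and (v, w) deleted
            for w in G.get(u, set()) & D[v]:
                e = frozenset((u, w))
                hits[e] = hits.get(e, 0) + 1
    # Each such triangle is found twice, once from each of its deleted edges.
    for e, h in hits.items():
        support[e] -= h // 2


# Triangle (0, 1, 2) where (0, 1) and (1, 2) were deleted and (0, 2) remains.
G = {0: {2}, 2: {0}}
D = {0: {1}, 1: {0, 2}, 2: {1}}
support = {frozenset((0, 2)): 1}
two_edges_removed(G, D, support)
print(support[frozenset((0, 2))])  # 0
```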

  15. Three edges removed • (u, v), (u, w), and (w, v) deleted • No need to update supports: the triangle has no remaining edge

  16. So what else do we need? • We need a dynamic graph data structure • These data structures don’t cut it: dense adjacency matrix, linked lists, COO (edge list), and CSR/CSC, compared on good locality and flexible updates; none offers both

  17. Hornet [Figure: Hornet data layout. A user-interface vertex array (Vertex Id, Used, BSize, Pointer) points to per-vertex edge blocks (Dest./Col., Value) that include over-allocated space] • Supports updates – Supports edge insertion/deletion – Supports vertex insertion/deletion • Efficient memory manager – Memory reclamation and deletion – Hidden from user • Framework • Good locality – Edge list contiguous
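The Used and BSize columns in the figure (for example, a vertex using 3 slots in a block of size 4) suggest power-of-two blocks with spare capacity. The toy Python model below illustrates that idea under this assumption; the class and method names are hypothetical, it is single-threaded, and it omits Hornet's memory manager (block pools, memory reclamation) as well as vertex insertion and deletion.

```python
class ToyHornet:
    """Toy, single-threaded model of a Hornet-like adjacency layout; not the
    Hornet API. Each vertex owns one block whose size (BSize) is a power of
    two; the first `used` slots hold its edges, the rest is spare space."""

    def __init__(self, num_vertices):
        self.used = [0] * num_vertices
        self.blocks = [[] for _ in range(num_vertices)]

    def bsize(self, u):
        return len(self.blocks[u])

    def insert_edge(self, u, v):
        if self.used[u] == self.bsize(u):
            # Block full: move the edges to a block twice as large.
            new_size = max(1, 2 * self.bsize(u))
            self.blocks[u] = self.blocks[u] + [None] * (new_size - self.bsize(u))
        self.blocks[u][self.used[u]] = v
        self.used[u] += 1

    def delete_edge(self, u, v):
        block = self.blocks[u]
        i = block.index(v, 0, self.used[u])  # raises ValueError if (u, v) is absent
        self.used[u] -= 1
        # Move the last used entry into the hole so the edges stay contiguous.
        block[i] = block[self.used[u]]
        block[self.used[u]] = None

    def neighbors(self, u):
        return self.blocks[u][: self.used[u]]


h = ToyHornet(4)
for v in (1, 2, 3):
    h.insert_edge(0, v)
print(h.used[0], h.bsize(0))  # 3 4
h.delete_edge(0, 1)
print(h.neighbors(0))  # [3, 2]
```

Swapping the last used entry into the hole keeps each vertex's edge list contiguous, which is the locality property the slide lists, at the cost of not preserving neighbor order.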

  18. Experimental Setup - CPU • Intel dual-processor system: Intel Xeon E5-2695 • 16 cores per processor (32 in total) – 64 threads with Hyper-Threading • 45 MB LLC • 1 TB of DDR4

  19. Experimental Setup - GPU • Single Pascal P100 • 56 processors (SMs) • 64 threads per processor (SPs) • 3584 hardware threads • 16 GB of HBM2 – 720 GB/s bandwidth

  20. Input Graphs • HPEC Graph Challenge • SNAP: Stanford Network Analysis Project • The following is only a subset of these graphs:
       Name               Network type    |V|      |E|*
       cit-HepPh          Citation        35k      421k
       amazon0601         Co-purchasing   400k     2.4M
       roadNet-PA         Road            1M       1.5M
       as-skitter         Trace route     1.69M    11.1M
       graph500-scale21   Random          2.1M     34M
       * Largest: |E| = 134M

  21. Benchmarks 1. Graph Challenge: 1. Julia 2. Python 3. Matlab/Octave 2. Our algorithms: 1. Iterative: uses static triangle counting 2. Delta: uses the new algorithm

  22. Finding the Maximal Truss • Timeout: 8 hours • Usually 200X-500X faster than the benchmarks • Many times over 2000X faster • Sometimes 10,000X faster

  23. Execution time per iteration

  24. Future Work • We still think that we can improve by another 10X… • New triangle counting kernel – Balanced and imbalanced intersections – Improved warp utilization

  25. Summary • New algorithm for finding the maximal k-truss • Given a static input, we use techniques from dynamic graph algorithms • Hundreds to thousands of times faster than the benchmarks • We still think that we can improve by another 10X…

  26. Thank you • Email: jfox43@gatech.edu

  27. Backup Slides

  28. Wang & Cheng; 2012 • Modified version of Cohen’s algorithm • Sorts the edges by support – In each iteration, edges with support smaller than k − 2 are removed • Inherently sequential (due to the update process) • Yet significantly faster than Cohen’s algorithm • Uses hash maps for intersections
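A sequential Python sketch of support-ordered peeling in the spirit of this slide, assuming integer vertex IDs and a simple undirected graph; the function name is hypothetical. It uses hash sets for the intersections, as the slide mentions, but swaps in a binary heap with lazy deletion (Python's `heapq`) for the sort by support, so it is not a reproduction of the original implementation. The maximal truss is the maximum trussness over all edges.

```python
import heapq
from itertools import combinations


def truss_decomposition(edges):
    """Sequential peeling: return {edge: trussness}, where trussness is the
    largest k such that the edge belongs to the k-truss."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    support = {frozenset((u, v)): len(adj[u] & adj[v]) for u, v in edges}
    heap = [(s, tuple(sorted(e))) for e, s in support.items()]
    heapq.heapify(heap)
    trussness, k = {}, 2
    while heap:
        s, (u, v) = heapq.heappop(heap)
        e = frozenset((u, v))
        if e in trussness or s != support[e]:
            continue  # stale heap entry
        k = max(k, s + 2)  # s is the smallest remaining support; raise k if needed
        trussness[e] = k
        for w in adj[u] & adj[v]:  # hash-set intersection over remaining edges
            for f in (frozenset((u, w)), frozenset((v, w))):
                support[f] -= 1
                heapq.heappush(heap, (support[f], tuple(sorted(f))))
        adj[u].discard(v)  # remove e from the graph
        adj[v].discard(u)
    return trussness


g = list(combinations(range(4), 2)) + [(3, 4), (4, 5), (3, 5)]
t = truss_decomposition(g)
print(max(t.values()))  # 4
```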

  29. Hornet Data Layout • A scalable, dynamic data structure for graph algorithms and linear-algebra-based problems • Can support up to 90 million updates per second • Low overhead in comparison with CSR – Initialization is also relatively inexpensive (20%-200%) – Equal performance • Simple to use • Implemented for CUDA, yet portable to other architectures • cuSTINGER paper: [Green & Bader; HPEC 2016] cuSTINGER: Supporting dynamic graph algorithms for GPUs

  30. Hornet – Property Graph Support [Figure: Hornet layout with vertex arrays (Vertex Id, Used, BSize, Pointer) and per-edge arrays: Dest./Col. plus property fields such as Weight, Type, Time, User 1, User 2, …] • These are optional fields
