Graph Mining, Marco Serafini, COMPSCI 532, Lecture 11


  1. Graph Mining Marco Serafini COMPSCI 532 Lecture 11

  2. Classes of Graph Systems
     • Graph Computation
       • Think like a vertex
       • Linear algebra
     • Graph Search
       • Find instances of path expressions
     • Graph Mining
       • Mine patterns of interest and their matches

  3. Applications of Graph Mining
     • Web and Advertising
       • Link spam detection
       • Identify sub-markets
       • Attributed edges in knowledge bases
     • Biology
       • DNA motif detection
       • Protein-protein interaction
     • Social computing
       • Friend recommendation
       • Community detection

  4. Graph Mining - Concepts
     [Figure: an input graph, a pattern, and the embeddings of that pattern in the input graph.]

  5. Graph Exploration
     • Enumerate (& prune) embeddings
     • Aggregate by pattern
     [Figure: embeddings enumerated step by step from the input graph.]

  6. Challenges
     • Exponential number of embeddings
     [Figure: number of unique embeddings (log scale) vs. size of embedding. Size 1: 4K, size 2: 22K, size 3: 335K, size 4: 7.8M, size 5: 117M, size 6: 1.7B. Annotation: "Exponential!!!"]

  7. API Example: Clique finding

       boolean filter(Embedding e) {
         return isClique(e);
       }

       void process(Embedding e) {
         output(e);
       }

       boolean shouldExpand(Embedding embedding) {
         return embedding.getNumVertices() < maxsize;
       }

       boolean isClique(Embedding e) {
         return e.getNumEdgesAddedWithExpansion() == e.getNumberOfVertices() - 1;
       }
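
The `isClique` test on this slide compares only the edges added by the latest expansion against the number of vertices minus one. That is sufficient because `filter` has already pruned every parent that was not a clique, so a new vertex only needs to connect to all earlier vertices. The sketch below illustrates the idea on a hypothetical four-vertex graph; `CliqueCheckSketch`, `demoGraph`, and the list-based embedding representation are assumptions for illustration, not the system's actual classes.

```java
import java.util.*;

class CliqueCheckSketch {
    // Hypothetical undirected input graph: triangle 1-2-3 plus a pendant edge 3-4.
    static Map<Integer, Set<Integer>> demoGraph() {
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}, {3, 4}};
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    // Number of edges between the most recently added vertex and the earlier ones.
    static int edgesAddedWithExpansion(List<Integer> emb, Map<Integer, Set<Integer>> adj) {
        int last = emb.get(emb.size() - 1);
        int added = 0;
        for (int i = 0; i < emb.size() - 1; i++) {
            if (adj.get(last).contains(emb.get(i))) added++;
        }
        return added;
    }

    // Valid only when the parent embedding (all vertices but the last) is already a clique.
    static boolean isClique(List<Integer> emb, Map<Integer, Set<Integer>> adj) {
        return edgesAddedWithExpansion(emb, adj) == emb.size() - 1;
    }
}
```

For example, expanding the clique 1-2 with vertex 3 adds two edges and passes the test, while expanding 1-3 with vertex 4 adds only one edge and is filtered out.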

  8. Model - Think Like an Embedding
     1. Start from a set of initial embeddings
     2. Candidates: expand by 1 vertex/edge
     3. Filter uninteresting candidates
     4. Produce outputs
     [Figure: exploration steps i and i+1. In each step, input embeddings are expanded into candidates and passed to Filter. If true, the candidate goes to Process and is saved as output; if false, it is discarded. The output of step i is the input of step i+1.]
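
The four steps of this model can be sketched as a small loop. This is a minimal single-machine illustration under stated assumptions: embeddings are vertex lists, the filter is a clique test, and candidates are sorted so that duplicates collapse in a set, which is only a stand-in for a real canonicality check. `ExplorationSketch`, `step`, `explore`, and `demoGraph` are hypothetical names.

```java
import java.util.*;
import java.util.function.Predicate;

class ExplorationSketch {
    // Hypothetical undirected input graph: triangle 1-2-3 plus a pendant edge 3-4.
    static Map<Integer, Set<Integer>> demoGraph() {
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}, {3, 4}};
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    // Example filter: every pair of vertices in the embedding is adjacent.
    static boolean isClique(List<Integer> emb, Map<Integer, Set<Integer>> adj) {
        for (int i = 0; i < emb.size(); i++)
            for (int j = i + 1; j < emb.size(); j++)
                if (!adj.get(emb.get(i)).contains(emb.get(j))) return false;
        return true;
    }

    // One exploration step: expand each embedding by one neighboring vertex,
    // then keep (save) only the candidates accepted by the filter.
    static Set<List<Integer>> step(Set<List<Integer>> current,
                                   Map<Integer, Set<Integer>> adj,
                                   Predicate<List<Integer>> filter) {
        Set<List<Integer>> next = new HashSet<>();
        for (List<Integer> emb : current) {
            for (int u : emb) {
                for (int v : adj.get(u)) {
                    if (emb.contains(v)) continue;       // already part of the embedding
                    List<Integer> candidate = new ArrayList<>(emb);
                    candidate.add(v);
                    Collections.sort(candidate);         // stand-in for canonicality
                    if (filter.test(candidate)) next.add(candidate);
                }
            }
        }
        return next;
    }

    // Starts from single-vertex embeddings and runs the given number of steps.
    static Set<List<Integer>> explore(int steps) {
        Map<Integer, Set<Integer>> adj = demoGraph();
        Set<List<Integer>> current = new HashSet<>();
        for (int v : adj.keySet()) current.add(Collections.singletonList(v));
        for (int s = 0; s < steps; s++) current = step(current, adj, e -> isClique(e, adj));
        return current;
    }
}
```

On the demo graph, one step yields the four edges and a second step yields the single triangle 1-2-3; a third step yields nothing, so exploration stops on its own.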

  9. Avoiding redundant work
     • Problem: Automorphic embeddings
       • Automorphisms == subgraph equivalences
       • Redundant work
     [Figure: Worker 1 holds embedding 1-2-3 and Worker 2 holds embedding 3-2-1; the two are equivalent (==).]

  10. Avoiding redundant work
      • Solution: Decentralized Embedding Canonicality
        • No coordination
        • Efficient
      [Figure: Worker 1 holds 1-2-3, isCanonical(e) → true; Worker 2 holds the equivalent 3-2-1, isCanonical(e) → false.]

  11. Embedding Canonicality
      • isCanonical(e) iff e was built by adding, at every step, the neighbor with the smallest ID
      • Initial embedding (e):
        ● 1 - 3 - 6
      • Expansions:
        ● 1 - 3 - 6 - 5 → canonical
        ● 1 - 3 - 6 - 4 → canonical
        ● 1 - 3 - 6 - 2 → not canonical (canonical form: 1 - 2 - 3 - 6)
      [Figure: input graph with vertices 1 to 6, with embedding e highlighted.]
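
The canonicality rule on this slide can be sketched by recomputing the smallest-ID-first visit order of an embedding's vertices and comparing it with the order in which the embedding was actually built. The sketch makes several assumptions: the visit starts at the smallest vertex ID, embeddings are non-empty and connected, and the edges of `demoGraph` are hypothetical, chosen only to be consistent with the outcomes listed on the slide. A production system would more likely use an incremental check per expansion rather than recomputing the whole order.

```java
import java.util.*;

class CanonicalitySketch {
    // Hypothetical undirected graph on vertices 1..6 (edges are an assumption).
    static Map<Integer, Set<Integer>> demoGraph() {
        int[][] edges = {{1, 2}, {1, 3}, {2, 3}, {3, 6}, {4, 6}, {5, 6}};
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    // Start at the smallest ID, then repeatedly add the smallest unvisited
    // embedding vertex that is adjacent to an already visited vertex.
    static List<Integer> canonicalOrder(Collection<Integer> vertices,
                                        Map<Integer, Set<Integer>> adj) {
        TreeSet<Integer> remaining = new TreeSet<>(vertices);
        List<Integer> order = new ArrayList<>();
        order.add(remaining.pollFirst());
        while (!remaining.isEmpty()) {
            Integer next = null;
            for (int candidate : remaining) {            // iterates in ascending ID order
                boolean touchesVisited = false;
                for (int u : order) {
                    if (adj.get(u).contains(candidate)) { touchesVisited = true; break; }
                }
                if (touchesVisited) { next = candidate; break; }
            }
            if (next == null) throw new IllegalArgumentException("embedding is not connected");
            remaining.remove(next);
            order.add(next);
        }
        return order;
    }

    // An embedding is canonical iff it was built in exactly that order.
    static boolean isCanonical(List<Integer> embedding, Map<Integer, Set<Integer>> adj) {
        return canonicalOrder(embedding, adj).equals(embedding);
    }
}
```

Because the check depends only on the embedding and the input graph, each worker can evaluate it locally, which is what allows duplicates to be dropped without coordination.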

  12. Efficient Pattern Aggregation
      • Goal: Aggregate automorphic patterns to a single key
        • Find canonical pattern
        • No known polynomial-time solution
      [Figure: three embeddings → 3x expensive graph canonization → canonical pattern.]

  13. Efficient Pattern Aggregation
      • Solution: 2-level pattern aggregation
        1. Embeddings → quick patterns
        2. Quick patterns → canonical pattern
      [Figure: three embeddings → 3x linear matching to quick pattern → 1) quick patterns → 2x expensive graph canonization → 2) canonical pattern.]
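
A rough sketch of the two-level idea: every embedding is first mapped to a cheap "quick pattern" (its edges rewritten in terms of visit positions), counts are accumulated per quick pattern, and the expensive canonization runs once per distinct quick pattern rather than once per embedding. Everything here is an assumption for illustration: patterns are unlabeled, encoded as strings, limited to at most 10 vertices and at least one edge, and canonized by brute force over all vertex permutations. `TwoLevelAggregationSketch` and its methods are hypothetical names.

```java
import java.util.*;

class TwoLevelAggregationSketch {
    static int canonizations = 0; // how many times the expensive step ran

    // Hypothetical input graph: the path 1 - 2 - 3 - 4.
    static Map<Integer, Set<Integer>> demoGraph() {
        int[][] edges = {{1, 2}, {2, 3}, {3, 4}};
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    // Level 1: rewrite the embedding's edges using visit positions instead of vertex IDs.
    static String quickPattern(List<Integer> emb, Map<Integer, Set<Integer>> adj) {
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < emb.size(); i++)
            for (int j = i + 1; j < emb.size(); j++)
                if (adj.get(emb.get(i)).contains(emb.get(j))) edges.add(i + "-" + j);
        return String.join(" ", edges);
    }

    // Level 2: brute-force canonization, the smallest edge string over all relabelings.
    static String canonicalPattern(String quick) {
        canonizations++;
        List<int[]> edges = new ArrayList<>();
        int k = 0; // number of pattern vertices
        for (String edge : quick.split(" ")) {
            String[] ends = edge.split("-");
            int a = Integer.parseInt(ends[0]), b = Integer.parseInt(ends[1]);
            edges.add(new int[]{a, b});
            k = Math.max(k, Math.max(a, b) + 1);
        }
        String[] best = {null};
        permute(new int[k], new boolean[k], 0, edges, best);
        return best[0];
    }

    // Tries every assignment of new labels to positions and records the smallest encoding.
    static void permute(int[] perm, boolean[] used, int pos, List<int[]> edges, String[] best) {
        if (pos == perm.length) {
            List<String> relabeled = new ArrayList<>();
            for (int[] e : edges) {
                int a = Math.min(perm[e[0]], perm[e[1]]), b = Math.max(perm[e[0]], perm[e[1]]);
                relabeled.add(a + "-" + b);
            }
            Collections.sort(relabeled);
            String s = String.join(" ", relabeled);
            if (best[0] == null || s.compareTo(best[0]) < 0) best[0] = s;
            return;
        }
        for (int v = 0; v < perm.length; v++) {
            if (used[v]) continue;
            used[v] = true;
            perm[pos] = v;
            permute(perm, used, pos + 1, edges, best);
            used[v] = false;
        }
    }

    // Counts embeddings per quick pattern first, then canonizes each distinct quick pattern once.
    static Map<String, Integer> aggregate(List<List<Integer>> embeddings,
                                          Map<Integer, Set<Integer>> adj) {
        Map<String, Integer> quickCounts = new HashMap<>();
        for (List<Integer> e : embeddings) quickCounts.merge(quickPattern(e, adj), 1, Integer::sum);
        Map<String, Integer> canonicalCounts = new TreeMap<>();
        for (Map.Entry<String, Integer> q : quickCounts.entrySet())
            canonicalCounts.merge(canonicalPattern(q.getKey()), q.getValue(), Integer::sum);
        return canonicalCounts;
    }

    // Three embeddings of a 3-vertex path, visited in different orders.
    static Map<String, Integer> demo() {
        canonizations = 0;
        List<List<Integer>> embeddings = Arrays.asList(
            Arrays.asList(1, 2, 3), Arrays.asList(2, 3, 4), Arrays.asList(2, 1, 3));
        return aggregate(embeddings, demoGraph());
    }
}
```

In `demo()`, three embeddings collapse into two quick patterns and then into one canonical pattern, so the expensive step runs twice instead of three times. The saving grows with the number of embeddings per quick pattern.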

  14. Handling Exponential growth
      • Goal: handle trillions+ different embeddings?
      • Solution: Overapproximating DAGs (ODAGs)
        • Compress into less restrictive superset
        • Deal with spurious embeddings
      [Figure: an input graph, the list of its canonical embeddings, and the ODAG that compresses that list.]
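
The compression idea can be sketched as one map per embedding position, linking each vertex to the vertices seen right after it. Enumerating all paths through this structure yields a superset of the stored embeddings, so a predicate is needed to discard the spurious ones. This is a minimal sketch assuming all embeddings have the same length; `OdagSketch`, `add`, and `extract` are hypothetical names, and in a real system the predicate would presumably re-apply the canonicality and filter checks.

```java
import java.util.*;
import java.util.function.Predicate;

class OdagSketch {
    // levels.get(i) maps a vertex at position i to the vertices seen right after it.
    final List<Map<Integer, Set<Integer>>> levels = new ArrayList<>();

    // Stores an embedding by recording only its consecutive (position, vertex) links.
    void add(List<Integer> embedding) {
        while (levels.size() < embedding.size()) levels.add(new TreeMap<>());
        for (int i = 0; i < embedding.size(); i++) {
            Set<Integer> next = levels.get(i).computeIfAbsent(embedding.get(i), k -> new TreeSet<>());
            if (i + 1 < embedding.size()) next.add(embedding.get(i + 1));
        }
    }

    // Enumerates every path through the levels and keeps those accepted by the predicate.
    List<List<Integer>> extract(Predicate<List<Integer>> keep) {
        List<List<Integer>> out = new ArrayList<>();
        if (!levels.isEmpty()) {
            for (int v : levels.get(0).keySet()) walk(0, v, new ArrayList<>(), keep, out);
        }
        return out;
    }

    private void walk(int level, int v, List<Integer> path,
                      Predicate<List<Integer>> keep, List<List<Integer>> out) {
        path.add(v);
        if (level == levels.size() - 1) {
            if (keep.test(path)) out.add(new ArrayList<>(path));
        } else {
            for (int next : levels.get(level).get(v)) walk(level + 1, next, path, keep, out);
        }
        path.remove(path.size() - 1); // backtrack
    }
}
```

Storing 1-2-3 and 4-2-5 produces four paths on extraction (1-2-3, 1-2-5, 4-2-3, 4-2-5); the two that were never stored are the spurious embeddings the slide refers to.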

  15. Variants of Graph Mining Systems
      • G-Miner
        • For each embedding, decide how to expand
        • Easier to implement graph search
      • Systems for random walks
        • ASAP: Random walks for approximate subgraph enumeration
        • KnightKing: Random walks for node embeddings and graph neural networks
