
FrogWild! Ioannis Mitliagkas, Michael Borokhovich, Alex Dimakis



  1. FrogWild! Fast PageRank Approximations on Graph Engines. Ioannis Mitliagkas, Michael Borokhovich, Alex Dimakis, Constantine Caramanis.

  4. Web Ranking. Given a web graph, find the “important” pages. Classic approach: rank pages by in-degree, which is susceptible to manipulation by spammer networks. [Figure: example graph on vertices A to E, with spam vertices S pointing at a target page.]

  7. PageRank [Page et al., 1999]. Page importance is described by a distribution π. Recursive definition: important pages are pointed to by ❖ important pages, which are pointed to by ❖ important pages, which are pointed to by… Robust to manipulation by spammer networks.

  12. PageRank - Continuous Interpretation. Start: a gallon of water is distributed evenly over the vertices. Every iteration: each vertex spreads its water evenly to its successors, then a fraction p_T = 0.15 of all water is redistributed evenly. Repeat until convergence to π. Power iteration is usually employed.
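The water picture above maps directly onto power iteration. The following is a minimal sketch, not the presenters' code: the graph `GRAPH`, the function name, and the handling of vertices without successors are assumptions made for illustration, since the slides do not give the edges of the example graph.

```python
def pagerank_power_iteration(successors, p_t=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank, phrased as moving water between vertices."""
    nodes = sorted(successors)
    n = len(nodes)
    # Start: one unit of water distributed evenly over all vertices.
    water = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # A fraction p_t of all water is redistributed evenly.
        nxt = {v: p_t / n for v in nodes}
        for v in nodes:
            out = successors[v]
            if out:
                # The remaining water is spread evenly to the successors.
                share = (1.0 - p_t) * water[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # Assumption: a vertex with no successors spreads evenly to all.
                for w in nodes:
                    nxt[w] += (1.0 - p_t) * water[v] / n
        delta = sum(abs(nxt[v] - water[v]) for v in nodes)
        water = nxt
        if delta < tol:
            break
    return water


# Hypothetical 5-vertex graph; the edges are not taken from the slides.
GRAPH = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": ["D"],
    "D": ["E"],
    "E": ["D"],
}

pi = pagerank_power_iteration(GRAPH)
```

In this hypothetical graph, vertex A has no incoming edges, so it only ever holds its share of the redistributed water, p_T / 5.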

  19. Discrete Interpretation. A frog walks randomly on the graph; the next vertex is chosen uniformly at random (for example, with probability 1/3 each when the current vertex has three successors). Teleportation: at every step, the frog teleports w.p. p_T. Sampling: after t steps, the frog's location gives a sample from π, the PageRank vector. With many frogs, we can estimate the vector π.
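The frog picture can be simulated in a few lines. This is a sketch under stated assumptions: the graph, the function name, the uniform start, and the uniform teleport target are illustrative choices that the slides do not spell out.

```python
import random
from collections import Counter


def frog_walk_estimate(successors, num_frogs=20000, steps=20, p_t=0.15, seed=0):
    """Estimate the PageRank vector from the final positions of many frogs."""
    rng = random.Random(seed)
    nodes = sorted(successors)
    counts = Counter()
    for _ in range(num_frogs):
        v = rng.choice(nodes)  # assumption: each frog starts at a uniform vertex
        for _ in range(steps):
            out = successors[v]
            if rng.random() < p_t or not out:
                # Teleport w.p. p_t (assumption: also when there is no successor).
                v = rng.choice(nodes)
            else:
                # Otherwise move to a successor chosen uniformly at random.
                v = rng.choice(out)
        counts[v] += 1  # the frog's location after `steps` steps is one sample
    return {v: counts[v] / num_frogs for v in nodes}


# Hypothetical 5-vertex graph; the edges are not taken from the slides.
GRAPH = {
    "A": ["B", "C", "D"],
    "B": ["E"],
    "C": ["D"],
    "D": ["E"],
    "E": ["D"],
}

estimate = frog_walk_estimate(GRAPH)
```

More frogs reduce the sampling noise of the estimate, and more steps reduce the bias from the starting distribution.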

  21. PageRank Approximation. We are looking for the k “heavy nodes” and do not need the full PageRank vector. Random walk sampling favors heavy nodes. Captured mass metric: for a node set S, the captured mass is π(S). Example with k = 2: return the set {E, D}; captured mass = π({E, D}).
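As a small worked example of the captured mass metric, the sketch below picks the top k nodes from frog counts and scores that set against a reference PageRank vector. All numbers and names here are hypothetical and chosen only for illustration.

```python
def captured_mass(pi, node_set):
    """Total PageRank mass pi(S) of the nodes in node_set."""
    return sum(pi[v] for v in node_set)


def top_k(scores, k):
    """The k nodes with the largest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


# Hypothetical values, for illustration only.
true_pi = {"A": 0.03, "B": 0.04, "C": 0.04, "D": 0.45, "E": 0.44}
frog_counts = {"A": 2, "B": 5, "C": 3, "D": 46, "E": 44}

chosen = top_k(frog_counts, k=2)                    # set returned by the sampler
mass = captured_mass(true_pi, chosen)               # mass captured by that set
best = captured_mass(true_pi, top_k(true_pi, k=2))  # best possible for k = 2
```

Comparing `mass` with `best` shows how much is lost by using the sampled set instead of the true top k.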

  22. Platform

  26. Graph Engines ❖ Engine splits graph across cluster ❖ Vertex program describes logic. GAS abstraction: 1. Gather, 2. Apply, 3. Scatter. Other approaches: Giraph [Avery, 2011], Galois [Nguyen et al., 2013], GraphX [Xin et al., 2013].
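To make the Gather, Apply, Scatter phases concrete, here is a single-machine sketch of a PageRank vertex program. The class, the method signatures, and the tiny synchronous driver `run_gas` are hypothetical; they are not the API of GraphLab or any other engine, and the driver assumes every vertex has at least one successor.

```python
class PageRankProgram:
    """Hypothetical vertex program; not the API of any real graph engine."""

    def __init__(self, num_vertices, p_t=0.15):
        self.n = num_vertices
        self.p_t = p_t

    def gather(self, src_rank, src_out_degree):
        # 1. Gather: contribution arriving along one in-edge.
        return src_rank / src_out_degree

    def apply(self, gathered_sum):
        # 2. Apply: combine the gathered sum into the new vertex value.
        return self.p_t / self.n + (1.0 - self.p_t) * gathered_sum

    def scatter(self, old_rank, new_rank, tol):
        # 3. Scatter: signal successors only if this vertex changed noticeably.
        return abs(new_rank - old_rank) > tol


def run_gas(successors, program, tol=1e-10, max_supersteps=200):
    """Toy synchronous driver; assumes every vertex has a successor."""
    predecessors = {v: [] for v in successors}
    for u, outs in successors.items():
        for w in outs:
            predecessors[w].append(u)
    rank = {v: 1.0 / len(successors) for v in successors}
    active = set(successors)
    for _ in range(max_supersteps):
        if not active:
            break
        new_rank = dict(rank)
        next_active = set()
        for v in active:
            total = sum(program.gather(rank[u], len(successors[u]))
                        for u in predecessors[v])
            new_rank[v] = program.apply(total)
            if program.scatter(rank[v], new_rank[v], tol):
                next_active.update(successors[v])
        rank, active = new_rank, next_active
    return rank


# Hypothetical 5-vertex graph; the edges are not taken from the slides.
GRAPH = {"A": ["B", "C", "D"], "B": ["E"], "C": ["D"], "D": ["E"], "E": ["D"]}
rank = run_gas(GRAPH, PageRankProgram(len(GRAPH)))
```

A real engine runs the same three phases per vertex but distributes the work and the vertex state across machines.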

  28. Edge Cuts ❖ Assign vertices to machines ❖ Cross-machine edges require network communication ❖ Pregel, GraphLab 1.0 ❖ High-degree nodes generate a large volume of traffic ❖ Computational load imbalance. [Figure: vertices A to E split across Machines 1, 2, and 3.]

  31. Vertex Cuts ❖ Assign edges to machines ❖ High-degree nodes replicated ❖ One replica designated master ❖ Need for synchronization: 1. Gather, 2. Apply [on master], 3. Synchronize mirrors, 4. Scatter ❖ GraphLab 2.0 - PowerGraph ❖ Balanced - Network still bottleneck. [Figure: edges split across Machines 1, 2, and 3, with vertices B and D replicated.]

  34. Random Walks on GraphLab. The master node decides the frog's next step, and the decision is synced to all mirrors, although only machine M needs it. This causes unnecessary network traffic; the average replication factor is ~8. [Figure: vertex B replicated on Machines 1, 2, 3, and M; the frog moves to vertex Z.]
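The replication factor mentioned above is the average number of machines that hold a copy of a vertex. The sketch below measures it for a vertex cut that places every edge on a uniformly random machine; that placement rule, the star-shaped edge list, and the function name are assumptions for illustration and do not describe how GraphLab actually partitions a graph.

```python
import random


def replication_factor(edges, num_machines, seed=0):
    """Average number of machines holding a replica of each vertex."""
    rng = random.Random(seed)
    machines_of = {}
    for u, v in edges:
        # Assumption: each edge is placed on a uniformly random machine.
        m = rng.randrange(num_machines)
        machines_of.setdefault(u, set()).add(m)
        machines_of.setdefault(v, set()).add(m)
    return sum(len(ms) for ms in machines_of.values()) / len(machines_of)


# Hypothetical star graph: one high-degree hub pointing at 1000 leaves.
EDGES = [("hub", f"leaf{i}") for i in range(1000)]

single = replication_factor(EDGES, num_machines=1)
spread = replication_factor(EDGES, num_machines=8)
```

Every mirror beyond the first costs one synchronization message per step, which is why syncing a frog's decision to all mirrors is wasteful when a single machine needs it.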

  35. Objective: faster PageRank approximation on GraphLab. Idea: only synchronize the mirror that will receive the frog. Doable, but requires 1. serious engine hacking and 2. exposing an ugly/complicated API to the programmer. Simpler: pick mirrors to synchronize at random! Synchronize each mirror independently with probability p_S.

  41. FrogWild! Release N frogs in parallel. Vertex program: 1. Each frog dies w.p. p_T (gives a sample); assume K frogs survive. 2. For every mirror, draw a bridge w.p. p_S, i.e. Ber(p_S). 3. Spread the frogs evenly among the synchronized mirrors (for example, K/2 each when two mirrors are synchronized). [Figure: vertex B replicated on Machines 1, 2, 3, and M.]
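The three steps of the vertex program can be sketched for a single vertex as follows. This is an illustration, not the authors' implementation: the function name, the mirror labels, the default p_S, the fallback used when no mirror is drawn, and the way a remainder is split are all assumptions. It only covers how frogs are divided among mirrors, not how each mirror forwards them along its local edges.

```python
import random


def frogwild_vertex_step(num_frogs, mirrors, p_t=0.15, p_s=0.7, rng=None):
    """One step at one vertex: returns (dead frogs, frogs sent per mirror)."""
    if rng is None:
        rng = random.Random(0)
    # 1. Each frog dies w.p. p_t; a dead frog counts as one sample at this vertex.
    dead = sum(1 for _ in range(num_frogs) if rng.random() < p_t)
    survivors = num_frogs - dead  # the K frogs that survive
    # 2. For every mirror, draw an independent Ber(p_s) coin.
    synced = [m for m in mirrors if rng.random() < p_s]
    if not synced and survivors:
        # Assumption: if no mirror was drawn, pick one so that no frog is lost.
        synced = [rng.choice(mirrors)]
    # 3. Spread the surviving frogs evenly among the synchronized mirrors.
    sent = {m: 0 for m in synced}
    if synced:
        base, extra = divmod(survivors, len(synced))
        for m in synced:
            sent[m] = base
        # Assumption: leftover frogs go to distinct randomly chosen mirrors.
        for m in rng.sample(synced, extra):
            sent[m] += 1
    return dead, sent


# Hypothetical mirror labels, loosely following the figure.
MIRRORS = ["machine1", "machine2", "machine3", "machineM"]
dead, sent = frogwild_vertex_step(100, MIRRORS)
```

Lowering `p_s` reduces the number of mirrors that must be synchronized per step, at the price of frogs being restricted to the edges held by fewer machines.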
