FrogWild! Fast PageRank Approximations on Graph Engines
Ioannis Mitliagkas, Michael Borokhovich, Alex Dimakis, Constantine Caramanis
Web Ranking
Given a web graph, find the “important” pages.
Classic approach: rank pages based on in-degree.
Susceptible to manipulation by spammer networks.
[Figure: example web graph on nodes A, B, C, D, E, with spammer nodes S added]
PageRank [Page et al., 1999]
Page importance is described by a distribution π.
Recursive definition: important pages are pointed to by
❖ important pages, which are pointed to by
❖ important pages, which are pointed to by…
Robust to manipulation by spammer networks.
PageRank - Continuous Interpretation
Start: a gallon of water distributed evenly over the vertices.
Every iteration:
❖ Each vertex spreads its water evenly to its successors
❖ A fraction, p_T = 0.15, of all water is redistributed evenly
Repeat until convergence to π. Power iteration is usually employed.
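The water-spreading iteration above can be sketched as a plain power iteration. This is a minimal illustration, not the authors' code: the edge set of GRAPH is an assumption (the slides show only the node labels A to E), and the handling of vertices without successors is likewise an assumed convention.

```python
def pagerank_power_iteration(successors, p_t=0.15, tol=1e-10, max_iter=1000):
    """Iterate the 'water spreading' update until the vector stops changing."""
    nodes = sorted(successors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start: water distributed evenly
    for _ in range(max_iter):
        # A fraction p_t of all water is redistributed evenly.
        new_rank = {v: p_t / n for v in nodes}
        for v in nodes:
            out = successors[v]
            # Assumed convention: a vertex with no successors spreads to everyone.
            targets = out if out else nodes
            share = (1.0 - p_t) * rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical edges for the five-node example graph.
GRAPH = {"A": ["B", "D"], "B": ["D", "E"], "C": ["D"], "D": ["E"], "E": ["D"]}
pi = pagerank_power_iteration(GRAPH)
```

On this assumed graph the two heaviest vertices are D and E, and vertices with no incoming edges (A and C) keep only the teleportation share p_T / 5.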
Discrete Interpretation
A frog walks randomly on the graph; the next vertex is chosen uniformly at random.
Teleportation: at every step, teleport w.p. p_T.
Sampling: after t steps, the frog's location gives a sample from π, the PageRank vector.
With many frogs, estimate the vector π.
[Figure: example graph on nodes A, B, C, D, E; a vertex with one successor moves there with probability 1, a vertex with three successors moves to each with probability 1/3]
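The frog interpretation above can be simulated directly. The sketch below is a serial Monte Carlo estimate under stated assumptions: the edge set of GRAPH is hypothetical, a teleporting frog is assumed to land on a uniformly random vertex, and the values of n_frogs and steps are arbitrary illustration choices.

```python
import random
from collections import Counter


def frog_walk_sample(successors, nodes, steps, p_t, rng):
    """Walk one frog for `steps` steps and return its final location."""
    v = rng.choice(nodes)  # assumed: frogs start at a uniformly random vertex
    for _ in range(steps):
        out = successors[v]
        if rng.random() < p_t or not out:
            v = rng.choice(nodes)  # teleport (assumed uniform destination)
        else:
            v = rng.choice(out)    # next vertex uniform among successors
    return v


def estimate_pagerank(successors, n_frogs=20000, steps=30, p_t=0.15, seed=0):
    """Estimate pi as the empirical distribution of final frog locations."""
    rng = random.Random(seed)
    nodes = sorted(successors)
    counts = Counter(
        frog_walk_sample(successors, nodes, steps, p_t, rng)
        for _ in range(n_frogs)
    )
    return {v: counts[v] / n_frogs for v in nodes}


# Hypothetical edges for the five-node example graph.
GRAPH = {"A": ["B", "D"], "B": ["D", "E"], "C": ["D"], "D": ["E"], "E": ["D"]}
estimate = estimate_pagerank(GRAPH)
top2 = set(sorted(estimate, key=estimate.get, reverse=True)[:2])
```

Each frog is independent, so the estimate tightens as n_frogs grows, while steps controls how close each sample is to a draw from π.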
PageRank Approximation
Looking for the k “heavy nodes”; the full PageRank vector is not needed.
Random walk sampling favors heavy nodes.
Captured mass metric: for a node set S, the captured mass is π(S).
Example with k = 2: return the set {E, D}; captured mass = π({E, D}).
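A small sketch of the captured mass metric follows. The numbers in pi and noisy_estimate are made up purely for illustration and do not come from the talk; top_k and captured_mass are hypothetical helper names.

```python
def top_k(scores, k):
    """Return the k highest-scoring nodes as a set."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def captured_mass(pi, node_set):
    """pi(S): total true PageRank mass of the nodes in S."""
    return sum(pi[v] for v in node_set)


# Illustrative values only (not results from the talk).
pi = {"A": 0.05, "B": 0.10, "C": 0.05, "D": 0.35, "E": 0.45}
noisy_estimate = {"A": 0.04, "B": 0.30, "C": 0.06, "D": 0.20, "E": 0.40}

best = captured_mass(pi, top_k(pi, 2))                  # optimal set {E, D}
achieved = captured_mass(pi, top_k(noisy_estimate, 2))  # estimate picks {E, B}
```

The metric scores a returned set by the true mass it covers, so an approximation that swaps one heavy node for a nearly as heavy one is penalized only slightly.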
Platform
Graph Engines
❖ Engine splits graph across cluster
❖ Vertex program describes logic
GAS abstraction:
1. Gather
2. Apply
3. Scatter
Other approaches: Giraph [Avery, 2011], Galois [Nguyen et al., 2013], GraphX [Xin et al., 2013]
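To make the Gather / Apply / Scatter phases concrete, here is a toy single-machine loop that runs PageRank as a vertex program. It is not the GraphLab API: the function name, the activation rule in the scatter phase, and the edge set of GRAPH are all assumptions, and the sketch assumes every vertex has at least one successor.

```python
def gas_pagerank(out_nbrs, p_t=0.15, tol=1e-9, max_supersteps=500):
    """Toy synchronous GAS loop; only 'active' vertices run the program."""
    nodes = sorted(out_nbrs)
    n = len(nodes)
    in_nbrs = {v: [] for v in nodes}
    for u in nodes:
        for w in out_nbrs[u]:
            in_nbrs[w].append(u)

    rank = {v: 1.0 / n for v in nodes}
    active = set(nodes)
    for _ in range(max_supersteps):
        if not active:
            break
        updated = {}
        for v in active:
            # 1. Gather: sum the contributions arriving over in-edges.
            acc = sum(rank[u] / len(out_nbrs[u]) for u in in_nbrs[v])
            # 2. Apply: compute the new vertex value from the gathered sum.
            updated[v] = p_t / n + (1.0 - p_t) * acc
        next_active = set()
        for v, new_value in updated.items():
            # 3. Scatter: wake up successors only if this vertex changed.
            if abs(new_value - rank[v]) > tol:
                next_active.update(out_nbrs[v])
        rank.update(updated)
        active = next_active
    return rank


# Hypothetical edges for the five-node example graph.
GRAPH = {"A": ["B", "D"], "B": ["D", "E"], "C": ["D"], "D": ["E"], "E": ["D"]}
rank = gas_pagerank(GRAPH)
```

In a distributed engine the same three phases run per vertex, but gather and scatter cross machine boundaries, which is where the network cost discussed on the following slides comes from.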
Edge Cuts
❖ Assign vertices to machines
❖ Cross-machine edges require network communication
❖ Pregel, GraphLab 1.0
❖ High-degree nodes generate large volume of traffic
❖ Computational load imbalance
[Figure: example graph on nodes A, B, C, D, E with its vertices partitioned across Machines 1, 2 and 3]
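A quick way to see the cost of an edge cut is to count the edges whose endpoints land on different machines. The edge list and the vertex-to-machine assignment below are hypothetical; the slides do not specify either.

```python
def cut_edges(edges, machine_of):
    """Edges whose endpoints live on different machines (need network traffic)."""
    return [(u, w) for u, w in edges if machine_of[u] != machine_of[w]]


# Hypothetical edges and vertex placement for the five-node example graph.
EDGES = [("A", "B"), ("A", "D"), ("B", "D"), ("B", "E"),
         ("C", "D"), ("D", "E"), ("E", "D")]
MACHINE_OF = {"A": 1, "B": 1, "E": 2, "C": 3, "D": 3}

crossing = cut_edges(EDGES, MACHINE_OF)
cut_fraction = len(crossing) / len(EDGES)
```

Under this assumed placement five of the seven edges cross machines, and most of them touch the high-degree vertex D, which illustrates why high-degree nodes dominate the traffic.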
Vertex Cuts
❖ Assign edges to machines
❖ High-degree nodes replicated
❖ One replica designated master
❖ Need for synchronization
1. Gather
2. Apply [on master]
3. Synchronize mirrors
4. Scatter
❖ GraphLab 2.0 - PowerGraph
❖ Balanced - Network still bottleneck
[Figure: example graph with its edges partitioned across Machines 1, 2 and 3; vertices B and D are replicated on several machines]
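The sketch below places edges on machines and derives, for each vertex, its set of replicas, a master, and the average replication factor. Uniformly random edge placement and the "lowest machine id is master" rule are assumptions made for illustration; they are not claimed to be what PowerGraph does.

```python
import random


def random_vertex_cut(edges, n_machines, seed=0):
    """Assign each edge to a random machine; vertices are replicated as needed."""
    rng = random.Random(seed)
    replicas = {}  # vertex -> set of machines holding a replica of it
    for u, w in edges:
        m = rng.randrange(n_machines)
        replicas.setdefault(u, set()).add(m)
        replicas.setdefault(w, set()).add(m)
    # Assumed rule: the replica on the lowest-numbered machine is the master.
    masters = {v: min(ms) for v, ms in replicas.items()}
    return replicas, masters


def replication_factor(replicas):
    """Average number of replicas per vertex."""
    return sum(len(ms) for ms in replicas.values()) / len(replicas)


# Hypothetical edges for the five-node example graph.
EDGES = [("A", "B"), ("A", "D"), ("B", "D"), ("B", "E"),
         ("C", "D"), ("D", "E"), ("E", "D")]
replicas, masters = random_vertex_cut(EDGES, n_machines=3)
avg_replication = replication_factor(replicas)
```

Every replica other than the master is a mirror that must receive the result of the apply phase, so the replication factor bounds the synchronization traffic per vertex update.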
Random Walks on GraphLab
Master node decides step.
Decision synced to all mirrors.
Only machine M needs it.
Unnecessary network traffic.
Average replication factor ~8.
[Figure: vertex B replicated on Machines 1, 2, 3, …, M; the chosen next vertex Z is adjacent to B's replica on Machine M, yet the decision Z is sent to every mirror of B]
Objective: faster PageRank approximation on GraphLab.
Idea: only synchronize the mirror that will receive the frog.
Doable, but requires
1. Serious engine hacking
2. Exposing an ugly/complicated API to programmer
Simpler: pick mirrors to synchronize at random! Synchronize each mirror independently with probability p_S.
FrogWild!
Release N frogs in parallel.
Vertex program:
1. Each frog dies w.p. p_T (gives sample). Assume K frogs survive.
2. For every mirror, draw bridge w.p. p_S.
3. Spread frogs evenly among synchronized mirrors.
[Figure: vertex B replicated on Machines 1, 2, 3, …, M; each mirror draws Ber(p_S); in the example two mirrors are synchronized and each receives K/2 frogs]
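The three steps of the vertex program can be sketched for a single vertex as follows. This is a simplified illustration, not the authors' GraphLab implementation: the function name and signature are hypothetical, and both the handling of a remainder when K does not divide evenly and the choice to hold frogs at the vertex when no mirror is synchronized are assumptions.

```python
import random


def frogwild_vertex_step(n_frogs, mirrors, p_t, p_s, rng):
    """One FrogWild-style step at a vertex currently holding n_frogs frogs."""
    # 1. Each frog dies w.p. p_t; a dead frog contributes a sample here.
    dead = sum(1 for _ in range(n_frogs) if rng.random() < p_t)
    survivors = n_frogs - dead  # K in the slide

    # 2. For every mirror, draw an independent Ber(p_s) to decide on syncing.
    synced = [m for m in mirrors if rng.random() < p_s]

    # 3. Spread the surviving frogs evenly among the synchronized mirrors.
    allocation = {}
    held = 0
    if synced:
        base, extra = divmod(survivors, len(synced))
        for i, m in enumerate(synced):
            # Assumed tie-breaking: the first `extra` mirrors get one more frog.
            allocation[m] = base + (1 if i < extra else 0)
    else:
        # Assumed fallback: with no synchronized mirror the frogs stay put.
        held = survivors
    return dead, allocation, held


rng = random.Random(0)
dead, allocation, held = frogwild_vertex_step(
    n_frogs=1000, mirrors=["machine 1", "machine 2", "machine 3", "machine M"],
    p_t=0.15, p_s=0.7, rng=rng,
)
```

With p_s = 1 every mirror is synchronized, as in the unmodified engine; lowering p_s reduces the expected number of synchronization messages per step to p_s times the number of mirrors.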