
CSCI 104 Graph Algorithms: Mark Redekopp, David Kempe, Sandra Batista



  1. CSCI 104: Graph Algorithms. Mark Redekopp, David Kempe, Sandra Batista

  2. PAGERANK ALGORITHM

  3. PageRank
     • Consider the graph at the right [figure: directed graph on vertices a, b, c, d, e]
       – These could be webpages with links shown in the corresponding direction
       – These could be neighboring cities
     • PageRank generally tries to answer the question:
       – If we let a bunch of people randomly "walk" the graph, what is the probability that they end up at a certain location (page, city, etc.) in the "steady state"?
     • We could solve this problem through Monte Carlo simulation (essentially the CS 103 PA5 or PA1 coin-flipping or Zombie assignment, depending on the semester)
       – Simulate a large number of random walkers and record where each one ends to build up an answer of the probabilities for each vertex
     • But there are more efficient ways of doing it
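The Monte Carlo idea on this slide can be sketched in a few lines of Python. This is only an illustration: the names `LINKS` and `simulate_walkers`, the walker count, the step count, and the seed are assumptions, not part of the slides. The out-links are read off the adjacency matrix given on slide 4.

```python
import random
from collections import Counter

# Out-links of the example graph (assumed from the slide 4 adjacency matrix,
# where each column is a source and each row is a target).
LINKS = {
    "a": ["c"],
    "b": ["a", "e"],
    "c": ["b"],
    "d": ["c"],
    "e": ["c", "d"],
}

def simulate_walkers(links, num_walkers=10000, num_steps=40, seed=104):
    """Estimate where random walkers end up after num_steps steps."""
    rng = random.Random(seed)              # fixed seed: repeatable runs
    vertices = sorted(links)
    end_counts = Counter()
    for _ in range(num_walkers):
        v = rng.choice(vertices)           # start at a uniformly random vertex
        for _ in range(num_steps):
            v = rng.choice(links[v])       # follow a random out-link
        end_counts[v] += 1
    # Fraction of walkers that ended at each vertex
    return {v: end_counts[v] / num_walkers for v in vertices}

estimate = simulate_walkers(LINKS)
for vertex, prob in estimate.items():
    print(vertex, round(prob, 3))
```

The estimates are noisy and only approach the steady-state probabilities as the number of walkers and steps grows, which is why the slides move on to more efficient methods.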

  4. PageRank
     • Let us write out the adjacency matrix for this graph
     • Now let us make a weighted version by normalizing based on the out-degree of each node
       – Ex. If you're at node B, we have a 50-50 chance of going to A or E
     • From this you could write a system of linear equations (i.e. what are the chances you end up at vertex i at the next time step, given you are at some vertex j now?)
       – pA = 0.5*pB
       – pB = pC
       – pC = pA + pD + 0.5*pE
       – pD = 0.5*pE
       – pE = 0.5*pB
       – We also know: pA + pB + pC + pD + pE = 1

     Adjacency Matrix (columns = source, rows = target)
            a    b    c    d    e
       a    0    1    0    0    0
       b    0    0    1    0    0
       c    1    0    0    1    1
       d    0    0    0    0    1
       e    0    1    0    0    0

     Weighted Adjacency Matrix (columns = source j, rows = target i; entry (i,j) = a_{i,j} / out-degree(j))
            a    b    c    d    e
       a    0    0.5  0    0    0
       b    0    0    1    0    0
       c    1    0    0    1    0.5
       d    0    0    0    0    0.5
       e    0    0.5  0    0    0
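As a rough sketch of the normalization step, the weighted matrix can be built from an edge list by dividing each source's column by its out-degree. The names `VERTICES`, `EDGES`, and `weighted_adjacency` are illustrative assumptions; the edges themselves are read off the adjacency matrix on this slide.

```python
VERTICES = ["a", "b", "c", "d", "e"]
# (source, target) pairs taken from the slide's adjacency matrix
EDGES = [("a", "c"), ("b", "a"), ("b", "e"), ("c", "b"),
         ("d", "c"), ("e", "c"), ("e", "d")]

def weighted_adjacency(vertices, edges):
    """Return matrix[i][j] = 1 / out-degree(j) if there is an edge j -> i, else 0."""
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[index[src]] += 1
    matrix = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        j, i = index[src], index[dst]      # column = source, row = target
        matrix[i][j] = 1.0 / out_degree[j]
    return matrix

A = weighted_adjacency(VERTICES, EDGES)
for v, row in zip(VERTICES, A):
    print(v, row)
```

Each column of the result sums to 1, since a walker at source j must go somewhere.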

  5. PageRank
     • System of linear equations
       – pA = 0.5*pB
       – pB = pC
       – pC = pA + pD + 0.5*pE
       – pD = 0.5*pE
       – pE = 0.5*pB
       – We also know: pA + pB + pC + pD + pE = 1
     • If you know something about linear algebra, you know we can write these equations in matrix form as a linear system
       – Ax = y

     Weighted adjacency matrix (columns = source j, rows = target i; entry (i,j) = a_{i,j} / out-degree(j)) times the probability vector, row by row:

       [ 0   0.5   0   0   0   ]   [ pA ]     pA = 0.5*pB
       [ 0   0     1   0   0   ]   [ pB ]     pB = pC
       [ 1   0     0   1   0.5 ] * [ pC ]  =  pC = pA + pD + 0.5*pE
       [ 0   0     0   0   0.5 ]   [ pD ]     pD = 0.5*pE
       [ 0   0.5   0   0   0   ]   [ pE ]     pE = 0.5*pB

  6. PageRank
     • But remember we want the steady-state solution
       – The solution where the probabilities don't change from one step to the next
     • So we want a solution to: A p = p
     • We can:
       – Use a linear system solver (Gaussian elimination)
       – Or we can just seed the problem with some probabilities and then iterate until the solution settles down

     [Figure: the weighted adjacency matrix A (columns = source j, rows = target i) multiplying the vector (pA, pB, pC, pD, pE) and equaling that same vector]
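One possible way to carry out the "linear system solver" option is sketched below with exact fractions. The function name `steady_state` and the choice to replace the last equation with the normalization pA + pB + pC + pD + pE = 1 are assumptions made for this illustration (the equations A p = p are linearly dependent on their own, so one of them can be swapped out). The sketch also assumes the steady state is unique, as it is for this graph.

```python
from fractions import Fraction

H = Fraction(1, 2)
# Weighted adjacency matrix from the slides (columns = source, rows = target)
A = [
    [0, H, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, H],
    [0, 0, 0, 0, H],
    [0, H, 0, 0, 0],
]

def steady_state(A):
    """Solve A p = p together with sum(p) = 1 by Gauss-Jordan elimination."""
    n = len(A)
    # Augmented rows of (A - I) p = 0
    M = [[Fraction(A[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n)]
    # Replace the (redundant) last equation with sum(p) = 1
    M[n - 1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

p = steady_state(A)
print([str(x) for x in p])
```

Converted to decimals, the fractions agree with the "Actual PageRank Solution" column on slide 7 (.1538, .3077, .3077, .0769, .1538).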

  7. Iterative PageRank
     • But remember we want the steady-state solution
       – The solution where the probabilities don't change from one step to the next
     • So we want a solution to: A p = p
     • We can:
       – Use a linear system solver (Gaussian elimination)
       – Or we can just seed the problem with some probabilities and then iterate until the solution settles down

     Each step multiplies the weighted adjacency matrix by the previous step's solution:

            Step 0   Step 1   Step 2   ...   Step 29   Step 30   Actual PageRank solution (from solving the linear system)
       pA   .2       .1       .1             ?         .1507     .1538
       pB   .2       .2       .5             ?         .3078     .3077
       pC   .2       .5       .25            ?         .3126     .3077
       pD   .2       .1       .05            ?         .0783     .0769
       pE   .2       .1       .1             ?         .1507     .1538
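The iteration on this slide amounts to repeated matrix-vector multiplication. Below is a minimal sketch; the helper names `step` and `iterate` and the choice of 500 steps for the long run are assumptions for illustration, not from the slides.

```python
# Weighted adjacency matrix from the slides (columns = source, rows = target)
A = [
    [0, 0.5, 0, 0, 0],
    [0, 0,   1, 0, 0],
    [1, 0,   0, 1, 0.5],
    [0, 0,   0, 0, 0.5],
    [0, 0.5, 0, 0, 0],
]

def step(A, p):
    """One update: next_p[i] = sum over j of A[i][j] * p[j]."""
    return [sum(a_ij * p_j for a_ij, p_j in zip(row, p)) for row in A]

def iterate(A, p, num_steps):
    """Apply step() num_steps times, starting from p."""
    for _ in range(num_steps):
        p = step(A, p)
    return p

p0 = [0.2] * 5              # Step 0: uniform seed, as on the slide
p1 = step(A, p0)            # Step 1
p2 = step(A, p1)            # Step 2
p_long = iterate(A, p0, 500)  # many steps: settles near the exact solution

print([round(x, 4) for x in p1])
print([round(x, 4) for x in p2])
print([round(x, 4) for x in p_long])
```

No stopping test is included here; a common alternative is to stop once successive vectors differ by less than some small tolerance.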

  8. Additional Notes
     • What if we change the graph and now D has no incoming links…what is its PageRank?
       – 0
     • Most PR algorithms add a probability that someone just enters that URL (i.e. enters the graph at that node)
       – Usually define something called the damping factor, α (often chosen around 0.15)
       – With the update formula below, the probability of randomly starting or jumping somewhere = α, and the probability of following a link = 1 − α
     • So at each time step the next PR value for node i is given as:
       – $Pr(i) = \frac{\alpha}{N} + (1 - \alpha) \sum_{j \in Pred(i)} \frac{Pr(j)}{OutDeg(j)}$
       – N is the total number of vertices
       – Usually run 30 or so update steps
       – Start each Pr(i) = 1/N
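The damped update can be sketched as follows. Several things here are assumptions made for illustration: the function name `pagerank`, the modified graph `LINKS_NO_D_IN` (the slides do not say which link into D is removed, so the link e → d is dropped here), and the requirement that every vertex has at least one out-link.

```python
# Hypothetical variant of the example graph in which d has no incoming links
# (assumption: the e -> d link was the one removed).
LINKS_NO_D_IN = {
    "a": ["c"],
    "b": ["a", "e"],
    "c": ["b"],
    "d": ["c"],
    "e": ["c"],
}

def pagerank(links, alpha=0.15, num_steps=30):
    """Damped PageRank: Pr(i) = alpha/N + (1 - alpha) * sum of Pr(j)/OutDeg(j).

    Assumes every vertex appears as a key and has at least one out-link.
    """
    vertices = sorted(links)
    n = len(vertices)
    pr = {v: 1.0 / n for v in vertices}        # start each Pr(i) = 1/N
    for _ in range(num_steps):
        nxt = {v: alpha / n for v in vertices}  # random-jump term
        for j in vertices:
            share = (1 - alpha) * pr[j] / len(links[j])
            for i in links[j]:                  # j is a predecessor of i
                nxt[i] += share
        pr = nxt
    return pr

pr = pagerank(LINKS_NO_D_IN)
for vertex, value in pr.items():
    print(vertex, round(value, 4))
```

With the jump term in place, d no longer gets a PageRank of 0: it receives exactly the α/N share, while the values still sum to 1.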

  9. In a Web Search Setting
     • Given some search keywords, we could find the pages that contain matching keywords
     • We often expand that set of pages by including all successors and predecessors of those pages
       – Include all pages that are within a radius of 1 of the pages that actually have the keyword
     • Now consider that set of pages and the subgraph that it induces
     • Run PageRank on that subgraph

     [Figure, four panels: Full WebGraph; Page Hits (Contain keyword); Expanded (Preds & Succs); Induced Subgraph to run PageRank]
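The expand-then-induce step might look like the sketch below. The edges of the slide's web graph are not recoverable from the transcript, so `WEB` is a made-up graph, and the names `expand_and_induce` and `hits` are likewise assumptions for illustration.

```python
# Hypothetical web graph (NOT the one in the slide's figure): page -> out-links
WEB = {
    "a": ["c"],
    "b": ["a", "e"],
    "c": ["b"],
    "d": ["c"],
    "e": ["c", "d"],
    "f": ["e"],
    "g": ["a"],
}

def expand_and_induce(links, hits):
    """Add all successors and predecessors of the hit pages, then keep
    only the links whose two endpoints are both in the expanded set."""
    hits = set(hits)
    expanded = set(hits)
    for src, dsts in links.items():
        for dst in dsts:
            if src in hits:
                expanded.add(dst)      # successor of a hit page
            if dst in hits:
                expanded.add(src)      # predecessor of a hit page
    return {v: [u for u in links.get(v, []) if u in expanded] for v in expanded}

subgraph = expand_and_induce(WEB, hits={"a"})
for page in sorted(subgraph):
    print(page, subgraph[page])
```

Note that a page such as `b` can lose out-links in the induced subgraph, so a PageRank routine run on it needs some policy for pages left with no out-links.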

  10. Dijkstra's Algorithm: SINGLE-SOURCE SHORTEST PATH (SSSP)

  11. SSSP
      • Let us associate a 'weight' with each edge
        – Could be physical distance, cost of using the link, etc.
      • Find the shortest path from a source node, 'a', to all other nodes

      [Figure: weighted undirected graph on vertices a through h; the edge weights appear in the adjacency lists below]

      List of Vertices and Adjacency Lists, as (neighbor, edge weight):
        a: (c,13),(e,4)
        b: (c,5),(h,6)
        c: (a,13),(b,5),(d,2),(e,8),(g,7)
        d: (c,2),(f,1)
        e: (a,4),(c,8),(f,3)
        f: (d,1),(e,3),(g,4)
        g: (c,7),(f,4),(h,14)
        h: (b,6),(g,14)

  12. SSSP
      • What is the shortest distance from 'a' to all other vertices?
      • How would you go about computing those distances?

      [Figure: same graph and adjacency lists as slide 11, plus a Vert/Dist table in which only a = 0 is filled in and the entries for b through h are blank]

  13. Dijkstra's Algorithm
      • Dijkstra's algorithm is similar to a BFS but pulls out the smallest-distance vertex (measured from the source) rather than pulling vertices out in FIFO order (as in BFS)
      • Maintain a data structure (one you should be able to identify shortly)
        – We'll show it as a table of all vertices with their currently 'known' distance from the source
      • Initially, a has dist = 0
      • All others = infinite distance

      Vert  Dist
      a     0
      b     inf
      c     inf
      d     inf
      e     inf
      f     inf
      g     inf
      h     inf

  14. Dijkstra's Algorithm

      SSSP(G, s)
        PQ = empty PQ
        s.dist = 0; s.pred = NULL
        PQ.insert(s)
        for all v in vertices
          if v != s then v.dist = inf; PQ.insert(v)
        while PQ is not empty
          v = PQ.min(); PQ.remove_min()
          for u in neighbors(v)
            w = weight(v,u)
            if (v.dist + w < u.dist)
              u.pred = v
              u.dist = v.dist + w
              PQ.decreaseKey(u, u.dist)

      Vert/Dist table (initial state): a = 0; b, c, d, e, f, g, h = inf
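A runnable sketch of the pseudocode above, in Python. One deliberate difference: Python's `heapq` has no `decreaseKey`, so this version pushes a fresh entry whenever a distance improves and skips stale entries when they are popped (often called lazy deletion). The names `GRAPH` and `sssp` are assumptions for illustration; the adjacency lists are those from slide 11.

```python
import heapq

# Adjacency lists from slide 11: vertex -> [(neighbor, edge weight), ...]
GRAPH = {
    "a": [("c", 13), ("e", 4)],
    "b": [("c", 5), ("h", 6)],
    "c": [("a", 13), ("b", 5), ("d", 2), ("e", 8), ("g", 7)],
    "d": [("c", 2), ("f", 1)],
    "e": [("a", 4), ("c", 8), ("f", 3)],
    "f": [("d", 1), ("e", 3), ("g", 4)],
    "g": [("c", 7), ("f", 4), ("h", 14)],
    "h": [("b", 6), ("g", 14)],
}

def sssp(graph, source):
    """Return (dist, pred) dictionaries for shortest paths from source."""
    dist = {v: float("inf") for v in graph}
    pred = {v: None for v in graph}
    dist[source] = 0
    pq = [(0, source)]                 # heap of (distance, vertex) pairs
    done = set()
    while pq:
        d, v = heapq.heappop(pq)       # smallest-distance vertex
        if v in done:
            continue                   # stale entry left by an earlier push
        done.add(v)
        for u, w in graph[v]:
            if d + w < dist[u]:        # found a shorter route to u through v
                dist[u] = d + w
                pred[u] = v
                heapq.heappush(pq, (dist[u], u))   # stands in for decreaseKey
    return dist, pred

dist, pred = sssp(GRAPH, "a")
for v in sorted(dist):
    print(v, dist[v], pred[v])
```

Following `pred` backwards from any vertex to the source recovers the shortest path itself, not just its length.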

  15. Dijkstra's Algorithm (same SSSP pseudocode as slide 14)
      • Processing v = a: c goes from inf to 13, e goes from inf to 4

      Vert  Dist
      a     0
      b     inf
      c     inf → 13
      d     inf
      e     inf → 4
      f     inf
      g     inf
      h     inf

  16. Dijkstra's Algorithm (same SSSP pseudocode as slide 14)
      • Processing v = e: c goes from 13 to 12, f goes from inf to 7

      Vert  Dist
      a     0
      b     inf
      c     13 → 12
      d     inf
      e     4
      f     inf → 7
      g     inf
      h     inf

  17. Dijkstra's Algorithm (same SSSP pseudocode as slide 14)
      • Processing v = f: d goes from inf to 8, g goes from inf to 11

      Vert  Dist
      a     0
      b     inf
      c     12
      d     inf → 8
      e     4
      f     7
      g     inf → 11
      h     inf
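To reproduce the step-by-step tables on slides 15 through 17, one option is a tracing variant that reports which distances change each time a vertex is removed from the priority queue. The generator name `trace_sssp` is an assumption for illustration, and, as `heapq` has no `decreaseKey`, improved distances are re-pushed and stale entries skipped.

```python
import heapq

# Adjacency lists from slide 11: vertex -> [(neighbor, edge weight), ...]
GRAPH = {
    "a": [("c", 13), ("e", 4)],
    "b": [("c", 5), ("h", 6)],
    "c": [("a", 13), ("b", 5), ("d", 2), ("e", 8), ("g", 7)],
    "d": [("c", 2), ("f", 1)],
    "e": [("a", 4), ("c", 8), ("f", 3)],
    "f": [("d", 1), ("e", 3), ("g", 4)],
    "g": [("c", 7), ("f", 4), ("h", 14)],
    "h": [("b", 6), ("g", 14)],
}

def trace_sssp(graph, source):
    """Yield (v, updates) each time a vertex v is removed from the PQ,
    where updates maps each improved neighbor to its new distance."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    pq = [(0, source)]
    done = set()
    while pq:
        d, v = heapq.heappop(pq)
        if v in done:
            continue                   # stale entry
        done.add(v)
        updates = {}
        for u, w in graph[v]:
            if d + w < dist[u]:
                dist[u] = d + w
                updates[u] = dist[u]
                heapq.heappush(pq, (dist[u], u))
        yield v, updates

steps = list(trace_sssp(GRAPH, "a"))
for v, updates in steps:
    print("v =", v, "updates:", updates)
```

The first three reported steps correspond to the v = a, v = e, and v = f tables on the slides.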
