CSCI 104 Graph Algorithms Mark Redekopp David Kempe Sandra - PowerPoint PPT Presentation

CSCI 104 Graph Algorithms
Mark Redekopp, David Kempe, Sandra Batista

PAGERANK ALGORITHM

PageRank
• Consider the graph at the right [Figure: directed graph on vertices a, b, c, d, e]
  – These could be webpages, with links shown in the corresponding direction
  – These could be neighboring cities
• PageRank generally tries to answer the question:
  – If we let a bunch of people randomly "walk" the graph, what is the probability that they end up at a certain location (page, city, etc.) in the "steady state"?
• We could solve this problem through Monte Carlo simulation (essentially the CS 103 PA5 or PA1 coin-flipping or zombie assignment, depending on semester)
  – Simulate a large number of random walkers and record where each one ends up, to build up an estimate of the probability for each vertex
• But there are more efficient ways of doing it
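The Monte Carlo approach can be sketched in a few lines of Python. This is a minimal sketch, not the CS 103 assignment code: the edge list below is read off the adjacency matrix on the next slide, and the walker count, step count, and seed are arbitrary choices made for illustration.

```python
import random
from collections import Counter

# Example graph as source -> list of targets (read off the adjacency matrix).
GRAPH = {
    "a": ["c"],
    "b": ["a", "e"],
    "c": ["b"],
    "d": ["c"],
    "e": ["c", "d"],
}


def monte_carlo_pagerank(graph, walkers=10_000, steps=50, seed=104):
    """Estimate steady-state probabilities by simulating random walkers.

    Each walker starts at a uniformly random vertex, follows a uniformly
    random outgoing edge `steps` times, and we count where it ends up.
    """
    rng = random.Random(seed)            # seeded so runs are reproducible
    vertices = sorted(graph)
    ends = Counter()
    for _ in range(walkers):
        v = rng.choice(vertices)
        for _ in range(steps):
            v = rng.choice(graph[v])     # every vertex here has out-degree >= 1
        ends[v] += 1
    return {v: ends[v] / walkers for v in vertices}
```

Calling `monte_carlo_pagerank(GRAPH)` should give estimates near .1538, .3077, .3077, .0769, .1538 for a through e. The statistical error only shrinks like 1/sqrt(walkers), which is one reason the linear-algebra approaches are more efficient.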

PageRank
• Let us write out the adjacency matrix for this graph
• Now let us make a weighted version by normalizing based on the out-degree of each node
  – Ex. If you're at node B, there is a 50-50 chance of going to A or E
• From this you could write a system of linear equations (i.e., what are the chances you end up at vertex i at the next time step, given you are at some vertex j now)
  – pA = 0.5*pB
  – pB = pC
  – pC = pA + pD + 0.5*pE
  – pD = 0.5*pE
  – pE = 0.5*pB
  – We also know: pA + pB + pC + pD + pE = 1

Adjacency Matrix (rows = target i, columns = source j)
        a    b    c    d    e
   a    0    1    0    0    0
   b    0    0    1    0    0
   c    1    0    0    1    1
   d    0    0    0    0    1
   e    0    1    0    0    0

Weighted Adjacency Matrix [entry (i, j) = a(i,j) / out-degree(j)]
        a    b    c    d    e
   a    0    0.5  0    0    0
   b    0    0    1    0    0
   c    1    0    0    1    0.5
   d    0    0    0    0    0.5
   e    0    0.5  0    0    0
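The normalization step can be written as a small function. This is a sketch under the slide's convention (rows are targets, columns are sources); the directed edge list is read off the adjacency matrix above, and exact fractions are used only so the 1/2 entries print cleanly.

```python
from fractions import Fraction

VERTICES = ["a", "b", "c", "d", "e"]
# Directed edges (source, target) of the example graph.
EDGES = [("a", "c"), ("b", "a"), ("b", "e"), ("c", "b"),
         ("d", "c"), ("e", "c"), ("e", "d")]


def weighted_adjacency(vertices, edges):
    """Return M with M[i][j] = a[i][j] / outdegree(j).

    Rows are targets i and columns are sources j, so every column of a
    vertex with at least one outgoing edge sums to 1.
    """
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    adj = [[0] * n for _ in range(n)]
    for src, dst in edges:
        adj[index[dst]][index[src]] = 1          # row = target, column = source
    outdeg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[Fraction(adj[i][j], outdeg[j]) if outdeg[j] else Fraction(0)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for v, row in zip(VERTICES, weighted_adjacency(VERTICES, EDGES)):
        print(v, [str(x) for x in row])
```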

PageRank
• System of linear equations
  – pA = 0.5*pB
  – pB = pC
  – pC = pA + pD + 0.5*pE
  – pD = 0.5*pE
  – pE = 0.5*pB
  – We also know: pA + pB + pC + pD + pE = 1
• If you know something about linear algebra, you know we can write these equations in matrix form as a linear system
  – Ax = y

$$
\begin{bmatrix}
0 & 0.5 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0.5 \\
0 & 0 & 0 & 0 & 0.5 \\
0 & 0.5 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} p_A \\ p_B \\ p_C \\ p_D \\ p_E \end{bmatrix}
=
\begin{bmatrix} 0.5\,p_B \\ p_C \\ p_A + p_D + 0.5\,p_E \\ 0.5\,p_E \\ 0.5\,p_B \end{bmatrix}
$$

PageRank
• But remember we want the steady-state solution
  – The solution where the probabilities don't change from one step to the next
• So we want a solution to: A p = p
• We can:
  – Use a linear system solver (Gaussian elimination), together with the constraint pA + pB + pC + pD + pE = 1 (A p = p alone only determines p up to a scale factor)
  – Or we can just seed the problem with some probabilities and then iterate until the solution settles down

$$
\begin{bmatrix}
0 & 0.5 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0.5 \\
0 & 0 & 0 & 0 & 0.5 \\
0 & 0.5 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} p_A \\ p_B \\ p_C \\ p_D \\ p_E \end{bmatrix}
=
\begin{bmatrix} p_A \\ p_B \\ p_C \\ p_D \\ p_E \end{bmatrix}
$$
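The Gaussian-elimination route can be sketched as follows, assuming we solve (A - I) p = 0 with one of its (redundant) equations replaced by the normalization constraint. This is an illustrative Gauss-Jordan solver using exact `Fraction` arithmetic, not a production linear-algebra routine.

```python
from fractions import Fraction

# Weighted adjacency matrix A of the example (rows = target, columns = source).
H = Fraction(1, 2)
A = [
    [0, H, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, H],
    [0, 0, 0, 0, H],
    [0, H, 0, 0, 0],
]


def steady_state(A):
    """Solve A p = p with sum(p) = 1 by Gauss-Jordan elimination.

    Every column of A sums to 1, so the rows of (A - I) are linearly
    dependent; we drop the last one and add the row pA + ... + pE = 1.
    """
    n = len(A)
    # Augmented rows [A - I | 0] for the first n-1 equations ...
    M = [[Fraction(A[i][j]) - (1 if i == j else 0) for j in range(n)] + [Fraction(0)]
         for i in range(n - 1)]
    # ... plus the normalization row [1 ... 1 | 1].
    M.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        # Swap up a row at or below `col` that has a nonzero pivot.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]
```

For this graph `steady_state(A)` returns the fractions 2/13, 4/13, 4/13, 1/13, 2/13, i.e. .1538, .3077, .3077, .0769, .1538 for a through e.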

Iterative PageRank
• Seed each probability with 1/N = 0.2, then repeatedly compute p ← A p until the solution settles down

Vertex   Step 0   Step 1   Step 2   Step 30   Actual (from solving the linear system)
a        .2       .1       .1       .1507     .1538
b        .2       .2       .5       .3078     .3077
c        .2       .5       .25      .3126     .3077
d        .2       .1       .05      .0783     .0769
e        .2       .1       .1       .1507     .1538
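The iteration in the table can be sketched as repeated matrix-vector products. This minimal Python version assumes plain lists for the matrix (no NumPy) and returns every iterate so individual steps can be inspected.

```python
# Weighted adjacency matrix of the example graph
# (rows = target i, columns = source j).
A = [
    [0.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.0, 0.0, 0.0],
]


def mat_vec(M, p):
    """One update step: next[i] = sum over j of M[i][j] * p[j]."""
    return [sum(m_ij * p_j for m_ij, p_j in zip(row, p)) for row in M]


def iterate_pagerank(M, steps=30):
    """Seed every vertex with 1/N and apply p <- M p `steps` times.

    Returns the list of iterates [p0, p1, ..., p_steps].
    """
    n = len(M)
    history = [[1.0 / n] * n]
    for _ in range(steps):
        history.append(mat_vec(M, history[-1]))
    return history


if __name__ == "__main__":
    hist = iterate_pagerank(A, steps=30)
    for k in (0, 1, 2, 30):
        print(k, [round(x, 4) for x in hist[k]])
```

Steps 1 and 2 reproduce the table exactly; with more steps the iterates approach the linear-system solution.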

Additional Notes
• What if we change the graph so that D has no incoming links? What is its PageRank?
  – 0
• Most PR algorithms add a probability that someone just enters that URL (i.e., enters the graph at that node)
  – Usually define something called the damping factor, α (often chosen around 0.15); many references instead call the link-following probability 1 - α ≈ 0.85 the damping factor
  – Probability of randomly starting or jumping somewhere = α; with probability 1 - α the walker follows one of the current node's outgoing links
• So at each time step the next PR value for node i is given as:

$$
\Pr(i) = \frac{\alpha}{N} + (1 - \alpha) \sum_{k \in \mathrm{Pred}(i)} \frac{\Pr(k)}{\mathrm{OutDeg}(k)}
$$

  – N is the total number of vertices
  – Usually run 30 or so update steps
  – Start each Pr(i) = 1/N
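The damped update formula can be sketched directly in Python. Assumptions: "D has no incoming links" is realized here by removing the edge e → d (a hypothetical variant chosen for illustration), and vertices with no outgoing edges are not handled, since the slide does not discuss them.

```python
def damped_pagerank(graph, alpha=0.15, steps=30):
    """Apply Pr(i) = alpha/N + (1 - alpha) * sum_{k in Pred(i)} Pr(k)/OutDeg(k).

    `graph` maps each vertex to its list of successors. Assumes every
    vertex has at least one outgoing edge.
    """
    vertices = sorted(graph)
    n = len(vertices)
    pr = {v: 1.0 / n for v in vertices}          # start each Pr(i) = 1/N
    for _ in range(steps):
        nxt = {v: alpha / n for v in vertices}   # random-jump share
        for k in vertices:
            share = (1 - alpha) * pr[k] / len(graph[k])
            for i in graph[k]:                   # k is a predecessor of i
                nxt[i] += share
        pr = nxt
    return pr


# Hypothetical variant of the example with the edge e -> d removed,
# so that d has no incoming links.
NO_LINKS_TO_D = {
    "a": ["c"],
    "b": ["a", "e"],
    "c": ["b"],
    "d": ["c"],
    "e": ["c"],
}

if __name__ == "__main__":
    for v, p in damped_pagerank(NO_LINKS_TO_D).items():
        print(v, round(p, 4))
```

Without damping, d's PageRank would be 0; with the jump term it stays at α/N = 0.15 / 5 = 0.03. Setting `alpha=0.0` recovers the undamped iteration.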

In a Web Search Setting
• Given some search keywords, we could find the pages that contain those keywords
• We often expand that set of pages by including all successors and predecessors of those pages
  – i.e., include all pages within a radius of 1 of the pages that actually contain the keyword
• Now consider that set of pages and the subgraph that it induces
• Run PageRank on that subgraph
[Figure: four panels over vertices a through g: Full WebGraph; Page Hits (contain keyword); Expanded (preds & succs); Induced Subgraph to run PageRank]
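The expansion and induced-subgraph steps can be sketched as two small functions. The web graph `WEB` and the hit set in the usage note are hypothetical examples, not the graph from the figure.

```python
def expand_hits(graph, hits):
    """Return the hits plus every predecessor and successor (radius 1)."""
    expanded = set(hits)
    for v, succs in graph.items():
        for u in succs:
            if v in hits:
                expanded.add(u)      # u is a successor of a hit
            if u in hits:
                expanded.add(v)      # v is a predecessor of a hit
    return expanded


def induced_subgraph(graph, keep):
    """Keep only the vertices in `keep` and the edges between them."""
    return {v: [u for u in graph[v] if u in keep] for v in graph if v in keep}


# Hypothetical web graph: page -> pages it links to.
WEB = {"a": ["c"], "b": ["a"], "c": ["b", "g"], "d": ["c"],
       "e": ["c", "d"], "f": ["e"], "g": []}
```

If only page c contains the keyword, `expand_hits(WEB, {"c"})` adds its predecessors a, d, e and successors b, g, while f (two hops away) is left out; `induced_subgraph` then also drops the edge f → e. Note that an induced subgraph can contain pages with no outgoing links (g here), which a real PageRank implementation would have to handle.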

SINGLE-SOURCE SHORTEST PATH (SSSP)
Dijkstra's Algorithm

SSSP
• Let us associate a 'weight' with each edge
  – Could be physical distance, cost of using the link, etc.
• Find the shortest path from a source node, 'a', to all other nodes
[Figure: undirected weighted graph on vertices a through h; edge weights as in the adjacency lists below]

List of Vertices / Adjacency Lists (neighbor, weight)
  a: (c,13), (e,4)
  b: (c,5), (h,6)
  c: (a,13), (b,5), (d,2), (e,8), (g,7)
  d: (c,2), (f,1)
  e: (a,4), (c,8), (f,3)
  f: (d,1), (e,3), (g,4)
  g: (c,7), (f,4), (h,14)
  h: (b,6), (g,14)
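The adjacency lists can be built from an undirected weighted edge list. This is a minimal Python sketch; the edge list is reconstructed from the adjacency lists above, and storing each edge under both endpoints is what makes the lists symmetric.

```python
from collections import defaultdict

# Undirected weighted edges of the SSSP example: (u, v, weight).
EDGES = [
    ("a", "c", 13), ("a", "e", 4), ("b", "c", 5), ("b", "h", 6),
    ("c", "d", 2), ("c", "e", 8), ("c", "g", 7), ("d", "f", 1),
    ("e", "f", 3), ("f", "g", 4), ("g", "h", 14),
]


def build_adjacency(edges):
    """Map each vertex to a sorted list of (neighbor, weight) pairs.

    Each undirected edge is stored twice, once in each endpoint's list,
    which is why (c,13) appears under a and (a,13) appears under c.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return {v: sorted(nbrs) for v, nbrs in sorted(adj.items())}
```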

SSSP
• What is the shortest distance from 'a' to all other vertices?
• How would you go about computing those distances?
[Figure: the same weighted graph and adjacency lists, with a Vert/Dist table in which only a = 0 is filled in]

Dijkstra's Algorithm
• Dijkstra's algorithm is similar to BFS, but it pulls out the vertex with the smallest distance (from the source) rather than pulling vertices out in FIFO order (as in BFS)
• Maintain a data structure from which you can quickly identify that smallest-distance vertex
  – We'll show it as a table of all vertices with their currently 'known' distance from the source
• Initially, a has dist = 0
• All others = infinite distance

Vert   Dist
a      0
b      inf
c      inf
d      inf
e      inf
f      inf
g      inf
h      inf

Dijkstra's Algorithm

SSSP(G, s)
  PQ = empty PQ
  s.dist = 0; s.pred = NULL
  PQ.insert(s)
  for all v in vertices
    if v != s then v.dist = inf; v.pred = NULL; PQ.insert(v)
  while PQ is not empty
    v = PQ.min(); PQ.remove_min()
    for u in neighbors(v)
      w = weight(v, u)
      if (v.dist + w < u.dist)
        u.pred = v
        u.dist = v.dist + w
        PQ.decreaseKey(u, u.dist)
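The pseudocode can be turned into runnable Python with the standard `heapq` module. One substitution to note: `heapq` has no decreaseKey, so this sketch pushes a new (dist, vertex) entry on every improvement and skips stale entries when they are popped ("lazy deletion"). It assumes non-negative edge weights, as Dijkstra's algorithm requires.

```python
import heapq
import math

# Adjacency lists of the SSSP example: vertex -> [(neighbor, weight), ...].
GRAPH = {
    "a": [("c", 13), ("e", 4)],
    "b": [("c", 5), ("h", 6)],
    "c": [("a", 13), ("b", 5), ("d", 2), ("e", 8), ("g", 7)],
    "d": [("c", 2), ("f", 1)],
    "e": [("a", 4), ("c", 8), ("f", 3)],
    "f": [("d", 1), ("e", 3), ("g", 4)],
    "g": [("c", 7), ("f", 4), ("h", 14)],
    "h": [("b", 6), ("g", 14)],
}


def sssp(graph, s):
    """Dijkstra's algorithm from source s; returns (dist, pred) dicts."""
    dist = {v: math.inf for v in graph}
    pred = {v: None for v in graph}
    dist[s] = 0
    pq = [(0, s)]
    done = set()
    while pq:
        d, v = heapq.heappop(pq)          # smallest known distance
        if v in done:
            continue                      # stale entry: v already finalized
        done.add(v)
        for u, w in graph[v]:
            if d + w < dist[u]:           # relax edge (v, u)
                dist[u] = d + w
                pred[u] = v
                heapq.heappush(pq, (dist[u], u))   # stands in for decreaseKey
    return dist, pred


def path_to(pred, t):
    """Follow pred pointers back from t to the source."""
    path = []
    while t is not None:
        path.append(t)
        t = pred[t]
    return path[::-1]
```

Running `sssp(GRAPH, "a")` gives distances a 0, b 15, c 10, d 8, e 4, f 7, g 11, h 21, and `path_to(pred, "h")` recovers a, e, f, d, c, b, h.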

Dijkstra's Algorithm (trace of SSSP(G, a), step 1)
• v = a (dist 0) is removed from the PQ; relaxing its edges updates c: inf → 13 and e: inf → 4

Vert   Dist
a      0
b      inf
c      13
d      inf
e      4
f      inf
g      inf
h      inf

Dijkstra's Algorithm (trace of SSSP(G, a), step 2)
• v = e (dist 4) is removed from the PQ; relaxing its edges updates c: 13 → 12 and f: inf → 7

Vert   Dist
a      0
b      inf
c      12
d      inf
e      4
f      7
g      inf
h      inf

Dijkstra's Algorithm (trace of SSSP(G, a), step 3)
• v = f (dist 7) is removed from the PQ; relaxing its edges updates d: inf → 8 and g: inf → 11

Vert   Dist
a      0
b      inf
c      12
d      8
e      4
f      7
g      11
h      inf
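The trace above can be reproduced by instrumenting the algorithm to record which distances change each time a vertex is removed. As in any `heapq`-based sketch, lazy deletion replaces decreaseKey here; the `rounds` parameter is an illustrative addition that stops the run early so the partial table can be checked.

```python
import heapq
import math

GRAPH = {
    "a": [("c", 13), ("e", 4)],
    "b": [("c", 5), ("h", 6)],
    "c": [("a", 13), ("b", 5), ("d", 2), ("e", 8), ("g", 7)],
    "d": [("c", 2), ("f", 1)],
    "e": [("a", 4), ("c", 8), ("f", 3)],
    "f": [("d", 1), ("e", 3), ("g", 4)],
    "g": [("c", 7), ("f", 4), ("h", 14)],
    "h": [("b", 6), ("g", 14)],
}


def dijkstra_trace(graph, s, rounds):
    """Run Dijkstra's algorithm for `rounds` removals.

    Returns (trace, dist), where trace lists (removed vertex, {vertex:
    new distance}) for each removal and dist is the table at that point.
    """
    dist = {v: math.inf for v in graph}
    dist[s] = 0
    pq = [(0, s)]
    done = set()
    trace = []
    while pq and len(trace) < rounds:
        d, v = heapq.heappop(pq)
        if v in done:
            continue                  # stale heap entry (lazy deletion)
        done.add(v)
        updates = {}
        for u, w in graph[v]:
            if d + w < dist[u]:       # relaxation improved u's distance
                dist[u] = d + w
                updates[u] = dist[u]
                heapq.heappush(pq, (dist[u], u))
        trace.append((v, updates))
    return trace, dist


if __name__ == "__main__":
    steps, table = dijkstra_trace(GRAPH, "a", rounds=3)
    for v, updates in steps:
        print("remove", v, "->", updates)
    print(table)
```

After three removals the recorded updates match the three trace slides, and the returned table matches the step 3 table.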
