
PageRank: Ranking of nodes in graphs


  1. PageRank: Ranking of nodes in graphs
     Gonzalo Mateos
     Dept. of ECE and Goergen Institute for Data Science, University of Rochester
     gmateosb@ece.rochester.edu
     http://www.ece.rochester.edu/~gmateosb/
     October 15, 2019
     Introduction to Random Processes

  2. PageRank: Random walk
     Ranking of nodes in graphs: Random walk
     Ranking of nodes in graphs: Probability propagation

  3. Graphs
     [Figure: directed graph on nodes 1 to 6]
     ◮ Graph ⇒ A set $V$ of vertices or nodes $j = 1, \ldots, J$
       ⇒ Connected by a set of edges $E$ defined as ordered pairs $(i, j)$
     ◮ In figure ⇒ Nodes are $V = \{1, 2, 3, 4, 5, 6\}$
       ⇒ Edges $E = \{(1,2), (1,5), (2,3), (2,5), (3,4), ... (3,6), (4,5), (4,6), (5,4)\}$
     ◮ Ex. 1: Websites and hyperlinks ⇒ World Wide Web (WWW)
     ◮ Ex. 2: People and friendship ⇒ Social network

  4. How well connected are nodes?
     [Figure: example directed graph on nodes 1 to 6]
     ◮ Q: Which node is the most connected? A: Define most connected
       ⇒ Can define “most connected” in different ways
     ◮ Two important connectivity indicators
       1) How many links point to a node (outgoing links irrelevant)
       2) How important are the links that point to a node
     ◮ Node rankings to measure website relevance, social influence

  5. Connectivity ranking
     ◮ Key insight: There is information in the structure of the network
     ◮ Knowledge is distributed through the network
       ⇒ The network (not the nodes) knows the rankings
     ◮ Idea exploited by Google’s PageRank© to rank webpages
       ... by social scientists to study trust & reputation in social networks
       ... by ISI to rank scientific papers, transactions & magazines ...
     [Figure: example directed graph on nodes 1 to 6]
     ◮ No one points to 1
     ◮ Only 1 points to 2
     ◮ Only 2 points to 3, but 2 more important than 1
     ◮ 4 as high as 5 with fewer links
     ◮ Links to 5 have lower rank
     ◮ Same for 6

  6. Preliminary definitions
     [Figure: example directed graph on nodes 1 to 6]
     ◮ Graph $G = (V, E)$ ⇒ vertices $V = \{1, 2, \ldots, J\}$ and edges $E$
     ◮ Outgoing neighborhood of $i$ is the set of nodes $j$ to which $i$ points
       $$n(i) := \{ j : (i, j) \in E \}$$
     ◮ Incoming neighborhood, $n^{-1}(i)$, is the set of nodes that point to $i$:
       $$n^{-1}(i) := \{ j : (j, i) \in E \}$$
     ◮ Strongly connected $G$ ⇒ directed path joining any pair of nodes
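
As an illustration of the two neighborhood definitions, the following minimal Python sketch builds $n(i)$ and $n^{-1}(i)$ from an edge list. The edge list is the nine pairs printed on the Graphs slide; the names `EDGES`, `NODES` and `neighborhoods` are hypothetical and do not come from the slides.

```python
# Edges of the example graph, copied from the pairs listed on the "Graphs" slide.
EDGES = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (3, 6), (4, 5), (4, 6), (5, 4)]
NODES = range(1, 7)


def neighborhoods(nodes, edges):
    """Return (out_nbrs, in_nbrs): n(i) and n^{-1}(i) as dicts of lists."""
    out_nbrs = {i: [] for i in nodes}
    in_nbrs = {i: [] for i in nodes}
    for i, j in edges:
        out_nbrs[i].append(j)  # j belongs to n(i) because (i, j) is in E
        in_nbrs[j].append(i)   # i belongs to n^{-1}(j) for the same reason
    return out_nbrs, in_nbrs


out_nbrs, in_nbrs = neighborhoods(NODES, EDGES)
print(out_nbrs[1])  # nodes that 1 points to
print(in_nbrs[5])   # nodes that point to 5
```

With these edges, node 1 has an empty incoming neighborhood and node 6 has an empty outgoing neighborhood, which is the dead-end case handled in the definition of rank.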

  7. Definition of rank
     ◮ Agent A chooses node $i$, e.g., web page, at random for initial visit
     ◮ Next visit randomly chosen between links in the neighborhood $n(i)$
       ⇒ All neighbors chosen with equal probability
     ◮ If agent reaches a dead end because node $i$ has no neighbors
       ⇒ Choose next visit at random equiprobably among all nodes
     ◮ Redefine graph $G = (V, E)$ adding edges from dead ends to all nodes
       ⇒ Restrict attention to connected (modified) graphs
     [Figure: example directed graph on nodes 1 to 6]
     ◮ Rank of node $i$ is the average number of visits of agent A to $i$
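
The dead-end modification can be sketched in Python as below. This is an illustration under stated assumptions, not the course's code: the function names are hypothetical, and the sketch links a dead end to every other node (no self-loop), which is an assumption chosen to match the 1/5 jump probabilities shown for node 6 in the transition figure.

```python
def add_dead_end_edges(out_nbrs):
    """Return a copy of the graph where each dead end points to all other nodes."""
    nodes = sorted(out_nbrs)
    return {
        # Assumption: a dead end jumps to every node except itself.
        i: (list(nbrs) if nbrs else [j for j in nodes if j != i])
        for i, nbrs in out_nbrs.items()
    }


def reachable(out_nbrs, start):
    """Set of nodes reachable from start by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j in out_nbrs[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_strongly_connected(out_nbrs):
    """True if every node can reach every node along a directed path."""
    return all(len(reachable(out_nbrs, i)) == len(out_nbrs) for i in out_nbrs)


# Outgoing neighborhoods of the example graph; node 6 is a dead end.
original = {1: [2, 5], 2: [3, 5], 3: [4, 6], 4: [5, 6], 5: [4], 6: []}
modified = add_dead_end_edges(original)
print(is_strongly_connected(original), is_strongly_connected(modified))
```

The original example graph is not strongly connected because nothing leaves node 6; after the modification every node can reach every other node.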

  8. Equiprobable random walk
     ◮ Formally, let $A_n$ be the node visited at time $n$
     ◮ Define transition probability $P_{ij}$ from node $i$ into node $j$
       $$P_{ij} := P\big( A_{n+1} = j \,\big|\, A_n = i \big)$$
     ◮ Next visit equiprobable among $i$’s $N_i := |n(i)|$ neighbors
       $$P_{ij} = \frac{1}{|n(i)|} = \frac{1}{N_i}, \quad \text{for all } j \in n(i)$$
     [Figure: example graph with transition probabilities on the edges; most edges are labeled 1/2, and dead-end node 6 jumps to each of nodes 1 to 5 with probability 1/5]
     ◮ Still have a graph
     ◮ But also a MC
     ◮ Red (not blue) circles
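
A minimal Python sketch of the transition probabilities $P_{ij} = 1/N_i$ for the modified example graph follows. The dictionary `OUT_NBRS` and the function name are hypothetical; the dead-end node 6 is assumed to point to nodes 1 to 5, as in the figure. Exact fractions are used so that rows can be checked to sum to one without rounding.

```python
from fractions import Fraction

# Modified example graph: node 6 (a dead end) is assumed to point to nodes 1..5.
OUT_NBRS = {1: [2, 5], 2: [3, 5], 3: [4, 6], 4: [5, 6], 5: [4], 6: [1, 2, 3, 4, 5]}


def transition_probabilities(out_nbrs):
    """Return P as a dict of dicts with P[i][j] = 1/N_i if j is in n(i), else 0."""
    nodes = sorted(out_nbrs)
    P = {i: {j: Fraction(0) for j in nodes} for i in nodes}
    for i, nbrs in out_nbrs.items():
        for j in nbrs:
            P[i][j] = Fraction(1, len(nbrs))  # equiprobable among the N_i neighbors
    return P


P = transition_probabilities(OUT_NBRS)
for i in sorted(P):
    print(i, [str(P[i][j]) for j in sorted(P[i])])
```

Each row of `P` sums to one, which is what makes the graph with these edge weights a Markov chain.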

  9. Formal definition of rank
     ◮ Def: Rank $r_i$ of $i$-th node is the time average of number of visits
       $$r_i := \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{ A_m = i \}$$
       ⇒ Define vector of ranks $r := [r_1, r_2, \ldots, r_J]^T$
     ◮ Rank $r_i$ can be approximated by average $r_{ni}$ at time $n$
       $$r_{ni} := \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{ A_m = i \}$$
       ⇒ Since $\lim_{n \to \infty} r_{ni} = r_i$, it holds $r_{ni} \approx r_i$ for $n$ sufficiently large
       ⇒ Define vector of approximate ranks $r_n := [r_{n1}, r_{n2}, \ldots, r_{nJ}]^T$
     ◮ If modified graph is connected, rank independent of initial visit

  10. Ranking algorithm
     Output: Vector r(i) with ranking of node i
     Input:  Scalar n indicating maximum number of iterations
     Input:  Vector N(i) containing number of neighbors of i
     Input:  Matrix Nbr(i,j) containing indices j of neighbors of i

     m = 1; r = zeros(J,1);            % Initialize time and ranks
     A = random('unid', J);            % Draw first visit A_0 uniformly at random
     while m <= n                      % Visits m = 1, ..., n, as in the definition of r_n
         jump = random('unid', N(A));  % Neighbor uniformly at random
         A = Nbr(A, jump);             % Jump to selected neighbor, A_m
         r(A) = r(A) + 1;              % Update ranking for A_m
         m = m + 1;
     end
     r = r / n;                        % Normalize by number of iterations n
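
For readers without MATLAB, the same random-walk estimate can be sketched in Python using only the standard library. This is an illustrative translation, not the course's code: `OUT_NBRS`, `rank_by_random_walk` and the `seed` parameter are hypothetical names, and the graph is the six-node example with dead-end node 6 assumed to point to nodes 1 to 5.

```python
import random

# Modified example graph: node 6 (a dead end) is assumed to point to nodes 1..5.
OUT_NBRS = {1: [2, 5], 2: [3, 5], 3: [4, 6], 4: [5, 6], 5: [4], 6: [1, 2, 3, 4, 5]}


def rank_by_random_walk(out_nbrs, n, seed=None):
    """Estimate ranks as the fraction of the n visits A_1, ..., A_n spent at each node."""
    rng = random.Random(seed)         # local generator so runs are reproducible
    nodes = sorted(out_nbrs)
    visits = {i: 0 for i in nodes}
    current = rng.choice(nodes)       # A_0, drawn uniformly at random
    for _ in range(n):
        current = rng.choice(out_nbrs[current])  # neighbor uniformly at random
        visits[current] += 1          # count the visit A_m
    return {i: visits[i] / n for i in nodes}     # normalize by n


ranks = rank_by_random_walk(OUT_NBRS, n=50_000, seed=0)
for node, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 3))
```

The estimates are noisy for small `n`, which is the slow-convergence issue examined on the convergence slides.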

  11. Social graph example
     ◮ Asked probability students about homework collaboration
     ◮ Created (crude) graph of the social network of students in the class
       ⇒ Used ranking algorithm to understand connectedness
     ◮ Ex: I want to know how well students are coping with the class
       ⇒ Best to ask people with higher connectivity ranking
     ◮ 2009 data from “UPenn’s ECE440”

  12. Ranked class graph
     [Figure: ranked graph of the class social network, one node per student, labeled by name]

  13. Convergence metrics
     ◮ Recall $r$ is vector of ranks and $r_n$ of rank iterates
     ◮ By definition $\lim_{n \to \infty} r_n = r$. How fast does $r_n$ converge to $r$ ($r$ given)?
     ◮ Can measure by $\ell_2$ distance between $r$ and $r_n$
       $$\zeta_n := \| r - r_n \|_2 = \Big( \sum_{i=1}^{J} (r_{ni} - r_i)^2 \Big)^{1/2}$$
     ◮ If interest is only on highest ranked nodes, e.g., a web search
       ⇒ Denote $r^{(i)}$ as the index of the $i$-th highest ranked node
       ⇒ Let $r_n^{(i)}$ be the index of the $i$-th highest ranked node at time $n$
     ◮ First element wrongly ranked at time $n$
       $$\xi_n := \arg\min_i \{ r^{(i)} \neq r_n^{(i)} \}$$
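
Both metrics can be computed directly from two rank vectors. The Python sketch below is illustrative only: the function names are hypothetical, the two small rank vectors are made-up numbers for demonstration (not data from the class graph), ties are broken by node index, and returning `None` when the two orderings agree everywhere is a convention chosen here.

```python
import math


def l2_distance(r, r_n):
    """zeta_n: Euclidean distance between the rank vectors r and r_n (dicts by node)."""
    return math.sqrt(sum((r_n[i] - r[i]) ** 2 for i in r))


def ordering(ranks):
    """Node indices sorted from highest to lowest rank; ties broken by node index."""
    return sorted(ranks, key=lambda i: (-ranks[i], i))


def first_wrong_rank(r, r_n):
    """xi_n: smallest position i (1-based) where the two orderings disagree, or None."""
    for position, (a, b) in enumerate(zip(ordering(r), ordering(r_n)), start=1):
        if a != b:
            return position
    return None


# Hypothetical three-node example, for illustration only.
r = {1: 0.5, 2: 0.3, 3: 0.2}
r_n = {1: 0.45, 2: 0.25, 3: 0.30}
zeta = l2_distance(r, r_n)
xi = first_wrong_rank(r, r_n)
print(zeta, xi)
```

In this toy example the top-ranked node agrees but the second does not, so $\xi_n = 2$ and the number of correctly ranked nodes is $\xi_n - 1 = 1$.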

  14. Evaluation of convergence metrics
     [Figure, panel “Distance”: distance (log scale) vs. time (n), for n from 0 to 10000]
     [Figure, panel “First element wrongly ranked”: correctly ranked nodes vs. time (n), for n from 0 to 10000]
     ◮ Distance close to $10^{-2}$ in ≈ $5 \times 10^3$ iterations
     ◮ Bad: Two highest ranks in ≈ $4 \times 10^3$ iterations
     ◮ Awful: Six best ranks in ≈ $8 \times 10^3$ iterations
     ◮ (Very) slow convergence

  15. When does this algorithm converge?
     ◮ Cannot confidently claim convergence until $10^5$ iterations
       ⇒ Beyond particular case, slow convergence inherent to algorithm
     [Figure: correctly ranked nodes (0 to 40) vs. time (n), for n from 0 to $2 \times 10^5$]
     ◮ Example has 40 nodes, want to use in network with $10^9$ nodes!
       ⇒ Leverage properties of MCs to obtain a faster algorithm

  16. PageRank: Probability propagation
     Ranking of nodes in graphs: Random walk
     Ranking of nodes in graphs: Probability propagation

  17. Limit probabilities
     ◮ Recall definition of rank
       $$r_i := \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \mathbb{I}\{ A_m = i \}$$
     ◮ Rank is time average of number of state visits in a MC
       ⇒ Can be as well obtained from limiting probabilities
     ◮ Recall transition probabilities ⇒ $P_{ij} = \frac{1}{N_i}$, for all $j \in n(i)$
     ◮ Stationary distribution $\pi = [\pi_1, \pi_2, \ldots, \pi_J]^T$ solution of
       $$\pi_i = \sum_{j \in n^{-1}(i)} P_{ji} \pi_j = \sum_{j \in n^{-1}(i)} \frac{\pi_j}{N_j} \quad \text{for all } i$$
       ⇒ Plus normalization equation $\sum_{i=1}^{J} \pi_i = 1$
     ◮ As per ergodicity of MC (strongly connected $G$) ⇒ $r = \pi$

  18. Matrix notation, eigenvalue problem
     ◮ As always, can define matrix $P$ with elements $P_{ij}$
       $$\pi_i = \sum_{j \in n^{-1}(i)} P_{ji} \pi_j = \sum_{j=1}^{J} P_{ji} \pi_j \quad \text{for all } i$$
     ◮ Right hand side is just definition of a matrix product leading to
       $$\pi = P^T \pi, \qquad \pi^T \mathbf{1} = 1$$
       ⇒ Also added normalization equation
     ◮ Idea: solve system of linear equations or eigenvalue problem on $P^T$
       ⇒ Requires matrix $P$ available at a central location
       ⇒ Computationally costly (sparse matrix $P$ with $10^{18}$ entries)
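
For a graph as small as the six-node example, the linear system $\pi = P^T \pi$, $\pi^T \mathbf{1} = 1$ can be solved exactly. The Python sketch below is one possible way to do it, not the method prescribed by the slides: it replaces one of the (redundant) balance equations by the normalization equation and runs Gauss-Jordan elimination with exact fractions. All names are hypothetical, and dead-end node 6 is assumed to point to nodes 1 to 5.

```python
from fractions import Fraction

# Modified example graph: node 6 (a dead end) is assumed to point to nodes 1..5.
OUT_NBRS = {1: [2, 5], 2: [3, 5], 3: [4, 6], 4: [5, 6], 5: [4], 6: [1, 2, 3, 4, 5]}


def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]           # bring a nonzero pivot up
        M[col] = [x / M[col][col] for x in M[col]]    # scale pivot row to 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def stationary_distribution(out_nbrs):
    """Solve (P^T - I) pi = 0 with the last equation replaced by sum(pi) = 1."""
    nodes = sorted(out_nbrs)
    J = len(nodes)
    P = [[Fraction(1, len(out_nbrs[i])) if j in out_nbrs[i] else Fraction(0)
          for j in nodes] for i in nodes]
    A = [[P[j][i] - (1 if i == j else 0) for j in range(J)] for i in range(J)]
    A[-1] = [Fraction(1)] * J                         # normalization equation
    b = [Fraction(0)] * (J - 1) + [Fraction(1)]
    return dict(zip(nodes, solve_linear_system(A, b)))


pi = stationary_distribution(OUT_NBRS)
for node in sorted(pi, key=pi.get, reverse=True):
    print(node, pi[node])
```

Under these assumptions node 4 receives the largest stationary probability. The dense elimination costs on the order of $J^3$ operations, which illustrates why this route is impractical at web scale.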
