http://cs224w.stanford.edu How to organize/navigate it? How to - PowerPoint PPT Presentation

CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu. How to organize/navigate it? First try: Web directories


  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. How to organize/navigate it?
     - First try: Web directories
       - Yahoo, DMOZ, LookSmart

     11/29/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

  3. SEARCH!
     - Find relevant docs in a small and trusted set:
       - Newspaper articles
       - Patents, etc.
     - Two traditional problems:
       - Synonymy: buy and purchase, sick and ill
       - Polysemy: jaguar

  4. Do more documents mean better results?

  5. What is the "best" answer to the query "Stanford"?
     - Anchor text: I go to Stanford where I study
     - What about the query "newspaper"?
       - No single right answer
     - Scarcity (IR) vs. abundance (Web) of information
       - Web: many sources of information. Who to "trust"?
     - Trick: pages that actually know about newspapers might all be pointing to many newspapers
     - Ranking!

  6. Goal (back to the newspaper example):
     - Don't just find newspapers. Find "experts": people who link in a coordinated way to good newspapers
     - Idea: Links as votes
       - A page is more important if it has more links
       - In-coming links? Out-going links?
     - Hubs and Authorities
       - Quality as an expert (hub): total sum of votes of the pages pointed to
       - Quality as content (authority): total sum of votes of experts
     - Principle of repeated improvement
     - [Figure: example vote counts. NYT: 10, Ebay: 3, Yahoo: 3, CNN: 8, WSJ: 9]

  10. [Kleinberg '98] Each page $i$ has 2 kinds of scores:
      - Hub score: $h_i$
      - Authority score: $a_i$
      - HITS algorithm:
        - Initialize: $a_i = h_i = 1$
        - Then keep iterating:
          - Authority: $a_i = \sum_{j \to i} h_j$
          - Hub: $h_i = \sum_{i \to j} a_j$
          - Normalize: $\sum_i a_i = 1$, $\sum_i h_i = 1$
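The update rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `hits`, the `out_links` dictionary format, the fixed iteration count, and the example graph are all assumptions made for the sketch, and it assumes the graph has at least one link so the normalizing sums are nonzero.

```python
def hits(out_links, iterations=50):
    """Iterate the hub/authority updates on a directed graph.

    out_links maps each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    # Initialize: a_i = h_i = 1
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority: a_i = sum of h_j over pages j that link to i.
        new_auth = {p: 0.0 for p in pages}
        for j, targets in out_links.items():
            for i in targets:
                new_auth[i] += hub[j]
        # Hub: h_i = sum of a_j over pages j that i links to.
        new_hub = {p: sum(new_auth[j] for j in out_links.get(p, []))
                   for p in pages}
        # Normalize so that each score vector sums to 1.
        a_total = sum(new_auth.values())
        h_total = sum(new_hub.values())
        auth = {p: v / a_total for p, v in new_auth.items()}
        hub = {p: v / h_total for p, v in new_hub.items()}
    return hub, auth


# Hypothetical example: three "experts" linking to three newspapers.
example = {"e1": ["nyt", "wsj"], "e2": ["nyt", "wsj", "cnn"], "e3": ["cnn"]}
hub, auth = hits(example)
```

In this toy graph `e2` links to every newspaper, so it ends up with the largest hub score, while the newspapers themselves have hub score 0 because they link to nothing.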

  11. [Kleinberg '98] HITS converges to a single stable point
      - Slightly change the notation:
        - Vectors $a = (a_1, \dots, a_n)$, $h = (h_1, \dots, h_n)$
        - Adjacency matrix ($n \times n$): $M_{ij} = 1$ if $i \to j$
      - Then: $h_i = \sum_{i \to j} a_j \iff h_i = \sum_j M_{ij} a_j$
      - So: $h = M a$
      - And likewise: $a = M^T h$

  12. Algorithm in new notation:
      - Set: $a = h = \mathbf{1}^n$
      - Repeat:
        - $h = M a$, $a = M^T h$
        - Normalize
      - Then: $a = M^T (M a)$
        - $a$ is being updated (in 2 steps): $M^T (M a) = (M^T M) a$
        - $h$ is updated (in 2 steps): $M (M^T h) = (M M^T) h$
      - Thus, in $2k$ steps: $a = (M^T M)^k a$, $h = (M M^T)^k h$
      - Repeated matrix powering
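As a small check of the two-step identity, the sketch below compares $M^T(Ma)$ with $(M^T M)a$ on a toy graph. The three-node graph and the helper names `mat_vec`, `transpose`, and `mat_mul` are assumptions for illustration; normalization is left out because it only rescales the vectors.

```python
def mat_vec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, with M[i][j] = 1 if i -> j.
M = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
Mt = transpose(M)

a = [1, 1, 1]
h = mat_vec(M, a)              # h = M a
a_two_step = mat_vec(Mt, h)    # a = M^T h

MtM = mat_mul(Mt, M)
a_one_step = mat_vec(MtM, a)   # a = (M^T M) a, the same update in one product
```

Both routes give the same unnormalized authority vector, which is why repeating the loop amounts to repeated powering of $M^T M$.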

  13. Definition:
      - Let $A x = \lambda x$ for some scalar $\lambda$, vector $x$ and matrix $A$
      - Then $x$ is an eigenvector, and $\lambda$ is its eigenvalue
      - Fact:
        - If $A$ is symmetric ($A_{ij} = A_{ji}$) (in our case $M^T M$ and $M M^T$ are symmetric)
        - Then $A$ has $n$ orthogonal unit eigenvectors $w_1, \dots, w_n$ that form a basis (coordinate system), with eigenvalues $\lambda_1, \dots, \lambda_n$ ($|\lambda_i| \ge |\lambda_{i+1}|$)

  14. Write $x$ in the coordinate system $w_1, \dots, w_n$:
      - $x = \sum_i \alpha_i w_i$, so $x$ has coordinates $(\alpha_1, \dots, \alpha_n)$
      - Suppose: $\lambda_1, \dots, \lambda_n$ with $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$
      - $A^k x = (\lambda_1^k \alpha_1, \lambda_2^k \alpha_2, \dots, \lambda_n^k \alpha_n) = \sum_i \lambda_i^k \alpha_i w_i$
      - As $k \to \infty$, the term $\lambda_1^k \alpha_1 w_1$ dominates, so if we normalize, $A^k x$ converges to the direction of $w_1$ (all other coordinates $\to 0$), provided $|\lambda_1| > |\lambda_2|$ and $\alpha_1 \ne 0$
      - So the authority vector $a$ is the eigenvector of $M^T M$ associated with the largest eigenvalue $\lambda_1$
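The convergence argument can be observed numerically. The sketch below runs normalized repeated multiplication on a small symmetric matrix chosen for illustration (it is not from the lecture); its eigenvalues are 3 and 1, with unit eigenvector $(1, 1)/\sqrt{2}$ for the eigenvalue 3. The function name `power_iteration` is likewise an assumption.

```python
import math


def power_iteration(A, x, steps):
    """Repeatedly apply A to x, rescaling to unit length after each step."""
    for _ in range(steps):
        x = [sum(a * v for a, v in zip(row, x)) for row in A]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    return x


# Hypothetical symmetric matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0],
     [1.0, 2.0]]

x = power_iteration(A, [1.0, 0.0], 50)

# For a unit vector x, the Rayleigh quotient x^T A x estimates the eigenvalue.
Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
eigenvalue = sum(p * q for p, q in zip(Ax, x))
```

The starting vector $(1, 0)$ has a nonzero component along $(1, 1)$, so the iterate lines up with that eigenvector and the Rayleigh quotient approaches 3; the component along the other eigenvector shrinks by a factor of $1/3$ per step.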

  15. A vote from an important page is worth more
      - A page is important if it is pointed to by other important pages
      - Define a "rank" $r_j$ for node $j$, proportional to the rank flowing in over its in-links: $r_j = \sum_{i \to j} \frac{r_i}{d_i}$, where $d_i$ is the outdegree of $i$
      - Flow equations:
        - $y = y/2 + a/2$
        - $a = y/2 + m$
        - $m = a/2$
      - [Figure: "The web in 1839", a three-page graph with nodes y, a, m and edge labels y/2, y/2, a/2, a/2, m]

  16. Stochastic adjacency matrix $M$
      - Let page $j$ have $d_j$ out-links
      - If $j \to i$, then $M_{ij} = 1/d_j$, else $M_{ij} = 0$
      - $M$ is a column stochastic matrix
        - Columns sum to 1
      - Rank vector $r$: vector with 1 entry per page
        - $r_i$ is the importance score of page $i$
        - $|r| = 1$
      - The flow equations can be written $r = M r$

      11/29/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining
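A short sketch of how such a matrix might be built, and of what $r = Mr$ means concretely. It assumes a three-page graph in which y links to itself and to a, a links to y and m, and m links only to a; the function name `stochastic_matrix` and the dictionary input format are hypothetical, and each page is assumed to list every target at most once.

```python
from fractions import Fraction


def stochastic_matrix(out_links, pages):
    """Return M with M[i][j] = 1/d_j if page j links to page i, else 0."""
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    M = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in out_links.items():
        for i in targets:
            M[index[i]][index[j]] = Fraction(1, len(targets))
    return M


# Assumed example graph (rows and columns ordered y, a, m).
pages = ["y", "a", "m"]
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
M = stochastic_matrix(out_links, pages)

# Every column sums to 1 because each page splits its vote evenly.
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]

# Candidate rank vector with entries summing to 1.
r = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
Mr = [sum(M[i][j] * r[j] for j in range(3)) for i in range(3)]
```

For this graph, $r = (2/5, 2/5, 1/5)$ satisfies $r = Mr$ exactly, so it solves the flow equations under the normalization $|r| = 1$.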

  17. Imagine a random web surfer:
      - At any time $t$, the surfer is on some page $u$
      - At time $t+1$, the surfer follows an out-link from $u$ uniformly at random
      - Ends up on some page $v$ linked from $u$
      - Process repeats indefinitely
      - Let:
        - $p(t)$ be the vector whose $i$-th coordinate is the probability that the surfer is at page $i$ at time $t$
        - $p(t)$ is a probability distribution over pages

  18. Where is the surfer at time $t+1$?
      - Follows a link uniformly at random: $p(t+1) = M p(t)$
      - Suppose the random walk reaches a state $p(t+1) = M p(t) = p(t)$
        - Then $p(t)$ is a stationary distribution of the random walk
      - Our rank vector $r$ satisfies $r = M r$
        - So it is a stationary distribution for the random surfer
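The random-surfer view can be illustrated with a small simulation. This is a sketch under assumptions: the graph is a three-page example (y links to y and a, a links to y and m, m links to a), every page has at least one out-link, and the function name `simulate_surfer`, the step count, and the seed are arbitrary choices.

```python
import random
from collections import Counter


def simulate_surfer(out_links, start, steps, seed=0):
    """Follow random out-links and return the fraction of time spent on each page."""
    rng = random.Random(seed)
    page = start
    visits = Counter()
    for _ in range(steps):
        # Follow an out-link from the current page uniformly at random.
        page = rng.choice(out_links[page])
        visits[page] += 1
    return {p: visits[p] / steps for p in out_links}


# Assumed example graph with no dead ends.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
freq = simulate_surfer(out_links, "y", 200_000)
```

For this graph the long-run visit frequencies should land near the stationary distribution $(0.4, 0.4, 0.2)$ for (y, a, m), up to sampling noise.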

  19. Power Iteration:
      - Set $r_i = 1$
      - $r_j = \sum_{i \to j} r_i / d_i$
      - And iterate
      - Matrix (rows and columns ordered y, a, m): $M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{pmatrix}$
      - Example (successive iterations, left to right):
        - $y$: $1,\ 1,\ 5/4,\ 9/8,\ \dots,\ 6/5$
        - $a$: $1,\ 3/2,\ 1,\ 11/8,\ \dots,\ 6/5$
        - $m$: $1,\ 1/2,\ 3/4,\ 1/2,\ \dots,\ 3/5$
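The iteration in this example can be reproduced with exact fractions. The sketch below assumes the matrix from the slide with rows and columns ordered y, a, m; the helper name `step` and the iteration counts are choices made for illustration.

```python
from fractions import Fraction as F

# Column-stochastic matrix, rows and columns ordered y, a, m.
M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(1)],
     [F(0),    F(1, 2), F(0)]]


def step(M, r):
    """One power-iteration update r <- M r."""
    return [sum(m * x for m, x in zip(row, r)) for row in M]


r = [F(1), F(1), F(1)]          # Set r_i = 1
history = [r]
for _ in range(3):
    r = step(M, r)
    history.append(r)

# Keep iterating to approach the limiting values.
for _ in range(200):
    r = step(M, r)
limit = [float(x) for x in r]
```

The first three updates match the columns on the slide, and the long-run values approach $(6/5, 6/5, 3/5)$. The total stays at 3 throughout because every column of $M$ sums to 1.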

  20. Some pages are "dead ends" (have no out-links)
      - Such pages cause importance to leak out
      - Spider traps (all out-links are within the group)
        - Eventually spider traps absorb all importance

  21. Power Iteration:
      - Set $r_i = 1$
      - $r_j = \sum_{i \to j} r_i / d_i$
      - And iterate
      - Matrix (rows and columns ordered y, a, m): $M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$
      - Example (successive iterations, left to right):
        - $y$: $1,\ 1,\ 3/4,\ 5/8,\ \dots,\ 0$
        - $a$: $1,\ 1/2,\ 1/2,\ 3/8,\ \dots,\ 0$
        - $m$: $1,\ 1/2,\ 1/4,\ 1/4,\ \dots,\ 0$
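In this matrix the column for m is all zeros, which is the dead-end case: m has no out-links, so the rank it receives is not passed on. The sketch below tracks the total rank over the first few iterations; the helper name `step` and the variable names are assumptions for illustration.

```python
from fractions import Fraction as F

# Rows and columns ordered y, a, m; column m is all zeros (m has no out-links).
M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(1, 2), F(0)]]


def step(M, r):
    """One power-iteration update r <- M r."""
    return [sum(m * x for m, x in zip(row, r)) for row in M]


r = [F(1), F(1), F(1)]
totals = [sum(r)]               # total rank before any update
for _ in range(3):
    r = step(M, r)
    totals.append(sum(r))       # total rank after each update
```

The totals fall from 3 to 2, then 3/2, then 5/4: whatever sits on m at each step is lost, which is the "leak" that drives every score toward 0.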

  22. Power Iteration:
      - Set $r_i = 1$
      - $r_j = \sum_{i \to j} r_i / d_i$
      - And iterate
      - Matrix (rows and columns ordered y, a, m): $M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 \\ 0 & 1/2 & 1 \end{pmatrix}$
      - Example (successive iterations, left to right):
        - $y$: $1,\ 1,\ 3/4,\ 5/8,\ \dots,\ 0$
        - $a$: $1,\ 1/2,\ 1/2,\ 3/8,\ \dots,\ 0$
        - $m$: $1,\ 3/2,\ 7/4,\ 2,\ \dots,\ 3$
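Here m links only to itself, which is the spider-trap case: the matrix is still column stochastic, so no rank is lost, but it all ends up on m. A minimal sketch, with the helper name `step` and the iteration counts chosen for illustration:

```python
from fractions import Fraction as F

# Rows and columns ordered y, a, m; m's only out-link points back to m.
M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(1, 2), F(1)]]


def step(M, r):
    """One power-iteration update r <- M r."""
    return [sum(m * x for m, x in zip(row, r)) for row in M]


r = [F(1), F(1), F(1)]
for _ in range(3):
    r = step(M, r)
after_three = r                 # matches the fourth column on the slide

for _ in range(200):
    r = step(M, r)
final = [float(x) for x in r]   # long-run scores
```

Unlike the dead-end case, the total rank stays at 3, while the scores of y and a decay toward 0 and m approaches 3.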
