http://cs224w.stanford.edu: How to organize/navigate it?

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

How to organize/navigate it?
- First try: Web directories
  - Yahoo, DMOZ, LookSmart
11/29/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

SEARCH!
- Find relevant docs in a small and trusted set:
  - Newspaper articles
  - Patents, etc.
- Two traditional problems:
  - Synonymy: buy – purchase, sick – ill
  - Polysemy: jaguar

Do more documents mean better results?

- What is the “best” answer to the query “Stanford”?
  - Anchor text: I go to Stanford where I study
- What about the query “newspaper”?
  - No single right answer
- Scarcity (IR) vs. abundance (Web) of information
  - Web: Many sources of information. Whom to “trust”?
- Trick:
  - Pages that actually know about newspapers might all be pointing to many newspapers
- Ranking!

Goal (back to the newspaper example):
- Don’t just find newspapers. Find “experts”: people who link in a coordinated way to good newspapers
- Idea: Links as votes
  - A page is more important if it has more links
    - In-coming links? Out-going links?
- Hubs and Authorities
  - Quality as an expert (hub):
    - Total sum of votes of pages pointed to
  - Quality as content (authority):
    - Total sum of votes of experts
  - Principle of repeated improvement
[Figure: example vote counts. NYT: 10, Ebay: 3, Yahoo: 3, CNN: 8, WSJ: 9]


[Kleinberg ‘98]
- Each page $i$ has 2 kinds of scores:
  - Hub score: $h_i$
  - Authority score: $a_i$
- HITS algorithm:
  - Initialize: $a_i = h_i = 1$
  - Then keep iterating:
    - Authority: $a_i = \sum_{j \to i} h_j$
    - Hub: $h_i = \sum_{i \to j} a_j$
    - Normalize: $\sum_i a_i = 1$, $\sum_i h_i = 1$
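The update rules above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `hits`, the edge list `EDGES`, and its page names are hypothetical. It also assumes that the hub update within one round uses the freshly computed authority scores, which the slide does not specify.

```python
def hits(edges, iterations=50):
    """Run HITS on a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    auth = {n: 1.0 for n in nodes}  # Initialize: a_i = 1
    hub = {n: 1.0 for n in nodes}   # Initialize: h_i = 1
    for _ in range(iterations):
        # Authority: a_i = sum of hub scores of pages j that point to i.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub: h_i = sum of authority scores of pages j that i points to
        # (assumption: uses the authority scores computed just above).
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize so that each score vector sums to 1.
        a_total = sum(new_auth.values())
        h_total = sum(new_hub.values())
        auth = {n: v / a_total for n, v in new_auth.items()}
        hub = {n: v / h_total for n, v in new_hub.items()}
    return hub, auth


# Hypothetical toy graph: three "expert" pages linking to two newspapers.
EDGES = [("h1", "nyt"), ("h1", "wsj"), ("h2", "nyt"), ("h2", "wsj"), ("h3", "nyt")]
hub, auth = hits(EDGES)
print(sorted(auth.items(), key=lambda kv: -kv[1]))
```

In this toy graph the page linked to by all three experts ends up with the highest authority score, and the experts that link to both newspapers get higher hub scores than the one that links to only one.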

[Kleinberg ‘98]
- HITS converges to a single stable point
- Slightly change the notation:
  - Vector $a = (a_1, \dots, a_n)$, $h = (h_1, \dots, h_n)$
  - Adjacency matrix ($n \times n$): $M_{ij} = 1$ if $i \to j$
- Then: $h_i = \sum_{i \to j} a_j \iff h_i = \sum_j M_{ij} a_j$
- So: $h = Ma$
- And likewise: $a = M^T h$

- Algorithm in new notation:
  - Set: $a = h = \mathbf{1}^n$
  - Repeat:
    - $h = Ma$, $a = M^T h$
    - Normalize
- Then: $a = M^T (Ma)$
  - $a$ is being updated (in 2 steps): $M^T (Ma) = (M^T M) a$
  - $h$ is updated (in 2 steps): $M (M^T h) = (M M^T) h$
- Thus, in $2k$ steps:
  - $a = (M^T M)^k a$
  - $h = (M M^T)^k h$
- Repeated matrix powering

- Definition:
  - Let $Ax = \lambda x$ for some scalar $\lambda$, vector $x$ and matrix $A$
  - Then $x$ is an eigenvector, and $\lambda$ is its eigenvalue
- Fact:
  - If $A$ is symmetric ($A_{ij} = A_{ji}$) (in our case $M^T M$ and $M M^T$ are symmetric)
  - Then $A$ has $n$ orthogonal unit eigenvectors $w_1 \dots w_n$ that form a basis (coordinate system) with eigenvalues $\lambda_1 \dots \lambda_n$ ($|\lambda_i| \ge |\lambda_{i+1}|$)

- Write $x$ in coordinate system $w_1 \dots w_n$:
  - $x = \sum_i \alpha_i w_i$
  - $x$ has coordinates $(\alpha_1, \dots, \alpha_n)$
- Suppose: $\lambda_1 \dots \lambda_n$ ($|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$)
  - $A^k x = (\lambda_1^k \alpha_1, \lambda_2^k \alpha_2, \dots, \lambda_n^k \alpha_n) = \sum_i \lambda_i^k \alpha_i w_i$
- As $k \to \infty$, if we normalize, $A^k x$ approaches a multiple of $w_1$ (all other coordinates $\to 0$ relative to the first, assuming $|\lambda_1| > |\lambda_2|$ and $\alpha_1 \ne 0$)
- So authority $a$ is the eigenvector of $M^T M$ associated with the largest eigenvalue $\lambda_1$
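The limiting step can be made explicit by factoring out the dominant eigenvalue. The derivation below is a sketch under two assumptions that the slide leaves implicit: $|\lambda_1| > |\lambda_2|$ strictly, and $\alpha_1 \ne 0$.

```latex
A^k x = \sum_{i=1}^{n} \lambda_i^k \alpha_i w_i
      = \lambda_1^k \Bigl( \alpha_1 w_1
        + \sum_{i=2}^{n} \Bigl( \tfrac{\lambda_i}{\lambda_1} \Bigr)^{k} \alpha_i w_i \Bigr),
\qquad
\Bigl| \tfrac{\lambda_i}{\lambda_1} \Bigr|^{k} \to 0 \ \text{ as } k \to \infty \ \ (i \ge 2).
```

Every term except the first shrinks relative to $\lambda_1^k \alpha_1 w_1$, so after normalization only the $w_1$ direction survives. For $A = M^T M$ the eigenvalues are non-negative, so the sign of the limit does not alternate between iterations.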

- A vote from an important page is worth more
- A page is important if it is pointed to by other important pages
- Define a “rank” $r_j$ for node $j$; $r_j$ should be proportional to:
  $r_j = \sum_{i \to j} \frac{r_i}{\text{outdegree of } i}$
- Flow equations:
  - $y = y/2 + a/2$
  - $a = y/2 + m$
  - $m = a/2$
[Figure: “The web in 1839”: three pages y, a, m with edges labeled y/2, y/2, a/2, a/2, m]
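The flow equations can be solved by hand. The sketch below takes the equation for $a$ as $a = y/2 + m$, since page $m$ has a single out-link (to $a$) in the matrix used for the power-iteration example. The normalization $y + a + m = 1$ is added to pin down the scale and is an assumption here.

```latex
\begin{aligned}
y &= \tfrac{y}{2} + \tfrac{a}{2} &&\Rightarrow\; y = a \\
m &= \tfrac{a}{2} \\
a &= \tfrac{y}{2} + m = \tfrac{a}{2} + \tfrac{a}{2} &&\text{(consistent, so one equation is redundant)} \\
y + a + m &= 1 &&\Rightarrow\; a + a + \tfrac{a}{2} = 1
  \;\Rightarrow\; y = a = \tfrac{2}{5},\quad m = \tfrac{1}{5}
\end{aligned}
```

Scaling this solution by 3 gives $(6/5, 6/5, 3/5)$, which is what you get if every page starts with rank 1 instead of the ranks summing to 1.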

Stochastic adjacency matrix M
- Let page $j$ have $d_j$ out-links
- If $j \to i$, then $M_{ij} = 1/d_j$, else $M_{ij} = 0$
- $M$ is a column stochastic matrix
  - Columns sum to 1
- Rank vector $r$: vector with 1 entry per page
  - $r_i$ is the importance score of page $i$
  - $|r| = 1$
- The flow equations can be written $r = Mr$
11/29/2010 Jure Leskovec & Anand Rajaraman, Stanford CS345a: Data Mining
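As a minimal sketch of this definition, the snippet below builds $M$ from a dictionary of out-links. The helper name `column_stochastic` and the `OUT_LINKS` dictionary are illustrative; the link structure (y links to y and a, a links to y and m, m links to a) is read off the three-page example used in the lecture.

```python
from fractions import Fraction


def column_stochastic(out_links):
    """Return (pages, M) with M[i][j] = 1/d_j if page j links to page i, else 0."""
    pages = list(out_links)  # keep the given page order
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    M = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in out_links.items():
        d_j = len(targets)  # out-degree of page j
        for i in targets:
            M[index[i]][index[j]] = Fraction(1, d_j)
    return pages, M


# Hypothetical encoding of the three-page example: y, a, m.
OUT_LINKS = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
pages, M = column_stochastic(OUT_LINKS)

# Each column should sum to 1 when every page has at least one out-link.
column_sums = [sum(M[i][j] for i in range(len(pages))) for j in range(len(pages))]
for name, row in zip(pages, M):
    print(name, [str(v) for v in row])
```

Exact fractions are used instead of floats so the entries print as 1/2 rather than 0.5. A page with no out-links would leave an all-zero column, so the column-sum check would fail for it.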

- Imagine a random web surfer:
  - At any time $t$, surfer is on some page $u$
  - At time $t+1$, the surfer follows an out-link from $u$ uniformly at random
  - Ends up on some page $v$ linked from $u$
  - Process repeats indefinitely
- Let:
  - $p(t)$ … vector whose $i$-th coordinate is the prob. that the surfer is at page $i$ at time $t$
  - $p(t)$ is a probability distribution over pages

- Where is the surfer at time $t+1$?
  - Follows a link uniformly at random: $p(t+1) = M p(t)$
- Suppose the random walk reaches a state $p(t+1) = M p(t) = p(t)$
  - then $p(t)$ is a stationary distribution of the random walk
- Our rank vector $r$ satisfies $r = Mr$
  - So it is a stationary distribution for the random surfer
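The random-surfer view can be checked empirically with a small simulation. This is only a sketch: the `surf` function, the `OUT_LINKS` dictionary (the three-page y, a, m example), the step count, and the seed are all illustrative choices. For this particular graph the long-run visit frequencies should land near the stationary distribution $(2/5, 2/5, 1/5)$.

```python
import random

# Hypothetical encoding of the three-page example: y, a, m.
OUT_LINKS = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}


def surf(out_links, steps, seed=0):
    """Simulate a random surfer and return the fraction of time spent on each page."""
    rng = random.Random(seed)        # fixed seed so runs are repeatable
    page = next(iter(out_links))     # start on the first page listed
    visits = {p: 0 for p in out_links}
    for _ in range(steps):
        # Follow one of the current page's out-links uniformly at random.
        page = rng.choice(out_links[page])
        visits[page] += 1
    return {p: count / steps for p, count in visits.items()}


freq = surf(OUT_LINKS, steps=200_000)
print(freq)
```

The simulation tracks one surfer over time rather than the full distribution $p(t)$. The two agree in the long run here because every page can reach every other page and the self-link on y prevents the walk from being periodic.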

Power Iteration:
- Set $r_i = 1$
- $r_j = \sum_{i \to j} r_i / d_i$
- And iterate

Matrix M:
      y    a    m
  y   ½    ½    0
  a   ½    0    1
  m   0    ½    0

Example (successive iterates):
  y:  1   1    5/4  9/8   …  6/5
  a:  1   3/2  1    11/8  …  6/5
  m:  1   ½    ¾    ½     …  3/5
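The iterates in this example can be reproduced with exact fractions. The sketch below hard-codes the 3×3 matrix for pages y, a, m and applies $r \leftarrow Mr$ repeatedly. The helper name `step` and the choice of 100 extra iterations for the limit are illustrative.

```python
from fractions import Fraction

# Column-stochastic matrix for pages (y, a, m); column j holds 1/d_j for j's out-links.
M = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 2), Fraction(0), Fraction(1)],
    [Fraction(0), Fraction(1, 2), Fraction(0)],
]


def step(M, r):
    """One power-iteration step: return the matrix-vector product M r."""
    return [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]


r = [Fraction(1)] * 3      # Set r_i = 1
history = [r]
for _ in range(3):
    r = step(M, r)
    history.append(r)

# Keep iterating to approximate the limit.
limit = r
for _ in range(100):
    limit = step(M, limit)

for name, values in zip("yam", zip(*history)):
    print(name, [str(v) for v in values])
print([round(float(v), 6) for v in limit])
```

Because $M$ is column stochastic, each step preserves the total rank of 3, so no normalization is needed in this example.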

- Some pages are “dead ends” (have no out-links)
  - Such pages cause importance to leak out
- Spider traps (all out-links are within the group)
  - Eventually spider traps absorb all importance

Power Iteration:
- Set $r_i = 1$
- $r_j = \sum_{i \to j} r_i / d_i$
- And iterate

Matrix M:
      y    a    m
  y   ½    ½    0
  a   ½    0    0
  m   0    ½    0

Example (successive iterates):
  y:  1   1   ¾   5/8  …  0
  a:  1   ½   ½   3/8  …  0
  m:  1   ½   ¼   ¼    …  0

Power Iteration:
- Set $r_i = 1$
- $r_j = \sum_{i \to j} r_i / d_i$
- And iterate

Matrix M:
      y    a    m
  y   ½    ½    0
  a   ½    0    0
  m   0    ½    1

Example (successive iterates):
  y:  1   1    ¾    5/8  …  0
  a:  1   ½    ½    3/8  …  0
  m:  1   3/2  7/4  2    …  3
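Both failure modes can be reproduced by running the same iteration on the two matrices from these examples. In the sketch below, the names `DEAD_END`, `SPIDER_TRAP`, and `iterate` are illustrative. In the first matrix page m has no out-links (an all-zero column), and in the second m links only to itself.

```python
from fractions import Fraction as F

# Pages are ordered (y, a, m). In DEAD_END, m has no out-links.
DEAD_END = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 2), F(0), F(0)],
    [F(0), F(1, 2), F(0)],
]
# In SPIDER_TRAP, m's only out-link points back to m.
SPIDER_TRAP = [
    [F(1, 2), F(1, 2), F(0)],
    [F(1, 2), F(0), F(0)],
    [F(0), F(1, 2), F(1)],
]


def iterate(M, steps):
    """Start from r_i = 1 and apply r <- M r the given number of times."""
    r = [F(1)] * len(M)
    for _ in range(steps):
        r = [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(M))]
    return r


dead = iterate(DEAD_END, 3)
trap = iterate(SPIDER_TRAP, 3)
dead_late = iterate(DEAD_END, 200)
trap_late = iterate(SPIDER_TRAP, 200)

print("dead end, total rank after 3 steps:", sum(dead))
print("spider trap, total rank after 3 steps:", sum(trap))
print("spider trap, rank of m after 200 steps:", float(trap_late[2]))
```

With the dead end, the total rank shrinks at every step and eventually vanishes. With the spider trap, the total stays at 3 but all of it drains into m.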
