
  1. CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu

  2. [Liben-Nowell & Kleinberg '03] Link prediction task:
     - Given $G[t_0, t_0']$, a graph on edges up to time $t_0'$, output a ranked list $L$ of links (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$.
     - Evaluation:
       - $n = |E_{new}|$: number of new edges that appear during the test period $[t_1, t_1']$.
       - Take the top $n$ elements of $L$ and count the correct edges.
     12/01/2010, Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
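
The evaluation step above can be sketched in a few lines. This is a minimal illustration, not the course's code: the function name `top_n_correct` and the toy data are hypothetical, and edges are assumed to be undirected.

```python
def top_n_correct(ranked_pairs, new_edges):
    """Count how many of the top-n predictions are real new edges, with n = |E_new|."""
    # Assumption: links are undirected, so (u, v) and (v, u) are the same edge.
    truth = {frozenset(edge) for edge in new_edges}
    n = len(truth)
    top = ranked_pairs[:n]  # keep only the n best-ranked predictions
    correct = sum(1 for pair in top if frozenset(pair) in truth)
    return correct, n


# Hypothetical toy data: a ranked list L and the edges that appeared in the test period.
ranked = [("a", "b"), ("c", "d"), ("a", "d"), ("b", "c")]
new_edges = [("b", "a"), ("b", "c")]
correct, n = top_n_correct(ranked, new_edges)
print(f"{correct} of the top {n} predictions are correct")
```

Because the cutoff equals the number of true new edges, the fraction `correct / n` is both the precision and the recall of the top-n list.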

  4. [Liben-Nowell & Kleinberg '03] Predict links in an evolving collaboration network.
     - Core: since network data is very sparse, consider only nodes with in-degree and out-degree of at least 3.

  5. [Liben-Nowell & Kleinberg '03]
     - For every pair of nodes $(x, y)$, compute a score [scoring formula not preserved in this transcript].
     - Sort the pairs by score and predict the top $n$ pairs as new links.
     - $\Gamma(x)$ … degree of node $x$
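
The score-then-rank procedure above can be sketched as follows. The slide's own scoring formula did not survive, so the measures below are the standard neighborhood scores evaluated in the Liben-Nowell and Kleinberg paper, used here as an assumption. The sketch also assumes `adj[x]` is the neighbor set of `x`, so that its size is the degree. All function names are hypothetical.

```python
import math
from itertools import combinations


def neighbor_scores(adj, x, y):
    """Standard neighborhood-based scores for a candidate pair (x, y)."""
    gx, gy = adj[x], adj[y]  # assumption: adj[x] is the neighbor set of x
    common = gx & gy
    union = gx | gy
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # Adamic/Adar: rare (low-degree) common neighbors count for more.
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        "preferential_attachment": len(gx) * len(gy),
    }


def rank_pairs(adj, score_name, n):
    """Score every non-adjacent pair and return the top n pairs as predicted links."""
    candidates = [(x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]]
    scored = [(neighbor_scores(adj, x, y)[score_name], (x, y)) for x, y in candidates]
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest score first, ties by name
    return [pair for _, pair in scored[:n]]


# Hypothetical undirected toy graph.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
top = rank_pairs(adj, "common_neighbors", 2)
print(top)
```

Scoring all pairs is quadratic in the number of nodes, which is one reason to restrict attention to a dense core of the graph before ranking.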

  6. [Liben-Nowell & Kleinberg '03] Rank potential links $(x, y)$ based on: [table of measures not preserved in this transcript]
     - $\Gamma(x)$ … degree of node $x$

  8. [Liben-Nowell & Kleinberg '03]

  9. Improvement over the number of common neighbors

  10. Recommend a list of possible friends.
      - Supervised machine learning setting:
        - Training example: for every node $s$, we have a list of nodes $\{v_1, \dots, v_k\}$ she will create links to.
        - Problem: learn a model that, for a given node $s$, ranks nodes $\{v_1, \dots, v_k\}$ higher than other nodes in the network.
      - How to combine node/edge attributes and network structure?
      - Let's learn how to bias random walks!

  11. [WSDM '11]
      - Let $s$ be the center node.
      - Let $f_w(u,v)$ be a function that assigns a strength to each edge: $a_{uv} = f_w(u,v) = \exp(-w \Psi_{uv})$
      - $\Psi_{uv}$ is a feature vector:
        - Features of node $u$
        - Features of node $v$
        - Features of edge $(u,v)$
      - $w$ is the parameter vector we want to learn.
      - Do a random walk from $s$ where transitions are according to edge strengths.
      - How to learn $f_w(u,v)$?
      - [Figure: center node $s$ with nodes $v_1$, $v_2$, $v_3$; positive and negative examples marked]
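
As a minimal sketch, the edge strength function above is a dot product followed by an exponential. The name `edge_strength` and the example feature values are hypothetical. The sign convention follows the slide's $\exp(-w \Psi_{uv})$.

```python
import math


def edge_strength(w, psi_uv):
    """a_uv = exp(-w . psi_uv), with w and psi_uv as equal-length sequences."""
    dot = sum(wi * xi for wi, xi in zip(w, psi_uv))
    return math.exp(-dot)


# Hypothetical features for one edge, e.g. [edge age, number of profile visits].
w = [0.5, -0.1]
psi = [2.0, 10.0]
print(edge_strength(w, psi))  # exp(-(0.5*2.0 - 0.1*10.0)) = exp(0) = 1.0
```

The exponential keeps every strength strictly positive, so the strengths can be normalized into transition probabilities for any value of `w`.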

  12. [WSDM '11]
      - Random walk transition matrix: built from the edge strengths $a_{uv}$ [equation not preserved in this transcript].
      - PageRank transition matrix: with prob. $\alpha$ jump back to $s$ [equation not preserved in this transcript].
      - Compute PageRank vector: $p^T = p^T Q$
      - Rank nodes by $p_u$
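
The slide's matrix equations are missing, so the following is a sketch under stated assumptions rather than the slide's exact formulas: rows of edge strengths are normalized to sum to 1, each row is scaled by $1 - \alpha$, and mass $\alpha$ is added to the column of $s$. Sending a node with no outgoing edges straight back to $s$ is a further assumption. The stationary vector is found by power iteration on $p^T = p^T Q$. Function names are hypothetical.

```python
def transition_matrix(strengths, nodes, s, alpha):
    """Build Q from directed edge strengths {(u, v): a_uv} with restart at node s."""
    idx = {u: i for i, u in enumerate(nodes)}
    n = len(nodes)
    Q = [[0.0] * n for _ in range(n)]
    for (u, v), a in strengths.items():
        Q[idx[u]][idx[v]] += a
    for i in range(n):
        row_sum = sum(Q[i])
        if row_sum == 0:
            # Assumption: a node with no outgoing edges always jumps back to s.
            Q[i] = [0.0] * n
            Q[i][idx[s]] = 1.0
            continue
        # Follow an edge in proportion to its strength with probability 1 - alpha ...
        Q[i] = [(1 - alpha) * a / row_sum for a in Q[i]]
        # ... and jump back to s with probability alpha.
        Q[i][idx[s]] += alpha
    return Q


def pagerank(Q, iters=200):
    """Power iteration for the stationary vector satisfying p^T = p^T Q."""
    n = len(Q)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical toy graph: undirected edges stored in both directions, all strengths 1.
nodes = ["s", "v1", "v2", "v3"]
undirected = [("s", "v1"), ("s", "v2"), ("v1", "v2"), ("v2", "v3")]
strengths = {}
for u, v in undirected:
    strengths[(u, v)] = 1.0
    strengths[(v, u)] = 1.0

Q = transition_matrix(strengths, nodes, "s", alpha=0.2)
p = pagerank(Q)
ranking = sorted(nodes, key=lambda u: -p[nodes.index(u)])
print(ranking)
```

Dense lists keep the sketch dependency-free. A graph of realistic size would need a sparse representation.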

  13. [WSDM '11]
      - Each node $u$ has a score $p_u$.
      - Destination nodes $D = \{v_1, \dots, v_k\}$
      - No-link nodes $L = \{\text{the rest}\}$
      - What do we want? $p_l < p_d$ for every destination node $d \in D$ and no-link node $l \in L$.
      - These are hard constraints; make them soft.

  14. [WSDM '11]
      - Want to minimize an objective $F$ that penalizes violations of $p_l < p_d$ through a loss [equation not preserved in this transcript].
      - Loss: $h(x) = 0$ if $x < 0$, $x^2$ otherwise.
      - How to minimize $F$? $p_l$ and $p_d$ depend on $w$:
        - Given $w$, assign edge weights $a_{uv} = f_w(u,v)$.
        - Using the transition matrix $Q = [a_{uv}]$, compute PageRank scores $p_u$.
        - Want to set $w$ such that $p_l < p_d$.
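
The loss $h$ is given on the slide, but the objective's formula is not preserved. The sketch below assumes the form used in the supervised random walks paper, $F(w) = \|w\|^2 + \lambda \sum_{d \in D,\, l \in L} h(p_l - p_d)$. The parameter name `lam` and the example scores are hypothetical.

```python
def h(x):
    """Squared loss from the slide: zero when the constraint holds (x < 0)."""
    return 0.0 if x < 0 else x * x


def objective(w, p, D, L, lam=1.0):
    """Assumed form: F(w) = ||w||^2 + lam * sum over (d, l) of h(p_l - p_d)."""
    regularizer = sum(wi * wi for wi in w)
    penalty = sum(h(p[l] - p[d]) for d in D for l in L)
    return regularizer + lam * penalty


# Hypothetical PageRank scores: only the pair (l="x", d="v2") violates p_l < p_d.
p = {"v1": 0.30, "v2": 0.20, "x": 0.25, "y": 0.10}
print(objective([0.0, 0.0], p, D=["v1", "v2"], L=["x", "y"]))  # about 0.0025
```

In the full method `p` would be recomputed from `w` at every step, since the PageRank scores depend on the edge strengths.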

  15. [WSDM '11] How to minimize $F$?
      - Take the derivative!
      - We know: $p^T = p^T Q$, i.e. $p_u = \sum_j p_j Q_{ju}$
      - So: the derivative of $p_u$ with respect to $w$ satisfies a recursive equation [equation not preserved in this transcript].
      - Looks like the PageRank equation!
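
The derivative step can be written out with the product rule. This is a reconstruction from the stationary equation, not the slide's own typesetting, and it assumes both $p$ and $Q$ are differentiable functions of $w$.

```latex
\begin{align*}
p_u &= \sum_j p_j Q_{ju} \\
\frac{\partial p_u}{\partial w}
  &= \sum_j \left( Q_{ju} \frac{\partial p_j}{\partial w}
     + p_j \frac{\partial Q_{ju}}{\partial w} \right)
\end{align*}
```

The first term has the same shape as the PageRank update, with $\partial p / \partial w$ in place of $p$, which suggests computing the derivative by the same kind of iteration.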

  16. [WSDM '11] Iceland Facebook network:
      - 174,000 nodes (55% of population)
      - Avg. degree 168
      - Avg. person added 26 new friends/month
      - For every node $s$:
        - Positive examples: $D$ = { new friendships of $s$ in Nov '09 }
        - Negative examples: $L$ = { other nodes $s$ did not create new links to }

  17. Node and edge features for learning:
      - Node: age, gender, degree
      - Edge: age of an edge, communication, profile visits, co-tagged photos
      - Baselines:
        - Decision trees and logistic regression: above features + 10 network features (PageRank, common friends)
      - Evaluation:
        - AUC and precision at top 20

  18. Facebook: predicting future friends [results figure not preserved in this transcript]

  19. Arxiv Hep-Ph collaboration network [results figure not preserved in this transcript]

  20. Results:
      - 2.3x improvement over previous FB-PYMK system
      - How to scale to FB size?
        - FB network: >500 million people, >65 billion edges
        - 40 machines, each 72GB of RAM (total 2.8TB)
        - System makes 8.6 million suggestions per second

  21. Many social or information networks are implicit or hard to observe:
      - Hidden/hard-to-reach populations:
        - Network of needle sharing between drug injection users
      - Implicit connections:
        - Network of information propagation in online news media
      - But we can observe results of the processes taking place on such (invisible) networks:
        - Virus propagation: drug users get sick, and we observe when they see the doctor
        - Information networks: we observe when media sites mention information
      - Question: Can we infer the hidden networks?

  22. There is a directed social network over which diffusions take place:
      - [Figure: directed network on nodes a, b, c, d, e]
      - But we do not observe the edges of the network.
      - We only see the time when a node gets infected:
        - Cascade $c_1$: (a, 1), (c, 2), (b, 6), (e, 9)
        - Cascade $c_2$: (c, 1), (a, 4), (b, 5), (d, 8)
      - Task: inferring the underlying network
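
To make the input concrete, the two cascades above can be stored as lists of (node, time) pairs. The sketch below is not a network inference algorithm: it only enumerates the directed edges that are consistent with each cascade's time ordering, which is the candidate set any inference method would have to choose from. The names `candidate_edges` and `support` are hypothetical.

```python
from collections import Counter

# The two cascades from the slide, as (node, infection time) pairs.
cascades = {
    "c1": [("a", 1), ("c", 2), ("b", 6), ("e", 9)],
    "c2": [("c", 1), ("a", 4), ("b", 5), ("d", 8)],
}


def candidate_edges(cascade):
    """All directed pairs (u, v) where u was infected strictly before v."""
    ordered = sorted(cascade, key=lambda item: item[1])
    return [
        (u, v)
        for i, (u, tu) in enumerate(ordered)
        for v, tv in ordered[i + 1:]
        if tu < tv
    ]


# Count in how many cascades each candidate edge is time-consistent.
support = Counter()
for events in cascades.values():
    support.update(candidate_edges(events))

for edge, count in support.most_common(3):
    print(edge, count)
```

Edges such as (a, b) and (c, b) are consistent with both cascades, while (a, c) and (c, a) each fit only one, which hints at why multiple cascades are needed to narrow down the hidden network.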

  23. Virus propagation vs. word of mouth & viral marketing:
      - Virus propagation:
        - Process: viruses propagate through the network
        - We observe: we only observe when people get sick
        - It's hidden: but NOT who infected whom
      - Word of mouth & viral marketing:
        - Process: recommendations and influence propagate
        - We observe: we only observe when people buy products
        - It's hidden: but NOT who influenced whom
      - Can we infer the underlying network?
