CS224W: Social and Information Network Analysis. Jure Leskovec, Stanford University. http://cs224w.stanford.edu
[Liben-Nowell, Kleinberg '03] Link prediction task: given $G[t_0, t_0']$, a graph on edges up to time $t_0'$, output a ranked list $L$ of links (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$. Evaluation: let $n = |E_{new}|$ be the number of new edges that appear during the test period $[t_1, t_1']$; take the top $n$ elements of $L$ and count the correct edges. 12/01/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
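The evaluation step can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `count_correct` is hypothetical, and edges are assumed to be undirected, so (x, y) and (y, x) count as the same link.

```python
def count_correct(ranked_links, new_edges):
    """Take the top n = |E_new| predictions and count how many are real new edges."""
    # Assumption: undirected edges, so store each pair as a frozenset.
    truth = {frozenset(edge) for edge in new_edges}
    n = len(truth)
    top_n = ranked_links[:n]
    return sum(1 for link in top_n if frozenset(link) in truth)


# Toy data (made up for illustration): a ranked list L and the edges E_new.
ranked = [("a", "b"), ("c", "d"), ("a", "d"), ("b", "c")]
new_edges = [("b", "a"), ("b", "c")]
print(count_correct(ranked, new_edges))
```

Only the first n entries of the ranked list are scored, so a correct prediction ranked below position n earns no credit.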
[Liben-Nowell, Kleinberg '03] Predict links in an evolving collaboration network. Core: since network data is very sparse, consider only nodes with in-degree and out-degree of at least 3.
[Liben-Nowell, Kleinberg '03] $\Gamma(x)$ … set of neighbors of node $x$ (its size $|\Gamma(x)|$ is the degree of $x$). For every pair of nodes $(x,y)$ compute a score. Sort the pairs by score and predict the top $n$ pairs as new links.
[Liben-Nowell, Kleinberg '03] Rank potential links $(x,y)$ based on scores over $\Gamma(x)$ and $\Gamma(y)$, where $\Gamma(x)$ … set of neighbors of node $x$.
[Liben-Nowell, Kleinberg '03] Figure: improvement over #common neighbors.
Recommend a list of possible friends. Supervised machine learning setting. Training example: for every node $s$ we have a list of nodes she will create links to, $\{v_1, \dots, v_k\}$. Problem: learn a model that, for a given node $s$, ranks nodes $\{v_1, \dots, v_k\}$ higher than other nodes in the network. How to combine node/edge attributes and network structure? Let's learn how to bias random walks!
[WSDM '11] Let $s$ be the center node. Let $f_w(u,v)$ be a function that assigns a strength to each edge: $a_{uv} = f_w(u,v) = \exp(-w \cdot \Psi_{uv})$. $\Psi_{uv}$ is a feature vector: features of node $u$, features of node $v$, features of edge $(u,v)$. $w$ is the parameter vector we want to learn. Do a random walk from $s$ where transitions are according to edge strengths. How to learn $f_w(u,v)$? (Figure: node $s$ with nodes $v_1$, $v_2$, $v_3$, marked as positive and negative examples.)
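A small sketch of the edge strength function and of transitions "according to edge strengths". The strength formula is the one on the slide; turning strengths into probabilities by dividing by the total outgoing strength is an assumption, and the names `edge_strength` and `transition_probs` and the toy features are hypothetical.

```python
import math


def edge_strength(w, psi_uv):
    """a_uv = exp(-w . psi_uv), as written on the slide."""
    return math.exp(-sum(wi * xi for wi, xi in zip(w, psi_uv)))


def transition_probs(w, features, u):
    """Probability of stepping from u to each neighbor v.

    features maps a directed edge (u, v) to its feature vector psi_uv.
    Assumption: probabilities are strengths normalized over u's out-edges.
    """
    strengths = {v: edge_strength(w, psi)
                 for (src, v), psi in features.items() if src == u}
    total = sum(strengths.values())
    return {v: a / total for v, a in strengths.items()}


# Toy features (made up): two edges out of s, two features per edge.
features = {("s", "v1"): [1.0, 0.0], ("s", "v2"): [0.0, 1.0]}
print(transition_probs([0.0, 0.0], features, "s"))  # w = 0: all edges equally strong
print(transition_probs([1.0, 0.0], features, "s"))  # w penalizes the first feature
```

With this sign convention, a positive weight on a feature lowers the strength of edges where that feature is large, so the walk is steered away from them.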
[WSDM '11] Random walk transition matrix: built from the edge strengths $a_{uv}$. PageRank transition matrix $Q$: with prob. $\alpha$ jump back to $s$. Compute PageRank vector: $p^T = p^T Q$. Rank nodes by $p_u$.
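The PageRank vector can be approximated by power iteration. The matrix definitions are not legible in this text, so the sketch assumes the usual random walk with restart: $Q_{uv} = (1-\alpha)\,Q'_{uv} + \alpha\,\mathbf{1}(v = s)$, where $Q'$ is a row-stochastic random walk matrix. The function name, the default $\alpha = 0.15$, and the toy graph are illustrative assumptions.

```python
def pagerank_with_restart(q_prime, s, alpha=0.15, iters=100):
    """Iterate p^T <- p^T Q for Q_uv = (1 - alpha) * Q'_uv + alpha * 1(v == s).

    q_prime maps each node u to {v: Q'_uv}; every row must sum to 1 and
    every node must have a row (assumptions of this sketch).
    """
    nodes = list(q_prime)
    p = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            for v, prob in q_prime[u].items():
                nxt[v] += (1 - alpha) * p[u] * prob  # follow an edge
            nxt[s] += alpha * p[u]                   # jump back to s
        p = nxt
    return p


# Toy row-stochastic walk matrix (made up for illustration).
q_prime = {
    "s": {"v1": 0.5, "v2": 0.5},
    "v1": {"s": 1.0},
    "v2": {"s": 0.5, "v3": 0.5},
    "v3": {"v2": 1.0},
}
p = pagerank_with_restart(q_prime, "s")
print(sorted(p, key=p.get, reverse=True))  # nodes ranked by p_u
```

Because every row of $Q$ sums to 1, the scores stay a probability distribution at each iteration.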
[WSDM '11] Each node $u$ has a score $p_u$. Destination nodes $D = \{v_1, \dots, v_k\}$. No-link nodes $L = \{\text{the rest}\}$. What do we want? $p_l < p_d$ for every $l \in L$, $d \in D$. These are hard constraints; make them soft.
[WSDM '11] Want to minimize $F$, using the loss $h(x) = 0$ if $x < 0$, $x^2$ otherwise. How to minimize $F$? $p_l$ and $p_d$ depend on $w$: given $w$, assign edge weights $a_{uv} = f_w(u,v)$; using transition matrix $Q = [a_{uv}]$, compute PageRank scores $p_u$. Want to set $w$ such that $p_l < p_d$.
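A sketch of the soft-constraint objective. The formula for $F$ is not legible in this text; the form assumed here, $F(w) = \|w\|^2 + \lambda \sum_{d \in D,\, l \in L} h(p_l - p_d)$, follows the WSDM '11 supervised random walks paper. The regularizer, the weight `lam`, and the toy scores are assumptions of the sketch.

```python
def h(x):
    """Loss from the slide: zero when the constraint holds, quadratic otherwise."""
    return 0.0 if x < 0 else x * x


def objective(w, p, D, L, lam=1.0):
    """Assumed form: F(w) = ||w||^2 + lam * sum over d in D, l in L of h(p_l - p_d)."""
    regularizer = sum(wi * wi for wi in w)
    penalty = sum(h(p[l] - p[d]) for d in D for l in L)
    return regularizer + lam * penalty


# Toy PageRank scores (made up): v3 outranks destination v2, so one pair is penalized.
p = {"v1": 0.3, "v2": 0.2, "v3": 0.25, "x": 0.1}
print(objective([0.0, 0.0], p, D=["v1", "v2"], L=["v3", "x"]))
```

Here `p` is passed in directly; in the full method it would be recomputed from $w$ each time the weights change.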
[WSDM '11] How to minimize $F$? Take the derivative! The resulting equation looks like the PageRank equation!
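The equations for this step are not legible in this text. A reconstruction under stated assumptions: $F(w) = \|w\|^2 + \lambda \sum_{d \in D,\, l \in L} h(p_l - p_d)$ as in the WSDM '11 paper, and $p^T = p^T Q$ for the scores. The symbol $\delta_{ld}$ is introduced here only as shorthand.

```latex
% Chain rule on F, with \delta_{ld} = p_l - p_d:
\frac{\partial F(w)}{\partial w}
  = 2w + \lambda \sum_{d \in D,\, l \in L}
    \frac{\partial h(\delta_{ld})}{\partial \delta_{ld}}
    \left( \frac{\partial p_l}{\partial w} - \frac{\partial p_d}{\partial w} \right)

% We know (p^T = p^T Q, written per node):
p_u = \sum_j p_j \, Q_{ju}

% So, by the product rule:
\frac{\partial p_u}{\partial w}
  = \sum_j \left( Q_{ju} \, \frac{\partial p_j}{\partial w}
                + p_j \, \frac{\partial Q_{ju}}{\partial w} \right)
```

The last line has the same fixed-point structure as the PageRank equation: $\partial p_u / \partial w$ appears on both sides, so it can be computed by the same kind of iteration used for $p$ itself.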
[WSDM '11] Iceland Facebook network: 174,000 nodes (55% of population); avg. degree 168; avg. person added 26 new friends/month. For every node $s$: positive examples $D$ = {new friendships of $s$ in Nov '09}; negative examples $L$ = {other nodes $s$ did not create new links to}.
Node and edge features for learning. Node: age, gender, degree. Edge: age of an edge, communication, profile visits, co-tagged photos. Baselines: decision trees and logistic regression, using the above features + 10 network features (PageRank, common friends). Evaluation: AUC and precision at top 20.
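The two evaluation measures can be sketched as below. These are the standard definitions, given here as an assumption about what the slide means: AUC as the fraction of (positive, negative) pairs in which the positive node scores higher (ties count half), and precision at top $k$ as the share of the $k$ highest-ranked nodes that are positives. Function names and data are hypothetical.

```python
def precision_at_k(ranked, positives, k=20):
    """Fraction of the top k ranked nodes that are positive examples."""
    return sum(1 for v in ranked[:k] if v in positives) / k


def auc(scores, positives):
    """Probability that a random positive outscores a random negative."""
    pos = [s for v, s in scores.items() if v in positives]
    neg = [s for v, s in scores.items() if v not in positives]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Toy scores for one source node s (made up for illustration).
scores = {"v1": 0.4, "v2": 0.1, "v3": 0.3, "v4": 0.2}
positives = {"v1", "v4"}
ranked = sorted(scores, key=scores.get, reverse=True)
print(auc(scores, positives), precision_at_k(ranked, positives, k=2))
```

AUC looks at the whole ranking, while precision at top 20 only rewards correct predictions among the first few suggestions a user would actually see.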
Figure: Facebook: predicting future friends.
Figure: Arxiv Hep-Ph collaboration network.
Results: 2.3x improvement over the previous FB-PYMK system. How to scale to FB size? FB network: >500 million people, >65 billion edges. 40 machines, each with 72GB of RAM (total 2.8TB). System makes 8.6 million suggestions per second.
Many social or information networks are implicit or hard to observe. Hidden/hard-to-reach populations: network of needle sharing between injection drug users. Implicit connections: network of information propagation in online news media. But we can observe results of the processes taking place on such (invisible) networks. Virus propagation: drug users get sick, and we observe when they see the doctor. Information networks: we observe when media sites mention information. Question: can we infer the hidden networks?
There is a directed social network over which diffusions take place. (Figure: directed network on nodes a, b, c, d, e.) But we do not observe the edges of the network. We only see the time when a node gets infected. Cascade $c_1$: (a, 1), (c, 2), (b, 6), (e, 9). Cascade $c_2$: (c, 1), (a, 4), (b, 5), (d, 8). Task: infer the underlying network.
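To make the input concrete, the two cascades can be stored as lists of (node, infection time) pairs. The sketch below only enumerates edges that are consistent with the timestamps: $u \to v$ is possible in a cascade if $u$ was infected before $v$. Counting how many cascades support each candidate edge is a naive illustration, not the inference method of the lecture, and the function names are hypothetical.

```python
from collections import Counter

# The two cascades from the slide: (node, infection time).
cascades = {
    "c1": [("a", 1), ("c", 2), ("b", 6), ("e", 9)],
    "c2": [("c", 1), ("a", 4), ("b", 5), ("d", 8)],
}


def candidate_edges(cascade):
    """All directed pairs (u, v) where u was infected strictly before v."""
    return {(u, v) for u, tu in cascade for v, tv in cascade if tu < tv}


def edge_support(cascades):
    """Number of cascades in which each candidate edge is time-consistent."""
    support = Counter()
    for cascade in cascades.values():
        support.update(candidate_edges(cascade))
    return support


support = edge_support(cascades)
print([edge for edge, count in support.items() if count == 2])
```

Only a to b and c to b are time-consistent in both cascades; a and c infect each other in opposite orders, so timestamps alone cannot settle the direction of an edge between them.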
              Virus propagation                       Word of mouth & viral marketing
Process       Viruses propagate through the network   Recommendations and influence propagate
We observe    We only observe when people get sick    We only observe when people buy products
It's hidden   But NOT who infected whom               But NOT who influenced whom
Can we infer the underlying network?