CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu
Epidemic model based on random trees (a variant of branching processes). The root node is "patient 0", the start of the epidemic. A patient meets d other people and, with probability q > 0, infects each of them; each of them in turn heads a subtree of d further contacts. Q: For which values of d and q does the epidemic run forever? Run forever: $\lim_{n\to\infty} P[\text{infected node at depth } n] > 0$. Die out: the same limit $= 0$.
10/23/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
$p_n$ = prob. there is an infected node at depth n. We need: $\lim_{n\to\infty} p_n = ?$ (based on q and d). Need a recurrence for $p_n$: $p_n = 1 - (1 - q\,p_{n-1})^d$, where $(1 - q\,p_{n-1})^d$ is the probability of no infected node at depth n. $\lim_{n\to\infty} p_n$ = result of iterating $f(x) = 1 - (1 - qx)^d$, starting at x = 1 (since $p_1 = 1$).
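The recurrence for p_n can be iterated numerically. The following is a minimal sketch, not code from the lecture; the function name `infection_prob_at_depth` and the sample values of q and d are assumptions chosen for illustration.

```python
def infection_prob_at_depth(q: float, d: int, n: int) -> float:
    """Return p_n by iterating p_k = 1 - (1 - q * p_{k-1})**d from p_1 = 1."""
    p = 1.0  # p_1 = 1: the root (patient 0) is infected
    for _ in range(n - 1):
        # (1 - q * p)**d is the chance that none of the d subtrees
        # carries the infection down to the required depth.
        p = 1.0 - (1.0 - q * p) ** d
    return p


if __name__ == "__main__":
    # Hypothetical parameter choices, one with qd < 1 and one with qd > 1.
    for q, d in [(0.2, 3), (0.6, 3)]:
        p = infection_prob_at_depth(q, d, 200)
        print(f"q={q}, d={d}, qd={q * d:.1f}: p_200 = {p:.4f}")
```

Running it for a pair with qd < 1 and a pair with qd > 1 shows the two regimes the slides ask about: in the first case the iterates shrink towards 0, in the second they settle at a positive fixed point of f.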
[Figure: plot of y = f(x) and y = x on [0, 1]; the iteration goes to the first fixed point. When is this going to 0?] What do we know about f(x)? $f(0) = 0$, $f(1) = 1 - (1 - q)^d < 1$ (for q < 1), $f'(x) = qd\,(1 - qx)^{d-1}$, $f'(0) = qd$, and $f'(x)$ is monotone decreasing on [0, 1].
[Figure: plot of y = f(x) against y = x on [0, 1].] For the epidemic to die out we need f(x) to be below y = x! Since f(0) = 0 and f′(x) is monotone decreasing, this holds when $f'(0) = qd < 1$. So $\lim_{n\to\infty} p_n = 0$ when $qd < 1$. Here qd = expected # of people that we infect.
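The qd < 1 condition can be checked empirically by simulating the tree epidemic directly. This is a rough Monte Carlo sketch under stated assumptions: the root sits at depth 1, every infected node meets exactly d people, and the names `survives_to_depth` and `survival_fraction`, the depth, the trial count and the seed are all hypothetical choices.

```python
import random


def survives_to_depth(q, d, depth, rng):
    """Simulate one epidemic on the tree; True if some node at `depth` is infected."""
    infected = 1  # the root (depth 1) is infected
    for _ in range(depth - 1):
        if infected == 0:
            return False
        # Each infected node meets d people and infects each with probability q.
        contacts = infected * d
        infected = sum(1 for _ in range(contacts) if rng.random() < q)
    return infected > 0


def survival_fraction(q, d, depth, trials=300, seed=0):
    """Fraction of simulated epidemics that are still alive at `depth`."""
    rng = random.Random(seed)
    alive = sum(survives_to_depth(q, d, depth, rng) for _ in range(trials))
    return alive / trials


if __name__ == "__main__":
    print("qd = 0.6:", survival_fraction(0.2, 3, 12))
    print("qd = 1.8:", survival_fraction(0.6, 3, 12))
```

The depth is kept small because above the threshold the number of infected nodes grows roughly like (qd)^depth, so deep simulations become slow.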
In this model nodes only go from inactive → active. We can generalize it to allow nodes to alternate between the active and inactive states.
Generalizing the model to virus propagation. 2 parameters: (Virus) birth rate β: probability that an infected neighbor attacks. (Virus) death rate δ: probability that an infected node heals. [Figure: node N with neighbors N1, N2, N3; an infected neighbor infects N with prob. β, and an infected N becomes healthy with prob. δ.]
General scheme for epidemic models: each node can go through phases S (susceptible), E (exposed), I (infected), R (recovered), Z (immune). Transition probabilities are governed by the model parameters.
SIR model: a node goes through the phases Susceptible → Infected → Recovered. Models chickenpox or plague: once you heal, you can never get infected again. Assuming perfect mixing, the network is a complete graph. [Figure: model dynamics, number of nodes in each state vs. time.]
Susceptible-Infective-Susceptible (SIS) model. Cured nodes immediately become susceptible. Virus "strength": s = β / δ. Node state transition diagram: Susceptible → Infective when infected by a neighbor, with prob. β; Infective → Susceptible when cured internally, with prob. δ.
Models flu: a susceptible node becomes infected; the node then heals and becomes susceptible again. Assuming perfect mixing (complete graph): $\frac{dS}{dt} = -\beta S I + \delta I$ and $\frac{dI}{dt} = \beta S I - \delta I$. [Figure: number of susceptible nodes S(t) and infected nodes I(t) vs. time.]
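The perfect-mixing SIS equations can be integrated numerically to see how S(t) and I(t) evolve. The sketch below uses forward Euler and assumes S and I are fractions of the population (so S + I = 1); that normalization, the function name `simulate_sis`, the step size and the initial infected fraction are assumptions of this example, not part of the slides.

```python
def simulate_sis(beta, delta, i0=0.01, dt=0.01, steps=20000):
    """Forward-Euler integration of
        dS/dt = -beta*S*I + delta*I
        dI/dt =  beta*S*I - delta*I
    with S and I treated as population fractions. Returns the final (S, I)."""
    s, i = 1.0 - i0, i0
    for _ in range(steps):
        new_infections = beta * s * i * dt  # flow S -> I
        recoveries = delta * i * dt         # flow I -> S
        s += recoveries - new_infections
        i += new_infections - recoveries
    return s, i


if __name__ == "__main__":
    print("beta > delta:", simulate_sis(beta=0.5, delta=0.1))
    print("beta < delta:", simulate_sis(beta=0.05, delta=0.1))
```

Under this normalization the nonzero equilibrium of the equations is I = 1 − δ/β when β > δ; when β < δ the infected fraction decays to 0.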
SIS model: the epidemic threshold of a graph G is a value τ such that, if the virus strength s = β / δ < τ, the epidemic cannot happen (it eventually dies out). Given a graph, what is its epidemic threshold?
[Wang et al. 2003] We have no epidemic if $\beta / \delta < \tau = 1 / \lambda_{1,A}$, where β is the (virus) birth rate, δ is the (virus) death rate, τ is the epidemic threshold, and $\lambda_{1,A}$ is the largest eigenvalue of the adjacency matrix A. ► $\lambda_{1,A}$ alone captures the property of the graph!
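The threshold τ = 1/λ_{1,A} can be computed for a small undirected graph with power iteration. This is a dependency-free sketch: the function names, the dense list-of-lists matrix format and the iteration limits are assumptions, and in practice a numerical library's eigenvalue routine would be the usual choice.

```python
def largest_eigenvalue(adj, iters=1000, tol=1e-12):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        # Multiply by (A + I): the shift keeps the iteration from oscillating
        # on bipartite graphs, whose spectrum is symmetric around 0.
        y = [x[r] + sum(adj[r][c] * x[c] for c in range(n)) for r in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        y = [v / norm for v in y]
        # Rayleigh quotient of A at the normalized vector y.
        new_lam = sum(
            y[r] * sum(adj[r][c] * y[c] for c in range(n)) for r in range(n)
        )
        if abs(new_lam - lam) < tol:
            return new_lam
        x, lam = y, new_lam
    return lam


def epidemic_threshold(adj):
    """tau = 1 / lambda_1 of the adjacency matrix."""
    return 1.0 / largest_eigenvalue(adj)


def dies_out(beta, delta, adj):
    """True if the virus strength beta/delta is below the threshold."""
    return beta / delta < epidemic_threshold(adj)
```

For a complete graph on 4 nodes the largest eigenvalue is 3, so the threshold is 1/3; a star with 4 leaves has largest eigenvalue 2 and threshold 1/2.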
[Figure, Wang et al. 2003: number of infected nodes vs. time on the Oregon graph (10,900 nodes and 31,180 edges) with β = 0.001, for δ = 0.05, 0.06, 0.07, giving β / δ > τ (above threshold), β / δ = τ (at the threshold), and β / δ < τ (below threshold), respectively.]
Does it matter how many people are initially infected?
Blogs (information epidemics): Which are the influential/infectious blogs? Which blogs create big cascades? Viral marketing: Who are the influencers? Where should I advertise? Disease spreading: Where to place monitoring stations to detect epidemics?
Independent Cascade Model: a directed finite graph G = (V, E). A set S starts out with the new behavior; say nodes with this behavior are "active". Each edge (v, w) has a probability $p_{vw}$. If node v is active, it gets one chance to make w active, with probability $p_{vw}$. Each edge fires at most once. Does scheduling matter? No. E.g., if u and v are both active, it doesn't matter which fires first. But time moves in discrete steps.
Initially some nodes S are active. Each edge (v, w) has probability (weight) $p_{vw}$. [Figure: example directed graph on nodes a to i with edge probabilities between 0.2 and 0.4.] When node v becomes active, it activates each out-neighbor w with prob. $p_{vw}$. Activations spread through the network.
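One run of this activation process can be simulated in a few lines. The sketch below is an illustration only: the adjacency format (a dict mapping v to a list of (w, p_vw) pairs), the function name `independent_cascade` and the toy graph are assumptions, and the toy graph is not the one in the slide's figure.

```python
import random


def independent_cascade(edges, seeds, rng):
    """Run one cascade and return the final active set.

    `edges` maps each node v to a list of (w, p_vw) pairs.
    """
    active = set(seeds)
    frontier = list(active)  # nodes that became active in the previous step
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p in edges.get(v, []):
                # v appears in a frontier exactly once, so each edge (v, w)
                # fires at most once.
                if w not in active and rng.random() < p:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    # Hypothetical toy graph.
    toy = {"a": [("b", 0.4), ("c", 0.3)], "b": [("d", 0.2)], "c": [("d", 0.4)]}
    print(independent_cascade(toy, {"a"}, random.Random(0)))
```

The frontier-based loop mirrors the discrete time steps: nodes activated at step t try their out-edges at step t + 1 and never again.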
S: the initial active set. f(S): the expected size of the final active set. [Figure: graph G with nodes a, b, c, d and the influence set of each node.] Set S is more influential if f(S) is larger: f({a,b}) < f({a,c}) < f({a,d}).
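For a very small graph, f(S) can be computed exactly rather than estimated. The sketch below relies on an equivalent view of the cascade that is not stated on the slides: flip every edge's coin up front, call the edge "live" with probability p_vw, and take the final active set to be everything reachable from S over live edges. The function name and edge-list format are assumptions, and the enumeration is exponential in the number of edges.

```python
from itertools import product


def expected_spread(edges, seeds):
    """Exact f(S) for a small graph; `edges` is a list of (v, w, p_vw) triples."""
    total = 0.0
    # Enumerate every live/blocked outcome of the edge coin flips.
    for outcome in product([True, False], repeat=len(edges)):
        prob = 1.0
        live = {}
        for (v, w, p), is_live in zip(edges, outcome):
            prob *= p if is_live else 1.0 - p
            if is_live:
                live.setdefault(v, []).append(w)
        if prob == 0.0:
            continue
        # Final active set = nodes reachable from the seeds over live edges.
        reached = set(seeds)
        stack = list(reached)
        while stack:
            v = stack.pop()
            for w in live.get(v, []):
                if w not in reached:
                    reached.add(w)
                    stack.append(w)
        total += prob * len(reached)
    return total
```

On the chain a → b → c with probability 0.5 on both edges, seeding {a} gives 1 + 0.5 + 0.25 = 1.75 expected active nodes, which matches the enumeration.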
Problem [Domingos-Richardson '01]: find the most influential set of size k, i.e. the set S of k nodes producing the largest expected cascade size f(S) if activated. Optimization problem: $\max_{S \text{ of size } k} f(S)$. [Figure: example graph with edge probabilities between 0.2 and 0.4; the influence set of b is highlighted.]
Most influential set of k nodes: the set S of k nodes producing the largest expected cascade size f(S) if activated. The optimization problem: $\max_{S \text{ of size } k} f(S)$. How hard is this problem? NP-HARD! Show that finding the most influential set is at least as hard as set cover.
Set cover problem: given a universe of elements $U = \{u_1, \dots, u_n\}$ and sets $S_1, \dots, S_m \subseteq U$, are there k sets among $S_1, \dots, S_m$ such that their union is U? [Figure: universe U covered by sets S1, S2, S3, S4.] Goal: encode set cover as an instance of $\max_{S \text{ of size } k} f(S)$.
Given a set cover instance with sets $S_1, \dots, S_m$, build a bipartite "S-to-U" graph. Construction: create an edge $(S_i, u)$ for every $S_i$ and every $u \in S_i$, i.e. a directed edge from each set to each of its elements, and put weight 1 on each edge. E.g., $S_1 = \{u_1, u_2, u_3\}$ gets edges to $u_1$, $u_2$ and $u_3$. There exists a set S of size k with f(S) = k + n iff there exists a size-k set cover. Note: the optimal solution is always a set of $S_i$ nodes. This is hard in general; there could be special cases that are easier.
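The reduction can be made concrete on a tiny instance. The sketch below builds the weight-1 bipartite graph and decides the covering question by brute force over all k-node seed sets, so it only illustrates the equivalence and is exponential in general; the node labels `("set", i)` and `("elem", u)` and the function names are assumptions of this example.

```python
from itertools import combinations


def build_influence_graph(sets):
    """Weight-1 directed edges from each set node to each of its elements."""
    return {("set", i): [("elem", u) for u in s] for i, s in enumerate(sets)}


def spread(graph, seeds):
    """f(S) when every edge has weight 1: the seeds plus all their out-neighbors."""
    active = set(seeds)
    for s in seeds:
        active.update(graph.get(s, []))
    return len(active)


def has_cover_of_size(universe, sets, k):
    """True iff some k seed nodes achieve f(S) = k + n, with n = |universe|."""
    graph = build_influence_graph(sets)
    n = len(universe)
    nodes = list(graph) + [("elem", u) for u in universe]
    return any(spread(graph, seeds) == k + n for seeds in combinations(nodes, k))
```

Seeding an element node activates only itself, so a seed set can reach k + n active nodes only if all k seeds are set nodes whose union is the whole universe.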
Bad news: influence maximization is NP-hard. Next, good news: there exists an approximation algorithm! Consider the Hill Climbing algorithm to find S. Input: the influence set of each node u = {v_1, v_2, …}; if we activate u, nodes {v_1, v_2, …} will eventually get active. Algorithm: at each step i, take the node u that gives the best marginal gain: $\max_u f(S_{i-1} \cup \{u\})$.
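The hill climbing step can be sketched as follows. This example assumes the influence sets are given as a dict of Python sets and that f(S) is the size of the union of the chosen nodes' influence sets; the function name and the toy input are hypothetical.

```python
def hill_climbing(influence_sets, k):
    """Greedily pick k nodes; returns (chosen nodes in order, f(S))."""
    chosen = []
    covered = set()
    for _ in range(min(k, len(influence_sets))):
        # Pick the node with the best marginal gain
        # f(S_{i-1} + {u}) - f(S_{i-1}) = number of newly covered nodes.
        best = max(
            (u for u in influence_sets if u not in chosen),
            key=lambda u: len(influence_sets[u] - covered),
        )
        chosen.append(best)
        covered |= influence_sets[best]
    return chosen, len(covered)


if __name__ == "__main__":
    # Hypothetical influence sets.
    toy = {
        "a": {"a", "b", "c"},
        "b": {"b", "c"},
        "c": {"c", "d", "e"},
        "d": {"d"},
    }
    print(hill_climbing(toy, 2))
```

Maximizing f(S_{i-1} ∪ {u}) over u is the same as maximizing the marginal gain, because f(S_{i-1}) is fixed within a step.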