On the Submodularity of Influence in Social Networks
Elchanan Mossel & Sebastien Roch, STOC 2007
Speaker: Xinran He (Xinranhe1990@gmail.com)
Social Network
• Social network as a graph
  – Nodes represent individuals.
  – Edges are social relations with different strengths:
    • Neighbor and coworker relations in real life
    • Virtual friendship on Facebook
    • Follower-followee relations on Twitter
Diffusion in Social Networks
• The adoption of new products can propagate through the social network.
• The same holds for information, rumors, innovations, ...
Influence Maximization
• Influence maximization: find the k people that generate the largest influence spread (i.e., the expected number of activated nodes) [KKT 2003].
Linear Threshold Model
• Given a social network with edge weights w_uv and a set S of initially active individuals as the seed set.
• Every individual v independently chooses a threshold θ_v uniformly in [0,1].
• At any later step t, a still-inactive node v becomes activated if ∑_{u∈N_v} w_uv ≥ θ_v, where N_v is the set of already activated neighbors of v.
• The diffusion ends when no more nodes can be activated.
• The influence spread σ(S) = E[|P_end| | S] is the expected number of active nodes when the diffusion process ends.
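Not from the slides — a minimal Python sketch of the linear threshold diffusion just described, assuming the graph is given as a dict of incoming edge weights per node:

```python
import random

def linear_threshold(in_weights, seeds, rng=random):
    """Simulate one run of the linear threshold model.

    in_weights: dict mapping node v -> dict {u: w_uv} of incoming edge weights
                (assumed to sum to at most 1 for each v).
    seeds:      iterable of initially active nodes.
    Returns the set of nodes active when the diffusion ends.
    """
    # Each node independently draws its threshold uniformly from [0, 1].
    thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, weights in in_weights.items():
            if v in active:
                continue
            # Total weight of already-activated in-neighbors of v.
            if sum(w for u, w in weights.items() if u in active) >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Averaging the size of the returned set over many independent runs gives a Monte Carlo estimate of σ(S).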
Linear Threshold Example
[Figure: a small example network stepped through Steps 0-3. The legend marks inactive nodes, active nodes, thresholds, active neighbors, and edge weights w_uv; at each step a node activates once the total weight of its active neighbors reaches its threshold, and the process stops when no further activation is possible.]
Influence Maximization
• Find a seed set S with |S| ≤ k such that σ(S) is maximized.
• The influence maximization problem is NP-hard under the linear threshold model [Kempe et al. 2003].
• We therefore have to solve it approximately.
• Main tool for the analysis — Theorem: the greedy algorithm gives a (1−1/e)-approximation for maximizing monotone and submodular set functions [Nemhauser/Wolsey 1978].
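Not from the slides — a minimal sketch of the greedy algorithm the theorem refers to; the function name and the `spread` oracle (e.g., a Monte Carlo estimate of σ built on `linear_threshold`) are our assumptions:

```python
def greedy_seed_selection(nodes, k, spread):
    """Greedy (1 - 1/e)-approximation for maximizing a monotone
    submodular set function.

    nodes:  candidate nodes.
    spread: callable mapping a set of seeds to an estimate of sigma(S).
    """
    seeds = set()
    for _ in range(k):
        base = spread(seeds)
        # Pick the node with the largest marginal gain sigma(S + v) - sigma(S).
        best = max((v for v in nodes if v not in seeds),
                   key=lambda v: spread(seeds | {v}) - base)
        seeds.add(best)
    return seeds
```

The approximation guarantee only applies when the objective is monotone and submodular, which is exactly what the rest of the talk establishes for σ.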
Submodular & Monotone
• A set function f: 2^V → R is monotone if f(S) ≤ f(T) for all S ⊆ T ⊆ V.
• A set function f: 2^V → R is submodular if f(S) + f(T) ≥ f(S∩T) + f(S∪T) for all S, T ⊆ V.
Submodularity
• A set function f is submodular if f(S) + f(T) ≥ f(S∩T) + f(S∪T) for all S, T ⊆ V.
• Or, equivalently, f(T ∪ {v}) − f(T) ≤ f(S ∪ {v}) − f(S) for all S ⊆ T ⊆ V and v ∉ T.
• Submodularity can be viewed as a diminishing returns property: adding an element to a larger set never helps more than adding it to a smaller one.
Submodularity: Examples
• Maximum coverage problem: given a collection of sets S = {S_1, ..., S_m} and a number k, find S' ⊆ S with |S'| ≤ k maximizing σ(S') = |∪_{S_i ∈ S'} S_i|. The coverage function σ is submodular.
• The influence spread σ under the linear threshold model is submodular [Kempe et al. 2003], so the influence maximization problem under the linear threshold model can be solved approximately.
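A toy illustration (our addition, with made-up sets): the coverage function and a spot check of the diminishing returns inequality from the previous slide:

```python
def coverage(collection, chosen):
    """Number of elements covered by the chosen indices into the collection."""
    covered = set()
    for i in chosen:
        covered |= collection[i]
    return len(covered)

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
S, T, v = {0}, {0, 1}, 2                               # S is a subset of T, v not in T
gain_S = coverage(sets, S | {v}) - coverage(sets, S)   # gain is 3
gain_T = coverage(sets, T | {v}) - coverage(sets, T)   # gain is 2
assert gain_T <= gain_S                                # diminishing returns holds
```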
General Threshold Model
• Linear threshold model: v activates when ∑_{u∈N_v} w_uv ≥ θ_v.
• General threshold model: v activates when f_v(S) ≥ θ_v, where f_v(S) is the activation function of node v and S is the set of already activated nodes.
• The general threshold model generalizes many diffusion models:
  – Linear Threshold Model [KKT 2003]: f_v(S) = ∑_{u∈N_v} w_uv
  – Independent Cascade Model [KKT 2003]: f_v(S) = 1 − ∏_{u∈N_v} (1 − p_uv)
  – Decreasing Cascade Model [KKT 2005]: f_v(S) = 1 − ∏_{i=1}^{r} (1 − p_v(ω_i, S_{i−1}))
  – ...
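Not from the slides — a sketch of two of the activation functions in the table above, written for a single node v; the variable names and graph representation are our assumptions:

```python
def f_linear_threshold(active, in_weights):
    """f_v(S) = sum of w_uv over the already-active neighbors u of v."""
    return sum(w for u, w in in_weights.items() if u in active)

def f_independent_cascade(active, in_probs):
    """f_v(S) = 1 - product over active neighbors u of (1 - p_uv)."""
    prod = 1.0
    for u, p in in_probs.items():
        if u in active:
            prod *= (1.0 - p)
    return 1.0 - prod

# Both functions are monotone and submodular in the set of active nodes,
# which is the hypothesis needed for the Mossel/Roch theorem.
```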
General Threshold Model (2)
• For the linear threshold model, the influence spread σ(S) is submodular [KKT 2003].
• Conjecture: under the general threshold model with monotone and submodular f_v, σ(S) is monotone and submodular [KKT 2003].
Main Result
• Theorem: under the general threshold model with monotone and submodular f_v, σ(S) is monotone and submodular [Mossel/Roch 2007].
• Corollary: the greedy algorithm is a (1−1/e)-approximation for the influence maximization problem under the general threshold model.
Proof: General Idea (1)
• Couple four diffusion processes:
  – A = {A_0 = S, A_1, A_2, ..., A_end}
  – B = {B_0 = T, B_1, B_2, ..., B_end}
  – C = {C_0 = S∩T, C_1, C_2, ..., C_end}
  – D = {D_0 = S∪T, D_1, D_2, ..., D_end}
• such that C_t ⊆ A_t ∩ B_t and D_t ⊆ A_t ∪ B_t for every step t.
Proof: General Idea (2)
If C_t ⊆ A_t ∩ B_t and D_t ⊆ A_t ∪ B_t for every t, then
|A_end| + |B_end| ≥ |A_end ∩ B_end| + |A_end ∪ B_end| ≥ |C_end| + |D_end|.
Taking expectations, we get σ(S) + σ(T) ≥ σ(S∩T) + σ(S∪T).
C_t ⊆ A_t ∩ B_t
• Couple the four processes using the same thresholds θ_v.
• Show C_t ⊆ A_t and C_t ⊆ B_t by induction.
  – Base case: C_0 = S∩T ⊆ S = A_0.
  – Assume C_t ⊆ A_t.
  – For a node v still inactive at step t, we have f_v(C_t) ≤ f_v(A_t) by monotonicity of f_v. Therefore, if v is activated at step t+1 in C, i.e., f_v(C_t) ≥ θ_v, then also f_v(A_t) ≥ θ_v, so v is activated in A as well: C_{t+1} ⊆ A_{t+1}.
D_t ⊆ A_t ∪ B_t: First Attempt
• Let's try the same coupling method for D_t ⊆ A_t ∪ B_t.
[Figure: a three-node counterexample. Nodes 1 and 2 each have an edge of weight 0.3 into node 3, whose threshold is θ_3 = 0.5. In D (seeded with both 1 and 2) node 3 becomes active, but in A (seeded with 1 only) and in B (seeded with 2 only) it does not, so the same-threshold coupling fails to keep D_t ⊆ A_t ∪ B_t.]
Antisense Coupling
• Then how could we keep D_t ⊆ A_t ∪ B_t?
• Intuitively, using θ_v for the activation of S and 1−θ_v for the activation of T will maximize their union.
Piecemeal Growth
Define P = P(S^(1), ..., S^(k)) as the piecemeal growth diffusion process, where S^(1), ..., S^(k) is a partition of the seed set S:
  add S^(1), grow until it ends; add S^(2), grow until it ends; ... ; add S^(k), grow until it ends.
Lemma: The distribution of the activated node set at the end of the original process with seed set S and at the end of the piecemeal growth process P(S^(1), ..., S^(k)) is identical.
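Not from the slides — a sketch of piecemeal growth under the linear threshold dynamics, with the same graph representation assumed earlier; the key point is that the thresholds are drawn once and kept fixed across all stages:

```python
import random

def piecemeal_growth(in_weights, partition, rng=random):
    """Add the seed parts one stage at a time, letting the diffusion run to
    completion between additions, with thresholds fixed up front."""
    thresholds = {v: rng.random() for v in in_weights}
    active = set()
    for part in partition:
        active |= set(part)           # add the next piece of the seed set
        changed = True
        while changed:                # grow until this stage ends
            changed = False
            for v, weights in in_weights.items():
                if v in active:
                    continue
                if sum(w for u, w in weights.items() if u in active) >= thresholds[v]:
                    active.add(v)
                    changed = True
    return active
```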
Piecemeal Growth: Proof
• Couple three piecemeal growth processes T', T, T'' and the original process S, all with the same thresholds θ (shown here for k = 2):
  – T':  add S at stage 1, add nothing at stage 2.
  – T:   add S^(1) at stage 1, add S^(2) at stage 2.
  – T'': add nothing at stage 1, add S at stage 2.
• Under this coupling T''_t ⊆ T_t ⊆ T'_t at every step, and T'_end = T''_end = S_end, so that S_end = T_end.
Need-to-know Representation (1)
• Consider the diffusion in a different way: the need-to-know representation.
• Principle of deferred decisions: we do not decide all the thresholds at the beginning; instead, we reveal the value of a threshold only when it is needed.
• For example, if node v is still inactive at step t−1, we only need to know whether it is activated at step t, i.e., whether its threshold θ_v lies between f_v(S_{t−2}) and f_v(S_{t−1}).
Need-to-know Representation (2)
Lemma: The following process is equivalent to the original one:
1. Initialize S_0 = S.
2. At step 1 ≤ t ≤ n−1, initialize S_t = S_{t−1}, and for each still-inactive node v:
   – With probability (f_v(S_{t−1}) − f_v(S_{t−2})) / (1 − f_v(S_{t−2})), v becomes activated and we pick θ_v uniformly in [f_v(S_{t−2}), f_v(S_{t−1})].
   – Otherwise, we do nothing.
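Not from the slides — a minimal sketch of the deferred-decision update above for one still-inactive node, assuming f_prev = f_v(S_{t−2}) and f_cur = f_v(S_{t−1}) have already been computed (with f_prev ≤ f_cur < 1):

```python
import random

def need_to_know_step(f_prev, f_cur, rng=random):
    """One deferred-decision update for a still-inactive node v at step t.

    Returns (activated, theta); theta is only revealed on activation.
    """
    # v is still inactive, so we know theta_v > f_prev; conditioned on that,
    # it activates iff theta_v <= f_cur, which happens with this probability:
    p_activate = (f_cur - f_prev) / (1.0 - f_prev)
    if rng.random() < p_activate:
        # Reveal theta_v uniformly on the interval [f_prev, f_cur].
        theta = rng.uniform(f_prev, f_cur)
        return True, theta
    return False, None
```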
Antisense Coupling (1)
Define the antisense diffusion process Q = Q(S^(1), ..., S^(k); T), where S^(1), ..., S^(k) is a partition of the seed set S:
  – Stages 1 to k: piecemeal growth of S^(1), ..., S^(k), each stage grown until it ends.
  – Stage k+1: add T at the beginning of the stage (time τ) and grow it until it ends.
At any step t in the final stage, a still-inactive node v is activated under the condition f_v(Q_t) ≥ f_v(Q_τ) + 1 − θ_v.
Antisense Coupling (2)
[Figure: the two processes side by side, each growing S^(1), ..., S^(k) and then T. In the final stage of the piecemeal process P, a node v activates when f_v(P_t) ≥ θ_v, with θ_v lying above f_v(P_τ); in the antisense process Q, the flipped threshold θ'_v = f_v(P_τ) + 1 − θ_v is used instead, and v activates when f_v(Q_t) ≥ θ'_v.]
Antisense Coupling (3)
Lemma: The distributions of the activated node set at the end of the piecemeal growth process P(S^(1), ..., S^(k); T) and the antisense diffusion process Q(S^(1), ..., S^(k); T) are identical.
Antisense Coupling: Proof (1)
• From the need-to-know representation point of view: for any node v still inactive at time t = τ, its threshold θ_v is uniformly distributed in [f_v(P_τ), 1] = [f_v(Q_τ), 1].
Antisense Coupling: Proof (2)
• For any still-inactive node v, we pick its θ_v uniformly in [f_v(P_τ), 1].
• We define θ'_v = f_v(Q_τ) + 1 − θ_v.
• Since θ_v and θ'_v have the same distribution, the final stage (growing T) proceeds identically in distribution in P and Q.
• Therefore, P_end and Q_end have the same distribution.
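A one-line check of the distributional claim (our addition, in the slides' notation, using f_v(P_τ) = f_v(Q_τ) since the first k stages of P and Q coincide):

```latex
\theta_v \sim \mathrm{Unif}\bigl[f_v(P_\tau),\,1\bigr]
\;\Longrightarrow\;
\theta'_v = f_v(Q_\tau) + 1 - \theta_v
\sim \mathrm{Unif}\bigl[f_v(Q_\tau),\; f_v(Q_\tau) + 1 - f_v(P_\tau)\bigr]
= \mathrm{Unif}\bigl[f_v(Q_\tau),\,1\bigr].
```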
Coupling: Overview
• Process A: grow S∩T until it ends, then grow S\T until it ends, then grow nothing.
• Process B: grow S∩T until it ends, then grow T\S until it ends, then grow nothing.
• Process D: grow S∩T until it ends, then grow S\T until it ends, then grow T\S until it ends.
• Under this coupling, D_t ⊆ A_t ∪ B_t for any step t in all three stages.