Maximizing the Spread of Influence through a Social Network
Han Wang, Department of Computer Science, ETH Zürich
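Problem Example 1: Spread of a Rumor ("2012 = end!")
[Figure: a rumor spreading through a small network of nodes A through F]
Problem Example 2: Viral Marketing ("ezPad 1 beats iPad 3")
[Figure: the same network of nodes A through F]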
Problem Definition
G: a social network with n nodes
Model: the spread process
S: the initially active subset (k seeds)
$\sigma(S)$: the (expected) number of finally active nodes (achievement)
Task: choose $S^*$ with $\sigma(S^*) = \max_{|S| = k} \sigma(S)$, which is NP-hard.
Realistic goal: approximate the maximum with a guarantee, i.e. choose S with $\sigma(S) \ge r \cdot \sigma(S^*)$.
Contents in This Talk
Starting from the problem definition (G, the spread process, the k seeds S and the achievement $\sigma(S)$), the talk introduces the two spread models and proves two claims: finding the optimal $S^*$ is NP-hard, and the maximum can be approximated with a guarantee $\sigma(S) \ge r \cdot \sigma(S^*)$.
Model 1: Independent Cascade Model
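Model 1: Cascade Model
Each newly active node u gets a single chance to activate each inactive neighbor v: it succeeds with probability $p_{u,v}$ and fails with probability $1 - p_{u,v}$.
Example: $p_{C,D} = 0.2$, $p_{C,E} = 0.8$, $p_{C,F} = 0.6$.
Model 1: Cascade Model
[Figure: example network on nodes A through F with edge probabilities 0.2, 0.7, 0.8, 0.4, 0.3 and 0.6]
Model 1: Cascade Model
In this example run, the seeds $S = \{A, C\}$ end with $\sigma(S) = 5$ active nodes.
[Figure: the example network after the cascade]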
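A minimal Python sketch of one Independent Cascade run, plus a Monte Carlo estimate of $\sigma(S)$ as the average number of active nodes over many runs. The adjacency format, the names `simulate_ic` and `estimate_sigma_ic`, and the toy graph (only C's three edges from the slide) are assumptions for illustration.

```python
import random


def simulate_ic(graph, seeds, rng):
    """One run of the Independent Cascade process.

    graph maps each node u to a list of (v, p_uv) out-edges (an assumed input
    format). Returns the set of nodes that are active when the process stops.
    """
    active = set(seeds)
    frontier = list(seeds)  # nodes that became active in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # u has exactly one chance to activate each still-inactive neighbor v
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_sigma_ic(graph, seeds, runs=10000, seed=0):
    """Monte Carlo estimate of sigma(S), the expected number of active nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs


# Hypothetical toy graph: only C's three out-edges, with the probabilities from the slide.
toy = {"C": [("D", 0.2), ("E", 0.8), ("F", 0.6)]}
print(estimate_sigma_ic(toy, {"C"}))  # the exact expectation is 1 + 0.2 + 0.8 + 0.6 = 2.6
```

Since the toy graph only has C's out-edges, D, E and F are reached independently, so the estimate should land close to 2.6.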
Model 2: Linear Threshold Model
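Model 2: Threshold Model
Each inactive node v picks a random threshold $\theta_v \in [0, 1]$.
Activation condition: $\sum_{u:\ \text{active neighbor of } v} b_{u,v} \ge \theta_v$.
Example for node D with $\theta_D = 0.3$, $b_{C,D} = 0.2$, $b_{E,D} = 0.7$:
Iteration 2: C is active, but $0.2 < 0.3$, so D stays inactive.
Iteration 4: E becomes active.
Iteration 5: $0.2 + 0.7 > 0.3$, so D becomes active.
Model 2: Threshold Model
[Figure: the example network with edge weights 0.2, 0.7, 0.8, 0.4, 0.3 and 0.6, node thresholds 0.3 (node D), 0.6, 0.5 and 0.9, and iterations 1 and 2 of the process]
Model 2: Threshold Model
In this example run, the seeds $S = \{A, C\}$ end with $\sigma(S) = 4$ active nodes.
[Figure: the example network after the process stops]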
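A minimal sketch of the threshold process, assuming each node's incoming weights are given as a dict; `simulate_lt` is a hypothetical name. Sweeping until nothing changes reaches the same final set in any processing order, because activation only ever adds incoming weight. The usage replays node D from the slide; the thresholds of C and E are made up.

```python
import random


def simulate_lt(in_weights, seeds, rng, theta=None):
    """One run of the Linear Threshold process.

    in_weights maps each node v to {u: b_uv}, its incoming edge weights, with
    sum_u b_uv <= 1 (an assumed input format). If theta is not given, every
    node draws its threshold theta_v uniformly at random from [0, 1).
    """
    nodes = set(in_weights) | set(seeds)
    for weights in in_weights.values():
        nodes |= set(weights)
    if theta is None:
        theta = {v: rng.random() for v in nodes}
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node can be activated
        changed = False
        for v in nodes - active:
            total = sum(b for u, b in in_weights.get(v, {}).items() if u in active)
            if total >= theta[v]:  # activation condition from the slide
                active.add(v)
                changed = True
    return active


# Node D from the slide: theta_D = 0.3, b_CD = 0.2, b_ED = 0.7 (other thresholds hypothetical).
weights = {"D": {"C": 0.2, "E": 0.7}}
thresholds = {"C": 0.5, "D": 0.3, "E": 0.6}
print(simulate_lt(weights, {"C"}, random.Random(0), thresholds))       # D stays inactive
print(simulate_lt(weights, {"C", "E"}, random.Random(0), thresholds))  # D becomes active
```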
How to Prove the Guarantee?
Given a spread model, how do we find S such that $\sigma(S) \ge r \cdot \sigma(S^*)$? (???)
Nemhauser: if $f(S)$ is non-negative, monotone and submodular, we can find S such that $f(S) \ge (1 - 1/e) \cdot f(S^*)$.
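Submodularity
U: a finite ground set; $\mathcal{P}(U)$: the power set of U; $f: \mathcal{P}(U) \to \mathbb{R}_{\ge 0}$
Submodularity (diminishing returns): for every element $v$ and all $S \subseteq T$,
$f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$
Example: Submodularity
$f(S)$: the number of vertices reachable from the vertices in S
[Figure: two copies of a small graph on v, A, B, C, D, comparing the gain from adding v to a smaller and to a larger set]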
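To make the definition concrete, a brute-force checker (a sketch; it enumerates all pairs $S \subseteq T$, so it only suits tiny ground sets). It tries only $v \notin T$, the standard form of the definition; for $v \in T$ the right-hand side is 0 and the inequality reduces to monotonicity. The function names are hypothetical.

```python
from itertools import combinations


def all_subsets(ground):
    """Every subset of the finite ground set U (2^|U| of them)."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f, ground):
    """Brute-force test of f(S | {v}) - f(S) >= f(T | {v}) - f(T) for all S ⊆ T and v ∉ T."""
    ground = frozenset(ground)
    subsets = list(all_subsets(ground))
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for v in ground - T:
                if f(S | {v}) - f(S) < f(T | {v}) - f(T) - 1e-12:  # small float tolerance
                    return False
    return True


U = frozenset("abcd")
print(is_submodular(lambda S: min(len(S), 2), U))  # capped count: diminishing returns
print(is_submodular(lambda S: len(S) ** 2, U))     # |S|^2: increasing returns
```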
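A small sketch of this reachability function on a hypothetical directed graph (not the one in the slide's figure), showing that adding v to the larger set T gains less than adding it to S.

```python
from collections import deque


def reachable(graph, sources):
    """All vertices reachable from any vertex in sources (sources included), via BFS."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for w in graph.get(u, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def f(graph, S):
    """f(S): number of vertices reachable from the vertices in S."""
    return len(reachable(graph, S))


# Hypothetical directed graph, not the one drawn on the slide.
g = {"A": ["B", "C"], "C": ["D"], "v": ["C", "E"]}
S, T = {"A"}, {"A", "E"}  # S is a subset of T
gain_S = f(g, S | {"v"}) - f(g, S)  # v adds itself and E
gain_T = f(g, T | {"v"}) - f(g, T)  # E is already counted, so v adds only itself
print(gain_S, gain_T)  # 2 1
```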
How to Prove the Guarantee?
Given a spread model, we want S such that $\sigma(S) \ge r \cdot \sigma(S^*)$.
Nemhauser: if $f(S)$ is non-negative, monotone and submodular, we can find S such that $f(S) \ge (1 - 1/e) \cdot f(S^*)$.
So it remains to prove: $\sigma(S)$ is submodular.
We Want to Prove…
Model | NP-hard | $\sigma(S)$ is submodular
Independent Cascade | |
Linear Threshold | |
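Prove: Submodularity (Cascade Model)
Submodularity (Cascade Model)
Recall: every activation attempt is a biased coin flip that succeeds with probability $p_{u,v}$.
[Figure: the example network with edge probabilities]
Submodularity (Cascade Model)
Why not flip all the coins at the beginning?
[Figure: the example network with edge probabilities]
Submodularity (Cascade Model)
Live edges (coin flips that succeeded), blocked edges (coin flips that failed), and live paths (paths made only of live edges).
[Figure: the example network with live and blocked edges marked]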
Simplify Cascade Model
Node v ends up active if and only if there is a live path from some seed to v.
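Achievement (Simplified Model)
X: the outcome of all coin flips, e.g. $X_1$, $X_2$.
$R_X(v)$: the set of nodes reachable from v via live paths under X, e.g. $R_{X_1}(A) = \{A, B\}$ and $R_{X_1}(C) = \{C, D, E\}$.
$\sigma_X(S) = \left|\bigcup_{v \in S} R_X(v)\right|$, e.g. $\sigma_{X_1}(\{A, C\}) = |\{A, B, C, D, E\}| = 5$.
[Figure: the live edges of outcome $X_1$ on nodes A through F]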
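A sketch of the simplified model: flip every coin up front to get an outcome X, then compute $R_X(v)$ and $\sigma_X(S)$ by graph search. The live-edge list `X1` is a hypothetical outcome chosen to reproduce the slide's numbers, since the figure's actual edges are not recoverable; the function names are illustrative.

```python
import random


def sample_live_edges(edges, rng):
    """Flip every coin up front: edge (u, v, p) is kept live with probability p."""
    return [(u, v) for u, v, p in edges if rng.random() < p]


def reach_set(live_edges, v):
    """R_X(v): nodes reachable from v using only live edges (v itself included)."""
    adj = {}
    for a, b in live_edges:
        adj.setdefault(a, []).append(b)
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def sigma_x(live_edges, S):
    """sigma_X(S) = |union of R_X(v) over v in S|."""
    covered = set()
    for v in S:
        covered |= reach_set(live_edges, v)
    return len(covered)


# Hypothetical outcome X1, chosen to reproduce the slide's R_X1(A) and R_X1(C).
X1 = [("A", "B"), ("C", "E"), ("E", "D")]
print(sorted(reach_set(X1, "A")), sorted(reach_set(X1, "C")), sigma_x(X1, {"A", "C"}))
# ['A', 'B'] ['C', 'D', 'E'] 5
```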
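Submodularity (Cascade Model)
Fix X: $\sigma_X(S)$ is submodular, since it counts the nodes covered by the sets $R_X(v)$, $v \in S$.
A non-negative linear combination of submodular functions is still submodular, and
$\sigma(S) = \sum_X \Pr(X) \cdot \sigma_X(S)$.
Summary of the proof
Active $\Leftrightarrow$ has a live path $\Rightarrow$ $\sigma_X(S)$ is submodular $\Rightarrow$ $\sigma(S)$ is submodular.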
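For a tiny graph, the weighted sum over outcomes can be computed exactly by enumerating all $2^m$ coin outcomes. A sketch with hypothetical edges (the slide's $p_{C,D}$, $p_{C,E}$, $p_{C,F}$ plus an edge E to D with 0.7, borrowed from the threshold example); `exact_sigma` is an illustrative name. By hand, D is reached from C with probability $1 - 0.8 \cdot (1 - 0.8 \cdot 0.7) = 0.648$, so $\sigma(\{C\}) = 1 + 0.8 + 0.6 + 0.648 = 3.048$.

```python
from itertools import product


def reach_from(live_edges, S):
    """Nodes reachable from the seed set S using only live edges."""
    adj = {}
    for a, b in live_edges:
        adj.setdefault(a, []).append(b)
    seen, stack = set(S), list(S)
    while stack:
        u = stack.pop()
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def exact_sigma(edges, S):
    """sigma(S) = sum over all coin outcomes X of Prob(X) * sigma_X(S).

    Enumerates all 2^m outcomes for m edges, so it only suits tiny graphs.
    """
    total = 0.0
    for coins in product([True, False], repeat=len(edges)):
        prob, live = 1.0, []
        for (u, v, p), heads in zip(edges, coins):
            prob *= p if heads else 1 - p
            if heads:
                live.append((u, v))
        total += prob * len(reach_from(live, S))
    return total


# Hypothetical edges: C's three probabilities from the slide plus E -> D with 0.7.
edges = [("C", "D", 0.2), ("C", "E", 0.8), ("C", "F", 0.6), ("E", "D", 0.7)]
print(exact_sigma(edges, {"C"}))  # about 3.048, up to floating-point rounding
```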
Prove: NP-Hardness (Simplified Cascade Model)
NP-Hard (Cascade Model)
Set Cover Problem: can k of the given subsets cover all elements?
k = 1: No
k = 2: No
k = 3: Yes
k = 4: …
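NP-Hard (Cascade Model)
Solving influence maximization solves Set Cover: create one node per subset and one node per element, with an edge of probability 1 from each subset to each of its elements.
Q (influence maximization): is there S with $|S| = 2$ and $\sigma(S) \ge 2 + 5$? $\Leftrightarrow$ Q (Set Cover): can 2 subsets cover all 5 elements?
[Figure: subsets S1, S2, S3 connected to elements A through E]
NP-Hard (Cascade Model)
The Influence Maximization Problem is therefore at least as hard as the Set Cover Problem, so it is NP-hard in the cascade model.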
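A sketch of this reduction on a hypothetical instance (the memberships in the slide's figure are not recoverable): it builds the subset-to-element graph, evaluates the deterministic cascade, and checks that "some k seeds reach k + n nodes" agrees with "some k subsets cover the universe". Seeds may also be element nodes, but any such seed pushes the count below k + n, so only subset seeds can succeed. Function names are illustrative.

```python
from itertools import combinations


def reduction_graph(subsets):
    """One node per subset and per element; an edge with p = 1 from each subset to its elements."""
    return {name: sorted(elements) for name, elements in subsets.items()}


def sigma_deterministic(graph, seeds):
    """With every p = 1 the cascade is deterministic; elements have no out-edges, so one hop suffices."""
    active = set(seeds)
    for s in seeds:
        active.update(graph.get(s, []))
    return len(active)


def cover_via_influence(subsets, universe, k):
    """Is there a seed set of k nodes (subset or element nodes) with sigma >= k + n?"""
    graph = reduction_graph(subsets)
    nodes = sorted(subsets) + sorted(universe)
    return any(sigma_deterministic(graph, seeds) >= k + len(universe)
               for seeds in combinations(nodes, k))


def cover_direct(subsets, universe, k):
    """Direct Set Cover check: do some k subsets cover the whole universe?"""
    return any(set().union(*(subsets[s] for s in combo)) >= universe
               for combo in combinations(subsets, k))


# Hypothetical instance.
universe = {"a", "b", "c", "d", "e"}
subsets = {"S1": {"a", "b", "c"}, "S2": {"c", "d"}, "S3": {"d", "e"}}
for k in range(4):
    print(k, cover_via_influence(subsets, universe, k), cover_direct(subsets, universe, k))
```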
Prove: Submodularity (Linear Threshold Model)
Recall: Threshold Model
[Figure: the example network with edge weights and node thresholds 0.3 (node D), 0.6, 0.5 and 0.9]
Gamble: Roulette
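Gamble: Roulette
[Figure: a roulette wheel for node v with slices for its in-neighbors N1 to N6 (edge weights 0.2, 0.14, 0.15, 0.1, 0.07 and 0.23) plus a "None" slice; threshold $\theta = 0.4$]
Submodularity (Threshold Model)
Each node spins its roulette once and keeps at most one incoming edge: the selected one, or none.
[Figure: the example network with thresholds 0.3 (node D), 0.6, 0.5 and 0.9, and each node's selection (None, C, E or A)]
Submodularity (Threshold Model)
Live edges and live paths.
[Figure: the example network with the selected live edges and the resulting live paths]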
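A sketch of one spin, assuming the wheel gives each in-neighbor u a slice of size $b_{u,v}$ and leaves the rest ("None") for no edge, as in the live-edge view of the threshold model. Which of N1 to N6 carries which weight in the figure is not recoverable, so the mapping below is hypothetical, as are the function names.

```python
import random


def spin_roulette(in_weights, rng):
    """One spin for node v: returns in-neighbor u with probability b_uv,
    or None with probability 1 - sum(b_uv). Requires sum(b_uv) <= 1."""
    r = rng.random()
    cumulative = 0.0
    for u, b in in_weights.items():
        cumulative += b
        if r < cumulative:
            return u
    return None  # r fell into the "None" slice


def sample_lt_live_edges(all_in_weights, rng):
    """One live-edge graph: every node keeps the single in-edge its spin selected, if any."""
    live = []
    for v, weights in all_in_weights.items():
        u = spin_roulette(weights, rng)
        if u is not None:
            live.append((u, v))
    return live


# Hypothetical assignment of the slide's six weights to N1..N6; they sum to 0.89.
wheel = {"N1": 0.2, "N2": 0.1, "N3": 0.23, "N4": 0.07, "N5": 0.15, "N6": 0.14}
rng = random.Random(42)
spins = [spin_roulette(wheel, rng) for _ in range(100000)]
print(spins.count(None) / len(spins))  # should be close to 1 - 0.89 = 0.11
```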
Correctness of Simplification
For node v:
$\Pr(v \text{ becomes active in iteration } t+1 \mid v \text{ inactive in iterations} \le t) = \dfrac{\Pr(v \text{ becomes active in iteration } t+1)}{\Pr(v \text{ inactive in iterations} \le t)}$
Simplified Model
[Figure: node v's roulette wheel (N1 to N6 and None), separating neighbors active before iteration 5 from those becoming active in iteration 5]
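Simplified Model
$A_t$: nodes becoming active in iteration t.
Given that v has not picked an edge from $A_1 \cup \dots \cup A_{t-1}$, the probability that it picked an edge from $A_t$ is
$\dfrac{\sum_{u \in A_t} b_{u,v}}{1 - \sum_{u \in A_1 \cup A_2 \cup \dots \cup A_{t-1}} b_{u,v}}$
Original Model
[Figure: node v's in-neighbors N1 to N6 ordered by activation time, with weights 0.2, 0.14, 0.15, 0.1, 0.07 and 0.23]
Original Model
$A_t$: nodes becoming active in iteration t.
Given that v is still inactive, i.e. $\theta_v > \sum_{u \in A_1 \cup \dots \cup A_{t-1}} b_{u,v}$, the probability that the new weight from $A_t$ reaches its threshold is
$\dfrac{\sum_{u \in A_t} b_{u,v}}{1 - \sum_{u \in A_1 \cup A_2 \cup \dots \cup A_{t-1}} b_{u,v}}$, the same as in the simplified model.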
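Why the two expressions agree, as a short derivation sketch (the shorthand $s$ and $\Delta$ is introduced here): in the original model "v still inactive" means $\theta_v > s$, and since $\theta_v$ is uniform on $[0, 1]$ the conditional chance of crossing the threshold is a ratio of interval lengths; in the simplified model the roulette slices give the same ratio. This needs $s + \Delta \le 1$, which holds because the incoming weights of v sum to at most 1.

```latex
\begin{aligned}
s &= \sum_{u \in A_1 \cup \dots \cup A_{t-1}} b_{u,v}, \qquad \Delta = \sum_{u \in A_t} b_{u,v} \\
\text{Original: } \Pr[\theta_v \le s + \Delta \mid \theta_v > s]
  &= \frac{\Pr[s < \theta_v \le s + \Delta]}{\Pr[\theta_v > s]}
   = \frac{\Delta}{1 - s} \qquad (\theta_v \sim U[0,1]) \\
\text{Simplified: } \Pr[\text{spin lands in an } A_t \text{ slice} \mid \text{spin misses all } A_1, \dots, A_{t-1} \text{ slices}]
  &= \frac{\Delta}{1 - s}
\end{aligned}
```

Given the same history, both models therefore activate v with the same probability at every step, which is why the live-path characterization carries over.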
Simplify Threshold Model
Node v ends up active if and only if there is a live path from some seed to v.
Similarly, we have…
Active $\Leftrightarrow$ has a live path $\Rightarrow$ $\sigma_X(S)$ is submodular $\Rightarrow$ $\sigma(S)$ is submodular.
Prove: NP-Hardness (Linear Threshold Model)
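A Monte Carlo sanity check of this claim on a hypothetical three-node instance: it runs the original threshold process and the live-edge process and compares the average number of active nodes. By hand, with seed C, E is active with probability 0.8 in both models, and D with probability $0.9 \cdot 0.8 + 0.2 \cdot 0.2 = 0.76$ (original) or $0.2 + 0.7 \cdot 0.8 = 0.76$ (live edges), so both estimates should be near 2.56. Names and weights are illustrative.

```python
import random

# Hypothetical in-edge weights b_uv (the slide's full figure is not recoverable).
IN_WEIGHTS = {"E": {"C": 0.8}, "D": {"C": 0.2, "E": 0.7}}


def run_threshold(in_w, seeds, rng):
    """Original Linear Threshold process with thresholds drawn uniformly from [0, 1)."""
    nodes = set(in_w) | set(seeds)
    for weights in in_w.values():
        nodes |= set(weights)
    theta = {v: rng.random() for v in nodes}
    active, changed = set(seeds), True
    while changed:
        changed = False
        for v in nodes - active:
            if sum(b for u, b in in_w.get(v, {}).items() if u in active) >= theta[v]:
                active.add(v)
                changed = True
    return active


def run_live_edge(in_w, seeds, rng):
    """Simplified model: each node keeps at most one in-edge, (u, v) with probability b_uv."""
    parent = {}
    for v, weights in in_w.items():
        r, acc = rng.random(), 0.0
        for u, b in weights.items():
            acc += b
            if r < acc:
                parent[v] = u
                break
    active = set(seeds)
    for v in parent:
        # At most one live in-edge per node, so the only candidate live path into v
        # is found by following parents backwards until a seed, a dead end, or a cycle.
        seen, u = set(), v
        while u not in active and u in parent and u not in seen:
            seen.add(u)
            u = parent[u]
        if u in active:
            active.add(v)
    return active


def estimate(run, in_w, seeds, trials, seed):
    """Average number of active nodes over `trials` independent runs."""
    rng = random.Random(seed)
    return sum(len(run(in_w, seeds, rng)) for _ in range(trials)) / trials


print(estimate(run_threshold, IN_WEIGHTS, {"C"}, 40000, 1))
print(estimate(run_live_edge, IN_WEIGHTS, {"C"}, 40000, 2))
```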
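NP-Hard (Threshold Model)
Vertex Cover Problem: is there a set S of k vertices such that each edge is incident to at least one vertex in S?
NP-Hard (Threshold Model)
Influence maximization vs. Vertex Cover:
Q: is there S with $|S| = 3$ and $\sigma(S) = 6$? $\Leftrightarrow$ Q: do 3 vertices cover all edges?
[Figure: the same 6-node graph on A through F, drawn for both questions]
Influence Maximization
Q: is there S with $|S| = 2$ and $\sigma(S) = 6$?
Q: is there S with $|S| = 3$ and $\sigma(S) = 6$?
[Figure: the 6-node graph on A through F for each question]
NP-Hard (Threshold Model)
The Influence Maximization Problem is therefore at least as hard as the Vertex Cover Problem, so it is NP-hard in the threshold model.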
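A sketch of one way to build this reduction, assumed here since the slides do not spell it out: direct every edge both ways and give each in-edge of v the weight $1/\deg(v)$. If S is a vertex cover, every node outside S has all its neighbors in S, collects weight $1 \ge \theta_v$, and $\sigma(S) = n$. If some edge (u, v) is uncovered, then with positive probability $\theta_u$ and $\theta_v$ are so high that each needs the other first, so $\sigma(S) < n$. The code checks this deterministically with thresholds $\theta = 1$ on a hypothetical graph without isolated vertices; exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import combinations


def lt_instance(edges):
    """Direct each undirected edge both ways; weight b_uv = 1/deg(v), so each node's in-weights sum to 1."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return {v: {u: Fraction(1, len(ns)) for u in ns} for v, ns in nbrs.items()}


def final_active(in_w, seeds, theta):
    """Linear Threshold process run to completion for fixed thresholds theta."""
    active, changed = set(seeds), True
    while changed:
        changed = False
        for v in set(in_w) - active:
            if sum(b for u, b in in_w[v].items() if u in active) >= theta[v]:
                active.add(v)
                changed = True
    return active


def is_vertex_cover(edges, S):
    return all(a in S or b in S for a, b in edges)


# Hypothetical 6-node graph; the slide's edges are not recoverable.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("B", "F")]
W = lt_instance(edges)
worst = {v: 1 for v in W}  # theta_v = 1: v activates only once all its neighbors are active
for k in (2, 3):
    hits = [S for S in combinations(sorted(W), k) if len(final_active(W, S, worst)) == len(W)]
    print(k, hits)
```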
End of Proofs: Influence Maximization Problem
Model | NP-hard | $\sigma(S)$ is submodular
Independent Cascade | ✓ | ✓
Linear Threshold | ✓ | ✓
Initial Problem
Given a spread model, find S such that $\sigma(S) \ge (1 - 1/e - \epsilon) \cdot \sigma(S^*)$.
Proved: $\sigma(S)$ is submodular.
Nemhauser: for a non-negative monotone submodular $f(S)$, Greedy Hill Climbing, which repeatedly adds the node $v$ maximizing the marginal gain $f(S \cup \{v\}) - f(S)$, finds S such that $f(S) \ge (1 - 1/e) \cdot f(S^*)$.
The extra $\epsilon$ accounts for estimating $\sigma(S)$ by simulation instead of computing it exactly.
Summary
Problem description
Two models: Independent Cascade Model, Linear Threshold Model
Submodular functions
Proof of the approximation guarantee
Proof of NP-hardness
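A sketch of greedy hill climbing on top of a sampled estimate of $\sigma$. As an assumption here, a fixed set of live-edge outcomes is sampled once and reused, so the estimate is itself an average of coverage functions and stays submodular; `sample_worlds`, `sigma_hat` and `greedy` are hypothetical names, and the usage graph is deterministic so the answer is easy to check by hand.

```python
import random


def sample_worlds(edges, runs, seed=0):
    """Sample `runs` live-edge outcomes X once; edge (u, v, p) is live with probability p."""
    rng = random.Random(seed)
    worlds = []
    for _ in range(runs):
        adj = {}
        for u, v, p in edges:
            if rng.random() < p:
                adj.setdefault(u, []).append(v)
        worlds.append(adj)
    return worlds


def sigma_hat(worlds, S):
    """Average of sigma_X(S) over the sampled outcomes (an estimate of sigma(S))."""
    total = 0
    for adj in worlds:
        seen, stack = set(S), list(S)
        while stack:
            u = stack.pop()
            for w in adj.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += len(seen)
    return total / len(worlds)


def greedy(f, ground, k):
    """Greedy hill climbing: k times, add the node with the largest marginal gain."""
    S = set()
    for _ in range(k):
        candidates = [v for v in sorted(ground) if v not in S]
        best = max(candidates, key=lambda v: f(S | {v}) - f(S))
        S.add(best)
    return S


# Hypothetical deterministic graph (all p = 1).
edges = [("a", "b", 1.0), ("a", "c", 1.0), ("d", "e", 1.0)]
worlds = sample_worlds(edges, runs=5)


def f(S):
    return sigma_hat(worlds, S)


print(sorted(greedy(f, {"a", "b", "c", "d", "e"}, 2)))  # ['a', 'd']
```

With real probabilities, more sampled worlds make the estimate tighter, which is where the $\epsilon$ in the guarantee comes from.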
Q&A
Recommend
More recommend