
http://cs224w.stanford.edu We are more influenced by our friends - PowerPoint PPT Presentation



  1. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu

  2. We are more influenced by our friends than by strangers:
     - 68% of consumers consult friends and family before purchasing home electronics
     - 50% do research online before purchasing electronics
     10/23/17 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

  3. Identify influential customers → Convince them to adopt the product (offer a discount or free samples) → These customers endorse the product among their friends

  4. Information epidemics:
     - Which are the influential users?
     - Which news sites create big cascades?
     - Where should we advertise?
     Which node shall we target?

  5. Independent Cascade Model
     - Directed finite graph $G = (V, E)$
     - A set $S$ starts out with the new behavior
     - Nodes with this behavior are called "active"
     - Each edge $(v, w)$ has a probability $p_{vw}$
     - If node $v$ is active, it gets one chance to make $w$ active, with probability $p_{vw}$
     - Each edge fires at most once
     Does scheduling matter? No:
     - If $u$ and $v$ are both active at the same time, it doesn't matter which tries to activate $w$ first
     - But time moves in discrete steps
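The model above can be sketched as a short simulation. This is a minimal sketch, not code from the course: the adjacency format `edges` (node mapped to a list of `(w, p_vw)` pairs), the function name `simulate_ic`, and the toy graph are assumptions made for illustration. It processes newly active nodes from a queue instead of explicit time steps, which is safe because scheduling does not affect the outcome.

```python
import random
from collections import deque


def simulate_ic(edges, seeds, rng):
    """Run one Independent Cascade simulation.

    edges: dict mapping node v -> list of (w, p_vw) out-edges (assumed format).
    seeds: iterable of initially active nodes S.
    rng:   a random.Random instance, so runs are reproducible.
    Returns the set of nodes that are active when the cascade stops.
    """
    active = set(seeds)
    frontier = deque(active)  # newly active nodes that have not tried their edges yet
    while frontier:
        v = frontier.popleft()
        for w, p in edges.get(v, []):
            # v is dequeued exactly once, so each edge (v, w) fires at most once.
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return active


# Hypothetical toy graph: a->b always fires, a->c never fires, b->d always fires.
toy_edges = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
result = simulate_ic(toy_edges, {"a"}, random.Random(0))
```

With probabilities of exactly 1.0 and 0.0 the outcome is deterministic, so `result` contains `a`, `b`, and `d` but not `c`.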

  6. Initially some nodes $S$ are active
     - Each edge $(v, w)$ has probability (weight) $p_{vw}$
     - [Figure: example graph with nodes a to i and edge probabilities between 0.2 and 0.4]
     - When node $v$ becomes active, it activates each out-neighbor $w$ with prob. $p_{vw}$
     - Activations spread through the network

  7. Problem ($k$ is a user-specified parameter):
     - Most influential set of size $k$: the set $S$ of $k$ nodes producing the largest expected cascade size $f(S)$ if activated [Domingos-Richardson '01]
     - Optimization problem: $\max_{S \text{ of size } k} f(S)$
     - [Figure: example graph with nodes a to i and edge probabilities between 0.2 and 0.4, showing the influence set $X_a$ of $a$ and the influence set $X_d$ of $d$]
     Why "expected cascade size"? $X_a$ is the result of a random process. So in practice we would want to compute $X_a$ for many random realizations $i$ and then maximize the "average" value $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$. For now let's ignore this nuisance and simply assume that each node $u$ influences a set of nodes $X_u$.

  8. $S$: the initial active set
     - $f(S)$: the expected size of the final active set
     - $f(S)$ is the size of the union of the $X_u$: $f(S) = \left| \bigcup_{u \in S} X_u \right|$
     - [Figure: graph $G$ with nodes a, b, c, d, … and the influence set $X_u$ of node $u$]
     - Set $S$ is more influential if $f(S)$ is larger: $f(\{a, b\}) < f(\{a, c\}) < f(\{a, d\})$

  9. Problem: Most influential set of $k$ nodes: the set $S$ of $k$ nodes producing the largest expected cascade size $f(S)$ if activated
     - The optimization problem: $\max_{S \text{ of size } k} f(S)$
     - How hard is this problem? NP-COMPLETE!
     - Show that finding the most influential set is at least as hard as a set cover problem

  10. Set cover problem (a known NP-complete problem):
      - Given a universe of elements $U = \{u_1, \dots, u_n\}$ and sets $X_1, \dots, X_m \subseteq U$
      - [Figure: sets $X_1$, $X_2$, $X_3$, $X_4$ covering the universe $U$]
      - Q: Are there $k$ sets among $X_1, \dots, X_m$ such that their union is $U$?
      - Goal: Encode set cover as an instance of $\max_{S \text{ of size } k} f(S)$

  11. Given a set cover instance with sets $X_1, \dots, X_m$, build a bipartite "X-to-U" graph.
      - Construction: create an edge $(X_i, u)$ for all $X_i$ and all $u \in X_i$ (a directed edge from each set to its elements), e.g. $X_1 = \{u_1, u_2, u_3\}$
      - Put weight 1 on each edge (the activation is deterministic)
      - Set Cover as Influence Maximization in the X-to-U graph: there exists a set $S$ of size $k$ with $f(S) = k + n$ iff there exists a size-$k$ set cover
      - Note: The optimal solution is always a set of nodes $X_i$ (we never influence nodes "u")
      This problem is hard in general, but there could be special cases that are easier.
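The reduction above can be checked by brute force on a tiny instance. The sketch below is illustrative only: the helper names (`build_x_to_u`, `has_cover_via_influence`, `has_cover_direct`), the node labels `("X", name)` and `("u", element)`, and the example instance are all assumptions. Because every edge has weight 1 and the graph is bipartite, the cascade from a seed set is just the seeds plus their out-neighbors.

```python
from itertools import combinations


def build_x_to_u(sets):
    """Directed weight-1 edges from each set node ("X", name) to its element nodes ("u", el)."""
    return {("X", name): [("u", el) for el in sorted(elems)]
            for name, elems in sets.items()}


def f(edges, S):
    """Deterministic cascade size: the seeds plus everything they point to."""
    active = set(S)
    for v in S:
        active.update(edges.get(v, []))
    return len(active)


def has_cover_via_influence(sets, universe, k):
    """Is there a seed set S of size k with f(S) = k + n in the X-to-U graph?"""
    edges = build_x_to_u(sets)
    # Element nodes are allowed as seeds too; they can never reach k + n.
    nodes = list(edges) + [("u", el) for el in universe]
    n = len(universe)
    return any(f(edges, S) == k + n for S in combinations(nodes, k))


def has_cover_direct(sets, universe, k):
    """Is there a choice of k sets whose union is the whole universe?"""
    return any(set().union(*(sets[name] for name in names)) == set(universe)
               for names in combinations(sets, k))


# Hypothetical instance: no single set covers {1, 2, 3, 4}, but X1 and X2 together do.
universe = {1, 2, 3, 4}
sets = {"X1": {1, 2, 3}, "X2": {3, 4}, "X3": {4}}
answers = [(has_cover_via_influence(sets, universe, k), has_cover_direct(sets, universe, k))
           for k in (1, 2, 3)]
```

On this instance both tests agree for every `k`, which is what the "iff" statement predicts. The enumeration is exponential and only meant to illustrate the equivalence.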

  12. Extremely bad news:
      - Influence maximization is NP-complete
      Next, good news:
      - There exists an approximation algorithm!
      - For some inputs the algorithm won't find the globally optimal solution/set OPT
      - But we will also prove that the algorithm never does too badly either. More precisely, the algorithm will find a set $S$ with $f(S) > 0.63 \cdot f(OPT)$, where OPT is the globally optimal set.

  13. Consider a Greedy Hill Climbing algorithm to find $S$:
      - Input: the influence set $X_u$ of each node $u$: $X_u = \{v_1, v_2, \dots\}$
      - That is, if we activate $u$, nodes $\{v_1, v_2, \dots\}$ will eventually become active
      - Algorithm: at each iteration $i$, activate the node $u$ that gives the largest marginal gain: $\max_u f(S_{i-1} \cup \{u\})$
      - $S_i$: the set of nodes activated after iteration $i$
      - $f(S_i)$: the size of the union of $X_u$, $u \in S_i$

  14. Algorithm:
      - Start with $S_0 = \{\}$
      - For $i = 1 \dots k$:
        - Activate the node $u$ that maximizes $f(S_{i-1} \cup \{u\})$
        - Let $S_i = S_{i-1} \cup \{u\}$
      Example (graph with nodes a, b, c, d, e):
      - Eval. $f(\{a\}), \dots, f(\{e\})$, pick the argmax of them
      - Eval. $f(\{d, a\}), \dots, f(\{d, e\})$, pick the argmax
      - Eval. $f(\{d, b, a\}), \dots, f(\{d, b, e\})$, pick the argmax
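The greedy loop above can be sketched in a few lines, under the slides' simplifying assumption that each node $u$ has a fixed influence set $X_u$. The influence sets in `X` below are made up for illustration, and the names `f` and `greedy` are assumptions, not course code. The sketch re-evaluates $f$ for every remaining candidate in every round, which is the naive version of hill climbing.

```python
def f(S, X):
    """Size of the union of the influence sets X[u] for u in S."""
    covered = set()
    for u in S:
        covered |= X[u]
    return len(covered)


def greedy(X, k):
    """Greedy hill climbing: k times, add the node u maximizing f(S + [u])."""
    S = []
    for _ in range(k):
        candidates = [u for u in X if u not in S]
        if not candidates:
            break
        # max() returns the first best candidate, so ties go to the earliest key in X.
        best = max(candidates, key=lambda u: f(S + [u], X))
        S.append(best)
    return S


# Hypothetical influence sets for nodes a to e.
X = {
    "a": {"a", "b"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "a", "b", "e"},
    "e": {"e", "c"},
}
chosen = greedy(X, 2)
```

Here the first round picks `d` (it covers four nodes), and the second round picks `b`, the first candidate that raises the covered total to five.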

  15. Claim: Hill climbing produces a solution $S$ where $f(S) \ge (1 - 1/e) \cdot f(OPT)$, i.e. $f(S) > 0.63 \cdot f(OPT)$ [Nemhauser, Fisher, Wolsey '78; Kempe, Kleinberg, Tardos '03]
      The claim holds for functions $f(\cdot)$ with 2 properties:
      - $f$ is monotone (activating more nodes doesn't hurt): if $S \subseteq T$ then $f(S) \le f(T)$, and $f(\{\}) = 0$
      - $f$ is submodular (activating each additional node helps less): adding an element to a set gives less improvement than adding it to one of its subsets: $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$
      The left-hand side is the gain of adding a node to a small set; the right-hand side is the gain of adding a node to a large set.
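Both properties can be verified exhaustively on a small ground set. The following is a sketch under assumptions: the helper names (`subsets`, `is_monotone`, `is_submodular`, `coverage`) and the example sets are invented for illustration, and the check enumerates every pair $S \subseteq T$, so it is only feasible for a handful of elements.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def is_monotone(f, ground):
    """Check f({}) = 0 and f(S) <= f(T) for all S subset of T."""
    for T in subsets(ground):
        for S in subsets(T):
            if f(S) > f(T):
                return False
    return f(()) == 0


def is_submodular(f, ground):
    """Check f(S + u) - f(S) >= f(T + u) - f(T) for all S subset of T and all u."""
    for T in subsets(ground):
        for S in subsets(T):
            for u in ground:
                if f(S + (u,)) - f(S) < f(T + (u,)) - f(T):
                    return False
    return True


def coverage(X):
    """Return f with f(S) = size of the union of X[u] for u in S."""
    def f(S):
        covered = set()
        for u in S:
            covered |= X[u]
        return len(covered)
    return f


# Hypothetical influence sets.
X = {"a": {1, 2}, "b": {2, 3}, "c": {3}, "d": {1, 4}}
cov = coverage(X)
cov_ok = is_monotone(cov, X) and is_submodular(cov, X)
# |S|^2 has increasing returns, so it should fail the submodularity check.
square_ok = is_submodular(lambda S: len(set(S)) ** 2, X)
```

The coverage function passes both checks, while the squared-size function fails submodularity because its marginal gains grow with the set.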

  16. Diminishing returns: $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$
      - The left-hand side is the gain of adding a node to a small set; the right-hand side is the gain of adding a node to a large set
      - Adding $u$ to $T$ helps less than adding it to $S$!
      - [Figure: $f(\cdot)$ plotted against set size $|S|$, $|T|$, marking $f(S)$, $f(S \cup \{u\})$, $f(T)$, $f(T \cup \{u\})$]

  17. Also see the hangout posted on the course website.

  18. We must show our $f(\cdot)$ is submodular: $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$ (gain of adding a node to a small set vs. gain of adding a node to a large set)
      - Basic fact 1: If $f_1(x), \dots, f_k(x)$ are submodular and $c_1, \dots, c_k \ge 0$, then $F(x) = \sum_i c_i \cdot f_i(x)$ is also submodular
      - (A non-negative combination of submodular functions is a submodular function)
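Basic fact 1 follows in one line from the definition. The derivation below is a sketch that writes the argument $x$ of the slide as sets $S \subseteq T$ and uses only the submodularity of each $f_i$ and $c_i \ge 0$:

```latex
\begin{align*}
F(S \cup \{u\}) - F(S)
  &= \sum_i c_i \bigl( f_i(S \cup \{u\}) - f_i(S) \bigr) \\
  &\ge \sum_i c_i \bigl( f_i(T \cup \{u\}) - f_i(T) \bigr)
     && \text{(each } f_i \text{ is submodular and } c_i \ge 0\text{)} \\
  &= F(T \cup \{u\}) - F(T).
\end{align*}
```

The step fails if some $c_i$ is negative, because multiplying by a negative weight would flip that term's inequality.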

  19. $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$ (gain of adding $u$ to a small set vs. gain of adding $u$ to a large set)
      - Basic fact 2: A simple submodular function
      - Sets $X_1, \dots, X_m$
      - $f(S) = \left| \bigcup_{k \in S} X_k \right|$ (size of the union of sets $X_k$, $k \in S$)
      - Claim: $f(S)$ is submodular!
      - [Figure: nested regions $S \subseteq T$ and a new set $u$] The more sets you already have, the less new area a given set $u$ will cover
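The claim can be justified by writing out the marginal gain explicitly. This is a short sketch of the standard argument, using the notation of the slide with $S \subseteq T$:

```latex
\begin{align*}
f(S \cup \{u\}) - f(S) &= \Bigl| X_u \setminus \bigcup_{k \in S} X_k \Bigr|, &
f(T \cup \{u\}) - f(T) &= \Bigl| X_u \setminus \bigcup_{k \in T} X_k \Bigr|.
\end{align*}
Since $S \subseteq T$, we have $\bigcup_{k \in S} X_k \subseteq \bigcup_{k \in T} X_k$, hence
\[
X_u \setminus \bigcup_{k \in T} X_k \;\subseteq\; X_u \setminus \bigcup_{k \in S} X_k
\quad\Longrightarrow\quad
f(S \cup \{u\}) - f(S) \;\ge\; f(T \cup \{u\}) - f(T).
\]
```

In words: whatever part of $X_u$ is still new relative to $T$ is also new relative to the smaller collection $S$.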

  20. Proof strategy:
      - We will argue that influence maximization is an instance of the Set cover problem
      - Set cover problem: $f(S)$ is the size of the union of nodes influenced by the active set $S$
      - Note that $f(S)$ is "random" (a result of a random process), so we need to be a bit careful
      - Principle of deferred decision to the rescue!
      - We will create many parallel universes (random realizations $i$) and then average over them: $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$
      - [Figure: example graph with nodes a to i]
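One way to read "parallel universes" is to flip every edge's coin in advance, keep only the edges that came up live, and count the nodes reachable from $S$ in each resulting graph. The sketch below assumes that reading; the function names, the `edges` format (node mapped to a list of `(w, p_vw)` pairs), and the toy graphs are illustrative assumptions, not course code.

```python
import random


def sample_live_edges(edges, rng):
    """One universe: decide up front, for every edge, whether it would fire."""
    return {v: [w for w, p in out if rng.random() < p] for v, out in edges.items()}


def reachable_count(live, seeds):
    """f_i(S): number of nodes reachable from the seeds along live edges."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)


def average_over_universes(edges, seeds, num_universes=1000, seed=0):
    """Estimate f(S) as the average of f_i(S) over sampled universes."""
    rng = random.Random(seed)
    universes = [sample_live_edges(edges, rng) for _ in range(num_universes)]
    return sum(reachable_count(live, seeds) for live in universes) / num_universes


# Hypothetical graphs: one deterministic, one with a single 50/50 edge.
exact = average_over_universes({"a": [("b", 1.0)], "b": [("c", 0.0)]}, {"a"}, 50)
approx = average_over_universes({"a": [("b", 0.5)]}, {"a"}, 20000)
```

Within one universe, $f_i(S)$ is the size of a union of fixed reachability sets, which is the form covered by Basic fact 2; the average is then a non-negative combination of such functions. For the single 50/50 edge the true expected size is 1.5, and the estimate should land close to that value.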
