Maximizing the Spread of Influence through a Social Network
By David Kempe, Jon Kleinberg, Eva Tardos
Report by Joe Abrams
Social Networks
Infectious disease networks
Viral Marketing
Viral Marketing
• Example: Hotmail
• Included the service’s URL in every email sent by users
• Grew from zero to 12 million users in 18 months with a small advertising budget
Domingos and Richardson (2001, 2002)
• Introduction to maximization of influence over social networks
• Intrinsic Value vs. Network Value
• Expected Lift in Profit (ELP)
• Epinions, “web of trust”, 75,000 users and 500,000 edges
Domingos and Richardson (2001, 2002)
• Viral marketing (using greedy hill-climbing strategy) worked very well compared with direct marketing
• Robust (69% of total lift knowing only 5% of edges)
Diffusion Model: Linear Threshold Model
• Each node (consumer) is influenced by its set of neighbors and has a threshold Θ drawn from the uniform distribution on [0,1]
• When the combined influence of its active neighbors reaches the threshold, the node becomes “active”
• An active node can now influence its own neighbors
• Weighted edges
Diffusion Model: Linear Threshold Model (continued)
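The Linear Threshold rules can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name `linear_threshold` and the `in_weights` dictionary layout are assumptions, and the weights into each node are assumed to sum to at most 1.

```python
import random

def linear_threshold(in_weights, seeds, rng=None):
    """Run one Linear Threshold cascade and return the final active set.

    in_weights[v] maps each in-neighbor u of v to the edge weight w(u, v).
    Assumption: the weights into any node sum to at most 1.
    """
    rng = rng or random.Random()
    # Each node draws its threshold uniformly at random from [0, 1).
    thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    # Activation is permanent, so sweeping until nothing changes reaches
    # the same final set as a step-by-step simulation.
    while changed:
        changed = False
        for v, nbrs in in_weights.items():
            if v in active:
                continue
            # Combined influence of the neighbors that are already active.
            influence = sum(w for u, w in nbrs.items() if u in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Toy chain a -> b -> c with full-weight edges (hypothetical data).
chain = {"a": {}, "b": {"a": 1.0}, "c": {"b": 1.0}}
final_active = linear_threshold(chain, {"a"}, random.Random(0))
```

Because the only randomness is in the thresholds, averaging the size of the returned set over many runs gives an estimate of the expected spread of a seed set.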
Diffusion Model: Independent Cascade Model
• Each active node has a probability p of activating a neighbor
• At time t+1, all newly activated nodes try to activate their neighbors
• Each node gets only one attempt per target
• Akin to turn-based strategy game?
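A minimal Python sketch of one Independent Cascade run, assuming a single uniform probability `p` on every edge. The function name `independent_cascade` and the adjacency-list layout are illustrative assumptions, not taken from the paper.

```python
import random

def independent_cascade(out_neighbors, seeds, p, rng=None):
    """Run one Independent Cascade and return the final active set.

    out_neighbors[u] lists the nodes that u can try to activate.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)  # nodes activated in the previous time step
    while frontier:
        newly_active = []
        for u in frontier:
            for v in out_neighbors.get(u, ()):
                # u gets exactly one attempt on each still-inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        # Only nodes activated in this step get to try in the next one.
        frontier = newly_active
    return active

# Toy chain a -> b -> c (hypothetical data).
graph = {"a": ["b"], "b": ["c"], "c": []}
always = independent_cascade(graph, {"a"}, 1.0)  # every attempt succeeds
never = independent_cascade(graph, {"a"}, 0.0)   # every attempt fails
```

The two extreme settings of `p` make the one-attempt rule easy to check: with `p = 1.0` the cascade reaches the whole chain, and with `p = 0.0` it stops at the seed.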
Influence Maximization
• Using greedy hill-climbing strategy, can approximate optimum to within a factor of (1 - 1/e - ε), or ~63%
• Proven using theories of submodular functions (diminishing returns)
• Applies to both diffusion models
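Greedy hill-climbing can be sketched as follows: at each step, add the node with the largest marginal gain in spread. In this sketch the names `greedy_seed_set`, `reach`, and `coverage` are hypothetical, and the objective is a toy set-coverage function chosen because it is submodular and deterministic. For the diffusion models, `spread` would instead be an estimate obtained by repeated simulation.

```python
def greedy_seed_set(nodes, spread, k):
    """Pick up to k seeds, each time adding the node with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        base = spread(chosen)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in chosen:
                continue
            gain = spread(chosen + [v]) - base  # marginal gain of adding v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidate nodes exist
            break
        chosen.append(best)
    return chosen

# Toy submodular objective (an assumption for illustration): each node
# "reaches" a fixed set of people, and spread is the size of the union.
reach = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}

def coverage(seed_list):
    covered = set()
    for v in seed_list:
        covered |= reach[v]
    return len(covered)

seeds = greedy_seed_set(list(reach), coverage, 2)
```

Diminishing returns are visible in the toy data: node `a` adds three people whether it is picked first or after `c`, but `d` adds two people on its own and only one once `c` is in the set.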
Testing on network data
• Co-authorship network
• High-energy physics theory section of www.arxiv.org
• 10,748 nodes (authors) and ~53,000 edges
• Multiple co-authored papers listed as parallel edges (greater weight)
Testing on network data
• Linear Threshold: influence weighted by the number of parallel edges c_{u,v} and inversely weighted by the degree d_v of the target node: w = c_{u,v} / d_v
• Independent Cascade: p set at 1% and 10%; total probability for u → v is 1 - (1 - p)^c_{u,v}
• Weighted Cascade: p = 1 / d_v
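The three parameter settings can be computed directly from a list of co-authored papers. The sketch below is illustrative only: the function name `edge_parameters`, the input format, and the output keys are assumptions, and d_v is assumed to count parallel edges so that the Linear Threshold weights into each node sum to 1.

```python
from collections import Counter

def edge_parameters(papers, p=0.01):
    """Derive per-edge parameters from (author_u, author_v) co-authorship pairs."""
    c = Counter()       # c[(u, v)]: number of parallel edges between u and v
    degree = Counter()  # degree[v]: edges at v, counting parallel edges (assumption)
    for u, v in papers:
        c[(u, v)] += 1
        c[(v, u)] += 1
        degree[u] += 1
        degree[v] += 1
    params = {}
    for (u, v), count in c.items():
        params[(u, v)] = {
            "lt_weight": count / degree[v],   # w = c_{u,v} / d_v
            "ic_prob": 1 - (1 - p) ** count,  # 1 - (1 - p)^c_{u,v}
            "wc_prob": 1 / degree[v],         # p = 1 / d_v for each parallel edge
        }
    return params

# Hypothetical data: a and b wrote two papers together, b and c wrote one.
params = edge_parameters([("a", "b"), ("a", "b"), ("b", "c")], p=0.1)
```

In the toy data, b has degree 3, so a's influence on b under Linear Threshold is 2/3 and c's is 1/3.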
Algorithms
• Greedy hill-climbing
• High degree: nodes with greatest number of edges
• Distance centrality: lowest average distance to other nodes
• Random
Algorithms (continued)
Results: Linear Threshold Model
Greedy: ~40% better than distance centrality, ~18% better than high degree
Results: Weighted Cascade Model
Results: Independent Cascade, p = 1%
Results: Independent Cascade, p = 10%
Advantages of Random Selection
Generalized models
• Generalized Linear Threshold: for node v, influence of neighbors not necessarily sum of individual influences
• Generalized Independent Cascade: for node v, probability p depends on set of v’s neighbors that have previously tried to activate v
• Models computationally equivalent; impossible to guarantee approximation
Non-Progressive Threshold Model
• Active nodes can become inactive
• Similar concept: at each time t, whether v becomes or stays active depends on whether the influence on it meets its threshold
• Can “intervene” at different times; need not perform all interventions at t = 0
• Answer to non-progressive model on graph G over τ time steps is equivalent to progressive model on layered graph G^τ
General Marketing Strategies
• Can divide up total budget κ into equal increments of size δ
• For greedy hill-climbing strategy, can guarantee performance within factor of 1 - e^[-(κ * γ) / (κ + δ * n)]
• As δ decreases relative to κ, result approaches 1 - e^-1 ≈ 63%
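To see how the guarantee behaves as the increment δ shrinks, the bound can be evaluated numerically. This is a small sketch: the function name `greedy_guarantee` is hypothetical, and γ is treated as a caller-supplied parameter because the slide does not define it. The example values (κ = 100, n = 50, γ = 1) are made up for illustration.

```python
import math

def greedy_guarantee(kappa, delta, n, gamma):
    """Evaluate 1 - exp(-(kappa * gamma) / (kappa + delta * n))."""
    return 1 - math.exp(-(kappa * gamma) / (kappa + delta * n))

# Illustrative values only: budget kappa = 100, n = 50, gamma = 1.
coarse = greedy_guarantee(100, 1.0, 50, 1.0)   # large increments
fine = greedy_guarantee(100, 0.01, 50, 1.0)    # small increments
limit = greedy_guarantee(100, 0.0, 50, 1.0)    # delta -> 0 gives 1 - e^-1
```

With γ = 1, the exponent tends to -1 as δ * n becomes small relative to κ, which is where the ~63% figure comes from.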
Strengths of paper
• Showed results in two complementary fashions: theoretical models and test results using real dataset
• Demonstrated that greedy hill-climbing strategy could guarantee results of at least ~63% of optimum
• Used specific and generalized versions of two different diffusion models
Weaknesses of paper
• Doesn’t fully explain methodology of greedy hill-climbing strategy
• Lots of work not shown; simply refers to work done in other papers
• Threshold value uniformly distributed?
• Influence inversely weighted by degree of target?
Questions?