Why Do Cascade Sizes Follow a Power-Law?
Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Piotr Wygocki
University of Warsaw
Paper: bit.ly/why-cascades
WWW 2017
Information Cascade
Cascades as Graphs
◮ Given a social network
◮ The process of spreading information generates a graph (a DAG)
Figure: (a) Social Network (b) Cascade
Cascades as Graphs
Cascade ⇔ Propagation Graph
Cascade size ⇔ Rumour Popularity
Cascade Generation Model (Leskovec et al. 2007)
Example run on a network with nodes a to g, starting from node a (figure builds omitted):
◮ newly informed = { a }, informed = {}
◮ newly informed = { a, b }, informed = {}
◮ newly informed = { a, b, d }, informed = {}
◮ newly informed = { b, d }, informed = { a }
◮ newly informed = { b, d, c }, informed = { a }
◮ newly informed = { b, d, c, f }, informed = { a }
◮ newly informed = { c, f }, informed = { a, b, d }
◮ newly informed = { c, f, g }, informed = { a, b, d }
◮ newly informed = { g }, informed = { a, b, d, c, f }
◮ newly informed = {}, informed = { a, b, d, c, f, g }
Node e is never informed.
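The walkthrough above can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes the process runs in waves, that every newly informed node tries each out-neighbour once with success probability α, and that a node is informed at most once. The function name `cascade_generation_model` and the edges of `toy_graph` are hypothetical; the slides show node names a to g but the edge list is not recoverable from the transcript.

```python
import random


def cascade_generation_model(graph, start, alpha, rng=None):
    """Spread information from `start` in waves.

    graph maps each node to a list of its out-neighbours.
    Each transmission attempt succeeds with probability alpha.
    Returns (set of informed nodes, list of cascade edges).
    """
    rng = rng or random.Random()
    newly_informed = {start}
    informed = set()
    cascade_edges = []
    while newly_informed:
        next_wave = set()
        for u in sorted(newly_informed):
            for v in graph.get(u, ()):
                if v in informed or v in newly_informed or v in next_wave:
                    continue  # assumption: each node is informed at most once
                if rng.random() < alpha:
                    next_wave.add(v)
                    cascade_edges.append((u, v))
        # the current wave has finished its attempts
        informed |= newly_informed
        newly_informed = next_wave
    return informed, cascade_edges


# Hypothetical network on the slide's node names; the edges are assumed.
toy_graph = {
    "a": ["b", "d"],
    "b": ["c", "e"],
    "d": ["f"],
    "c": ["g"],
    "f": ["g"],
}

informed, edges = cascade_generation_model(
    toy_graph, "a", alpha=0.5, rng=random.Random(0)
)
print(sorted(informed), len(edges))
```

Because each node is informed at most once, the cascade edges form a tree rooted at the start node, so the cascade size is the number of edges plus one.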
Cascade Generation Model (Leskovec et al. 2007)
Successfully predicts small cascade sizes
◮ Over 650 citations
◮ Lots of practical applications
Problem
Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014])
◮ Log-log plots
◮ X-axis: cascade size
◮ Y-axis: probability that a cascade of that size occurs
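The axes described above can be computed from a list of observed cascade sizes. The following is a small illustrative sketch; the function names and the sample `sizes` list are hypothetical and are not data from the talk.

```python
from collections import Counter
import math


def cascade_size_distribution(sizes):
    """Map each cascade size k to the fraction of cascades with that size."""
    counts = Counter(sizes)
    total = len(sizes)
    return {k: c / total for k, c in sorted(counts.items())}


def log_log_points(distribution):
    """Return (log10 size, log10 probability) pairs for a log-log plot."""
    return [(math.log10(k), math.log10(p)) for k, p in distribution.items()]


# Hypothetical sample, for illustration only.
sizes = [1, 1, 1, 1, 2, 2, 3, 10]
dist = cascade_size_distribution(sizes)
points = log_log_points(dist)
for x, y in points:
    print(f"{x:.3f} {y:.3f}")
```

On such a plot a power law shows up as points lying close to a straight line.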
Problem
Possible explanations why this problem occurs:
◮ Large number of Strongly Connected Components in a graph (Lee et al. PAKDD 2014)
◮ Burstiness of human behaviour (Mathews et al. WWW 2017)
◮ Time and space effects (Cui et al. CIKM 2014)
◮ Preferential attachment
◮ Many others!
These explanations are only empirical: you need a social network to generate a random cascade
What do we model?
◮ We propose a model that outputs the probability of occurrence of each cascade.
◮ For example, the first with 2%, the second with 1%, etc.
◮ The model has parameters that can be adjusted to the social network
What do we model? (small analogy)
Recap
◮ Power-Law Graphs – number of nodes is fixed; degree distribution follows a power law
◮ Power-Law Cascades – number of nodes follows a power law; degree distribution is fixed

                            Empirical                    Theoretical
Random Power-Law Graph      Data from social networks    Barabasi-Albert, Watts-Strogatz, Bianconi-Barabasi (and many others)
Random Power-Law Cascade    Data from social networks    This paper
Model (this paper)
◮ Start with n ordered nodes (1, 2, 3, 4, 5 in the figure)
◮ Yellow – selected node
◮ Black edge with probability p
◮ Red edge otherwise
◮ Remove red edges
◮ Generate a cascade using CGM (Leskovec et al. 2007)
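The construction steps above can be sketched as follows. This is a sketch under stated assumptions, not the authors' code: it assumes that each selected node u considers every node v later in the order, and that a kept (black) edge points from u to v. The slides do not state the direction. The function names `random_ordered_dag` and `cascade_size` are hypothetical, and the cascade step is a compact stand-in for CGM with transmission probability α.

```python
import random


def random_ordered_dag(n, p, rng=None):
    """Build a random DAG on nodes 1..n.

    Assumption: for every pair u < v, the edge u -> v is kept (black)
    with probability p and dropped (red) otherwise.
    """
    rng = rng or random.Random()
    graph = {u: [] for u in range(1, n + 1)}
    for u in range(1, n + 1):          # the selected (yellow) node
        for v in range(u + 1, n + 1):  # candidates later in the order
            if rng.random() < p:       # black edge: keep it
                graph[u].append(v)
            # red edge otherwise: not stored, i.e. removed
    return graph


def cascade_size(graph, start, alpha, rng):
    """Number of nodes informed when each edge transmits with probability alpha.

    Each edge is tried at most once and an informed node is never re-informed,
    so the processing order does not change the distribution of the result.
    """
    informed = {start}
    frontier = [start]
    while frontier:
        u = frontier.pop()
        for v in graph[u]:
            if v not in informed and rng.random() < alpha:
                informed.add(v)
                frontier.append(v)
    return len(informed)


rng = random.Random(42)
dag = random_ordered_dag(n=200, p=0.05, rng=rng)
start = rng.randrange(1, 201)  # start at a random node
size = cascade_size(dag, start, alpha=0.5, rng=rng)
print(start, size)
```

The double loop costs O(n²) time, which is fine for a toy run but not for networks of the size discussed in the talk.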
Model (this paper) Figure: Sample result for a large n
Analysis
Theorem (this paper)
If you start the Cascade Generation Model (with probability α) at a random node, then:
$P[\,|\mathrm{Informed}| = k\,] \sim \frac{1}{n(1 - \beta^{k})}$
◮ n – number of nodes
◮ k – cascade size
◮ β – model parameter, usually close to 1, because 1 − β is the probability that there is an edge (parameter p) and that the information is transferred along it (probability α)
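To see how the theorem's expression behaves, it can be evaluated numerically. The sketch below assumes the expression reads 1 / (n(1 − β^k)) and takes q = 1 − β as its parameter to avoid rounding problems when β is close to 1. The function names are hypothetical. The values of n and 1/(1 − β) are the ones reported on the Observations slide of this talk.

```python
import math


def theorem_weight(k, n, q):
    """Evaluate 1 / (n * (1 - beta**k)) with beta = 1 - q.

    Uses log1p/expm1 so that 1 - beta**k stays accurate for tiny q.
    """
    one_minus_beta_k = -math.expm1(k * math.log1p(-q))
    return 1.0 / (n * one_minus_beta_k)


def local_slope(k, n, q):
    """Slope of log(weight) against log(k), measured between k and 2k."""
    hi = math.log(theorem_weight(2 * k, n, q))
    lo = math.log(theorem_weight(k, n, q))
    return (hi - lo) / math.log(2)


n = 3e8        # number of nodes (Observations slide)
q = 1 / 5e6    # 1 - beta, since 1 / (1 - beta) is about 5 * 10^6

for k in (10, 7 * 10**4, 5 * 10**7):
    print(k, theorem_weight(k, n, q), local_slope(k, n, q))
```

Under this reading, the slope on a log-log plot stays close to a constant while k is well below 1/(1 − β) and flattens towards zero once k exceeds it, which matches the regime condition stated in the talk.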
Takeaway Message
The model obeys a power law when:
◮ The number of nodes is large (n is large)
◮ The probability that a node informs a possibly unrelated node is small (1 − β is small)
◮ The largest cascade size k is small relative to 1/(1 − β) (e.g., in our data the largest cascade had 70 000 nodes)
Approximation
Approximation (this paper)
When $k \ll 1/(1-\beta) \ll n$, the distribution of cascade sizes follows a power law:
$P[\,|\mathrm{Informed}| = k\,] \sim k^{-\gamma}$
Observations (2 weeks of publicly available Twitter data):
◮ $k \approx 7 \cdot 10^{4}$
◮ $1/(1-\beta) \approx 5 \cdot 10^{6}$
◮ $n \approx 3 \cdot 10^{8}$
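One way to see where the power law comes from is a first-order expansion of the theorem's expression. The derivation below is a sketch that assumes the theorem reads $1/(n(1-\beta^{k}))$; it is not taken from the paper, and the slide itself leaves $\gamma$ unspecified.

```latex
\begin{align*}
\beta^{k} &= \bigl(1-(1-\beta)\bigr)^{k}
           = 1 - k(1-\beta) + O\!\bigl(k^{2}(1-\beta)^{2}\bigr), \\
1-\beta^{k} &\approx k(1-\beta)
           \qquad \text{when } k \ll \tfrac{1}{1-\beta}, \\
P\bigl[\,|\mathrm{Informed}| = k\,\bigr]
          &\sim \frac{1}{n\,(1-\beta^{k})}
           \approx \frac{1}{n(1-\beta)} \cdot \frac{1}{k}.
\end{align*}
```

With the observed values, $k(1-\beta) \approx 7 \cdot 10^{4} / (5 \cdot 10^{6}) = 0.014$, so the neglected second-order term is small. Under this reading the leading-order dependence on $k$ is $1/k$.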
Conclusion
We propose the first model for generating cascades with theoretical guarantees (more guarantees in the paper).
Thank You!
Question: Ground Truth
◮ Open-source data from Twitter
◮ Data were preprocessed and cleaned to obtain cascades
◮ New hashtags were treated as information cascades
◮ Connections through retweets and replies (and usage of hashtags)
◮ See the paper (bit.ly/why-cascade) for the tedious statistical analysis