  1. Why Do Cascade Sizes Follow a Power-Law? Andrzej Pacuk, Piotr Sankowski, Karol Węgrzycki, Piotr Wygocki. University of Warsaw. Paper: bit.ly/why-cascades. WWW 2017

  2. Information Cascade

  3. Cascades as Graphs ◮ Given a Social Network ◮ The process of spreading the information generates a graph (a DAG) [Figure: (a) Social Network, (b) Cascade]

  4. Cascade as Graphs Cascade ⇐ ⇒ Propagation Graph

  5. Cascade as Graphs Cascade ⇐ ⇒ Propagation Graph Cascade size ⇐ ⇒ Rumour Popularity

  6. Cascade Generation Model (Leskovec et al. 2007) [Figure: example social network on nodes a–g] newly informed = {a}, informed = {}

  7. Cascade Generation Model (Leskovec et al. 2007) newly informed = {a, b}, informed = {}

  8. Cascade Generation Model (Leskovec et al. 2007) newly informed = {a, b, d}, informed = {}

  9. Cascade Generation Model (Leskovec et al. 2007) newly informed = {b, d}, informed = {a}

  10. Cascade Generation Model (Leskovec et al. 2007) newly informed = {b, d, c}, informed = {a}

  11. Cascade Generation Model (Leskovec et al. 2007) newly informed = {b, d, c}, informed = {a}

  12. Cascade Generation Model (Leskovec et al. 2007) newly informed = {b, d, c, f}, informed = {a}

  13. Cascade Generation Model (Leskovec et al. 2007) newly informed = {c, f}, informed = {a, b, d}

  14. Cascade Generation Model (Leskovec et al. 2007) newly informed = {c, f}, informed = {a, b, d}

  15. Cascade Generation Model (Leskovec et al. 2007) newly informed = {c, f, g}, informed = {a, b, d}

  16. Cascade Generation Model (Leskovec et al. 2007) newly informed = {g}, informed = {a, b, d, c, f}

  17. Cascade Generation Model (Leskovec et al. 2007) newly informed = {}, informed = {a, b, d, c, f, g}

  18. Cascade Generation Model (Leskovec et al. 2007) Successfully predicts small cascade sizes ◮ Over 650 citations ◮ Lots of practical applications
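
For reference, here is a minimal sketch of the process walked through above (an illustrative reading, not the authors' code): each newly informed node tries to pass the information to each of its not-yet-informed neighbours independently. The per-attempt probability `alpha` and the round-by-round scheduling are simplifying assumptions; the slides process nodes one at a time rather than in strict rounds.

```python
import random

def cascade_generation_model(graph, start, alpha):
    """Simulate a cascade in the spirit of Leskovec et al. (2007).

    graph -- adjacency dict {node: set of neighbours}
    start -- node where the rumour starts
    alpha -- probability that an informed node passes the information
             to a given not-yet-informed neighbour
    Returns the set of informed nodes and the propagation edges.
    """
    informed = set()
    newly_informed = {start}
    propagation_edges = []

    while newly_informed:
        next_wave = set()
        for node in newly_informed:
            for neighbour in graph[node]:
                # Only not-yet-informed nodes can still be infected.
                if neighbour in informed or neighbour in newly_informed:
                    continue
                if random.random() < alpha:
                    propagation_edges.append((node, neighbour))
                    next_wave.add(neighbour)
        informed |= newly_informed
        newly_informed = next_wave

    # A node reached by several informers in the same round keeps all
    # incoming edges, so the propagation graph is a DAG, as on slide 3.
    return informed, propagation_edges
```

Running this on the slides' example graph over nodes a–g with start node a can reproduce a trace like the one shown on slides 6–17, depending on the coin flips.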

  19. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014])

  20. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014]) ◮ Log-log plots

  21. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014]) ◮ Log-log plots ◮ X-axis - cascade size

  22. Problem Real vs. simulated cascade size distribution (similar observations in, e.g., Cui et al. [CIKM 2014]) ◮ Log-log plots ◮ X-axis – cascade size ◮ Y-axis – probability of occurrence of a cascade of that size
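
A comparison plot of this kind can be reproduced with a straightforward empirical recipe: count how often each cascade size occurs, normalize to probabilities, and draw both axes on a log scale. The sketch below is illustrative, not the authors' plotting code; `real_sizes` and `simulated_sizes` are assumed to be lists of observed cascade sizes.

```python
from collections import Counter
import matplotlib.pyplot as plt

def size_distribution(sizes):
    """Map cascade size -> empirical probability of that size."""
    counts = Counter(sizes)
    total = len(sizes)
    return sorted((size, c / total) for size, c in counts.items())

def plot_distributions(real_sizes, simulated_sizes):
    for label, sizes in [("real", real_sizes), ("simulated", simulated_sizes)]:
        xs, ys = zip(*size_distribution(sizes))
        plt.scatter(xs, ys, s=8, label=label)
    plt.xscale("log")   # X-axis: cascade size
    plt.yscale("log")   # Y-axis: probability of a cascade of that size
    plt.xlabel("cascade size")
    plt.ylabel("probability")
    plt.legend()
    plt.show()
```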

  23. Problem Possible explanations why this problem occurs:

  24. Problem Possible explanations why this problem occurs: ◮ Large number of Strongly Connected Components in a graph (Lee et al. PAKDD 2014) ◮ Burstiness of human behaviour (Mathews et al. WWW 2017) ◮ Time and space effects (Cui et al. CIKM 2014) ◮ Preferential attachment ◮ Many others!

  25. Problem Possible explanations why this problem occurs: ◮ Large number of Strongly Connected Components in a graph (Lee et al. PAKDD 2014) ◮ Burstiness of human behaviour (Mathews et al. WWW 2017) ◮ Time and space effects (Cui et al. CIKM 2014) ◮ Preferential attachment ◮ Many others! These explanations are only empirical: you need a social network to generate a random cascade

  26. What do we model? ◮ We propose a model that outputs the probability of occurrence of a cascade.

  27. What do we model? ◮ We propose a model that outputs the probability of occurrence of a cascade. ◮ For example, the first cascade with 2%, the second with 1%, etc.

  28. What do we model? ◮ We propose a model that outputs the probability of occurrence of a cascade. ◮ For example, the first cascade with 2%, the second with 1%, etc. ◮ And parameters to adjust it to the social network

  29. What do we model? (small analogy) Recap ◮ Power-Law Graphs – number of nodes fixed. Degree distribution follows power-law ◮ Power-Law Cascade – number of nodes follows power-law. Degree distribution is fixed

  30. What do we model? (small analogy) Recap ◮ Power-Law Graphs – number of nodes fixed. Degree distribution follows power-law ◮ Power-Law Cascade – number of nodes follows power-law. Degree distribution is fixed ◮ Random Power-Law Graph: empirical – data from social networks; theoretical – Barabási–Albert, Watts–Strogatz, Bianconi–Barabási (and many others) ◮ Random Power-Law Cascade: empirical – data from social networks; theoretical – ?

  31. What do we model? (small analogy) Recap ◮ Power-Law Graphs – number of nodes fixed. Degree distribution follows power-law ◮ Power-Law Cascade – number of nodes follows power-law. Degree distribution is fixed ◮ Random Power-Law Graph: empirical – data from social networks; theoretical – Barabási–Albert, Watts–Strogatz, Bianconi–Barabási (and many others) ◮ Random Power-Law Cascade: empirical – data from social networks; theoretical – this paper

  32. Model (this paper) [Figure: ordered nodes 1–5] ◮ Start with n ordered nodes

  33. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node

  34. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p

  35. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p

  36. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  37. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  38. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  39. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  40. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  41. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  42. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  43. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise

  44. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise ◮ Remove Red edges

  45. Model (this paper) ◮ Start with n ordered nodes ◮ Yellow – selected node ◮ Black edge with probability p ◮ Red edge otherwise ◮ Remove Red edges ◮ Generate Cascade using CGM (Leskovec et al. 2007)

  46. Model (this paper) Figure: Sample result for a large n
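
A rough sketch of how one might implement the construction from slides 32–45. This is only one possible reading: the exact edge-drawing rule is shown in the figures, which are not reproduced in this transcript, so the assumption here is that every pair of ordered nodes is tried once, the edge is kept ("black") with probability p and discarded ("red") otherwise, and the cascade is then generated with the CGM sketch from earlier.

```python
import random

def generate_cascade(n, p, alpha, seed=None):
    """Illustrative sketch of the model from slides 32-45 (an assumed
    reading, not the authors' exact procedure)."""
    rng = random.Random(seed)

    # Step 1: start with n ordered nodes and draw black/red edges.
    graph = {i: set() for i in range(n)}
    for i in range(n):            # i plays the role of the selected node
        for j in range(i + 1, n):
            if rng.random() < p:  # black edge, kept
                graph[i].add(j)
                graph[j].add(i)
            # otherwise: red edge, removed immediately

    # Step 2: run CGM (sketched after slide 18) from a random start node.
    start = rng.randrange(n)
    informed, edges = cascade_generation_model(graph, start, alpha)
    return informed, edges
```

Repeating this many times and recording len(informed) yields a simulated cascade-size distribution of the kind shown in the sample figure for large n.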

  47. Analysis Theorem (this paper) If you start the Cascade Generation Model (with transmission probability α) at a random node, then: P[|Informed| = k] ∼ 1 / (n (1 − β^k)) ◮ n – number of nodes ◮ k – cascade size ◮ β – model parameter, usually close to 1, because 1 − β is the probability that there is an edge (parameter p) and the information is transferred along it (probability α)

  48. Takeaway Message The model obeys a power-law when: ◮ Number of nodes is large (n is large)

  49. Takeaway Message The model obeys a power-law when: ◮ Number of nodes is large (n is large) ◮ Probability that a node informs a possibly unrelated node is small (1 − β is small)

  50. Takeaway Message The model obeys a power-law when: ◮ Number of nodes is large (n is large) ◮ Probability that a node informs a possibly unrelated node is small (1 − β is small) ◮ The largest cascade size k is small (e.g., in our data the largest cascade had 70,000 nodes)

  51. Approximation Approximation (this paper) When k ≪ 1/(1 − β) ≪ n, the distribution of cascade sizes follows a power-law: P[|Informed| = k] ∼ k^(−γ)

  52. Approximation Approximation (this paper) When k ≪ 1/(1 − β) ≪ n, the distribution of cascade sizes follows a power-law: P[|Informed| = k] ∼ k^(−γ) Observations (2 weeks of publicly available Twitter data): ◮ k ≈ 7 · 10^4 ◮ 1/(1 − β) ≈ 5 · 10^6 ◮ n ≈ 3 · 10^8
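
To connect this approximation with the theorem on slide 47, here is a short derivation sketch (a reasoning aid added for this transcript, not quoted from the paper): when k ≪ 1/(1 − β), a first-order expansion of β^k already makes the theorem's expression decay like a power of k.

```latex
\begin{align*}
\beta^{k} = \bigl(1 - (1-\beta)\bigr)^{k}
  &\approx 1 - k\,(1-\beta)
  \qquad \text{(first order, valid when } k \ll \tfrac{1}{1-\beta}\text{)}\\[4pt]
\Pr\bigl[\,|\mathrm{Informed}| = k\,\bigr] \sim \frac{1}{n\,(1-\beta^{k})}
  &\approx \frac{1}{n\,(1-\beta)}\cdot k^{-1}
\end{align*}
```

So, under the theorem as stated above, this first-order picture gives a power-law with exponent γ close to 1 in the regime k ≪ 1/(1 − β) ≪ n; the paper makes the constants and the exact range of validity precise.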

  53. Conclusion We propose the first model for generating cascades with theoretical guarantees (more guarantees in the paper). Thank You!

  54. Question: Ground Truth ◮ Open-source data from Twitter ◮ Data were preprocessed and cleaned to obtain cascades ◮ New hashtags were treated as information cascades ◮ Connections through retweets and replies (and usage of hashtags) ◮ See the paper (bit.ly/why-cascade) for the tedious statistical analysis
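
As an illustration of how such cascades could be extracted (a hypothetical sketch: the field names, tweet schema, and grouping rule are assumptions, not the authors' pipeline), tweets are grouped by hashtag and retweet/reply links become propagation edges.

```python
from collections import defaultdict

def extract_cascades(tweets):
    """Group tweets into cascades keyed by hashtag.

    `tweets` is assumed to be an iterable of dicts sorted by time, e.g.
    {"id": 1, "user": "alice", "hashtags": ["#x"], "parent_id": None},
    where `parent_id` points to the retweeted or replied-to tweet.
    """
    cascades = defaultdict(list)   # hashtag -> list of tweet ids
    edges = defaultdict(list)      # hashtag -> propagation edges

    for tweet in tweets:
        for tag in tweet["hashtags"]:
            cascades[tag].append(tweet["id"])
            # A retweet or reply links the new tweet to its parent.
            if tweet["parent_id"] is not None:
                edges[tag].append((tweet["parent_id"], tweet["id"]))

    # Cascade size = number of tweets that used the hashtag.
    sizes = {tag: len(ids) for tag, ids in cascades.items()}
    return cascades, edges, sizes
```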
