Modelling Cascades Over Time in Microblogs

  1. Modelling Cascades Over Time in Microblogs. Wei Xie, Feida Zhu, Siyuan Liu and Ke Wang*, Living Analytics Research Centre, Singapore Management University. *Ke Wang is from Simon Fraser University; this work was done while the author was visiting the Living Analytics Research Centre at Singapore Management University.

  2. Motivation
 • Business applications such as viral marketing have driven much research effort toward predicting whether a cascade will go viral.
 • In real life, very few cascades are truly viral.
 • Previous research* shows that temporal features are the key predictor of cascade size.
 * Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon M. Kleinberg, Jure Leskovec: Can cascades be predicted? WWW 2014: 925–936

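  7. Time-aware Cascade Model
[Figure: an example cascade over users u0 to u5 with re-share times t0 to t4, shown at time t and at time t + dt]
The probability that user $u_i$ re-shares in $[t, t + dt)$ is given by a hazard function $h_i$ with parameters $\Theta$, evaluated on the re-share times of $u_i$'s followees who have already re-shared, $\mathrm{Followee}^{(i)}(t)$:
$$P_i(t) = h_i\bigl(t, \{t_j \mid u_j \in \mathrm{Followee}^{(i)}(t)\}; \Theta\bigr) \cdot dt$$
The probability of the cascade state $C(t)$ then evolves as
$$\begin{cases} P(C(t+dt)) = P(C(t+dt) \mid C(t)) \cdot P(C(t)) \\ P(C(t_0)) = 1 \\ P(C(t+dt) \mid C(t)) = \prod_{u_i \in X^{(1)}(t)} P_i(t) \cdot \prod_{u_{i'} \in X^{(2)}(t)} \bigl(1 - P_{i'}(t)\bigr) \end{cases}$$
where $X^{(1)}(t)$ is the set of users who have re-shared and $X^{(2)}(t)$ is the set of users who haven't yet.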
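As a minimal sketch of the recursion above, assuming time is discretised into steps of length dt small enough that h_i · dt ≤ 1, the cascade probability can be accumulated step by step. The function names and the `hazard(i, t)` signature are hypothetical, and the members of X^(1)(t) and X^(2)(t) are supplied by the caller rather than derived from a follower graph.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical signature: hazard(i, t) returns h_i at time t. In the model,
# h_i also depends on the followees' re-share times and on Theta.
Hazard = Callable[[int, float], float]


def step_probability(
    t: float,
    resharers: Iterable[int],
    not_yet: Iterable[int],
    hazard: Hazard,
    dt: float,
) -> float:
    """P(C(t + dt) | C(t)) for one discrete step of length dt."""
    p = 1.0
    for i in resharers:  # X^(1)(t): assumed to be users who re-share in this step
        p *= hazard(i, t) * dt  # P_i(t) = h_i * dt
    for i in not_yet:  # X^(2)(t): users who have not re-shared yet
        p *= 1.0 - hazard(i, t) * dt
    return p


def cascade_probability(
    steps: Sequence[Tuple[float, Sequence[int], Sequence[int]]],
    hazard: Hazard,
    dt: float,
) -> float:
    """Chain P(C(t + dt)) = P(C(t + dt) | C(t)) * P(C(t)) over all steps."""
    p = 1.0  # P(C(t_0)) = 1
    for t, resharers, not_yet in steps:
        p *= step_probability(t, resharers, not_yet, hazard, dt)
    return p


# Usage with a constant (hypothetical) hazard of 0.1 per minute and dt = 0.5:
steps = [(0.0, [1], [2, 3]), (0.5, [], [2, 3])]
print(cascade_probability(steps, lambda i, t: 0.1, dt=0.5))
```

In practice one would sum log-probabilities instead of multiplying, to avoid underflow on long cascades.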

  10. Observations in Twitter
Observation 1. Only the first re-sharer matters:
$$P_i(t) = h_i(t, t_{j^\star}; \Theta) \cdot dt, \quad \text{where } j^\star = \arg\min_j \{t_j \mid u_j \in \mathrm{Followee}^{(i)}(t)\}$$
Observation 2. The chance of a tweet being retweeted decreases as time goes by:
$$P_i(t) = h_i(\tau; \Theta) \cdot dt, \quad \text{where } \tau = t - t_{j^\star} \text{ and } h_i(\tau) \text{ is a decreasing function.}$$
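Together, the two observations reduce the hazard's input to a single number, τ. The sketch below, using hypothetical helper names and a plain dict of followee re-share times, finds j* and computes τ = t − t_{j*}; it returns None when no followee has re-shared yet.

```python
from typing import Dict, Optional, Tuple


def first_resharer(followee_times: Dict[str, float], t: float) -> Optional[Tuple[str, float]]:
    """Return (j*, t_{j*}): the followee with the earliest re-share time up to t."""
    shared = {j: tj for j, tj in followee_times.items() if tj <= t}
    if not shared:
        return None  # no followee has re-shared yet
    j_star = min(shared, key=lambda j: shared[j])  # argmin over t_j
    return j_star, shared[j_star]


def elapsed_since_first(followee_times: Dict[str, float], t: float) -> Optional[float]:
    """tau = t - t_{j*}, the argument of the decreasing hazard h_i(tau)."""
    first = first_resharer(followee_times, t)
    return None if first is None else t - first[1]


# Hypothetical followees of u_i and their re-share times (minutes):
times = {"u1": 3.0, "u2": 1.5, "u3": 7.0}
print(first_resharer(times, t=5.0), elapsed_since_first(times, t=5.0))
```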

  15. Hazard Function Design
$$h(t) = \lim_{dt \to 0} \frac{P(t < T \le t + dt \mid T > t)}{dt} = \frac{f(t)}{1 - F(t)}$$
$$H(t) = \int_0^t h(u)\,du = \int_0^t \frac{F'(u)}{1 - F(u)}\,du = -\log\bigl(1 - F(u)\bigr)\Big|_0^t = -\log\bigl(1 - F(t)\bigr) \quad (\text{using } F(0) = 0)$$
$$F(t) = 1 - e^{-H(t)}$$
Exponential distribution: $H(t) = t/\lambda \Rightarrow F(t) = 1 - e^{-t/\lambda}$
Weibull distribution: $H(t) = (t/\alpha)^{\beta} \Rightarrow F(t) = 1 - e^{-(t/\alpha)^{\beta}}$
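The identity F(t) = 1 − e^{−H(t)} can be checked numerically: integrate a hazard rate to obtain H(t) and compare the result with the closed-form CDFs above. This is a sketch with hypothetical helper names and parameter values; the midpoint rule is chosen only because it never evaluates h at u = 0, where the Weibull hazard is infinite when β < 1.

```python
import math
from typing import Callable


def cumulative_hazard(h: Callable[[float], float], t: float, n: int = 100_000) -> float:
    """H(t) = integral of h(u) from 0 to t, approximated with the midpoint rule."""
    du = t / n
    return sum(h((k + 0.5) * du) for k in range(n)) * du


def cdf_from_hazard(h: Callable[[float], float], t: float) -> float:
    """F(t) = 1 - exp(-H(t))."""
    return 1.0 - math.exp(-cumulative_hazard(h, t))


def exponential_hazard(lam: float) -> Callable[[float], float]:
    return lambda t: 1.0 / lam  # integrates to H(t) = t / lambda


def weibull_hazard(alpha: float, beta: float) -> Callable[[float], float]:
    # integrates to H(t) = (t / alpha) ** beta
    return lambda t: (beta / alpha) * (t / alpha) ** (beta - 1.0)


lam, alpha, beta, t = 30.0, 20.0, 1.5, 45.0  # hypothetical parameters
print(cdf_from_hazard(exponential_hazard(lam), t), 1.0 - math.exp(-t / lam))
print(cdf_from_hazard(weibull_hazard(alpha, beta), t), 1.0 - math.exp(-((t / alpha) ** beta)))
```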

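  17. Hazard Function Design
Exponential distribution: $H(t) = t/\lambda \Rightarrow F(t) = 1 - e^{-t/\lambda}$
Weibull distribution: $H(t) = (t/\alpha)^{\beta} \Rightarrow F(t) = 1 - e^{-(t/\alpha)^{\beta}}$
In both cases $H(\infty) = \infty \Rightarrow F(\infty) = 1 - e^{-\infty} = 1$, i.e., the re-share eventually happens with probability 1.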
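To see this limit numerically, the sketch below evaluates the two closed-form CDFs at increasing t. The function names and parameter values are hypothetical; any λ, α, β > 0 give the same limit of 1.

```python
import math


def exponential_cdf(t: float, lam: float) -> float:
    return 1.0 - math.exp(-t / lam)  # H(t) = t / lambda


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    return 1.0 - math.exp(-((t / alpha) ** beta))  # H(t) = (t / alpha) ** beta


# Hypothetical parameters; both CDFs creep toward 1 as t grows.
for t in (60.0, 600.0, 6000.0):
    print(t, exponential_cdf(t, lam=30.0), weibull_cdf(t, alpha=30.0, beta=0.5))
```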

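  20. Hazard Function Design
$$H(\tau) = \lambda \cdot \Bigl(1 - \bigl(\tfrac{\tau}{\alpha} + 1\bigr)^{-\beta}\Bigr), \qquad h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot \frac{\beta}{\alpha} \cdot \bigl(\tfrac{\tau}{\alpha} + 1\bigr)^{-(\beta+1)}$$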
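A quick way to confirm that h(τ) is the derivative of H(τ), and that it is decreasing as Observation 2 requires, is a central finite difference. The function names and parameter values below are hypothetical.

```python
def long_tail_H(tau: float, lam: float, alpha: float, beta: float) -> float:
    """H(tau) = lam * (1 - (tau / alpha + 1) ** (-beta))."""
    return lam * (1.0 - (tau / alpha + 1.0) ** (-beta))


def long_tail_h(tau: float, lam: float, alpha: float, beta: float) -> float:
    """h(tau) = dH/dtau = lam * (beta / alpha) * (tau / alpha + 1) ** (-(beta + 1))."""
    return lam * (beta / alpha) * (tau / alpha + 1.0) ** (-(beta + 1.0))


# Compare the analytic derivative with a central difference (hypothetical parameters).
lam, alpha, beta, tau, eps = 0.02, 10.0, 1.2, 5.0, 1e-4
numeric = (
    long_tail_H(tau + eps, lam, alpha, beta) - long_tail_H(tau - eps, lam, alpha, beta)
) / (2 * eps)
print(numeric, long_tail_h(tau, lam, alpha, beta))
```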

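  25. Hazard Function Design
$$H(\tau) = \lambda \cdot \Bigl(1 - \bigl(\tfrac{\tau}{\alpha} + 1\bigr)^{-\beta}\Bigr)$$
Here $\alpha$ is the scale parameter and $\beta$ is the shape parameter. As $\tau \to \infty$, $H(\tau) \to \lambda$, so $F(\infty) = 1 - e^{-\lambda} \approx H(\infty) = \lambda$ for small $\lambda$: $\lambda$ describes the eventual re-tweeting probability.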
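Because H(τ) saturates at λ, the eventual re-tweeting probability is finite. The sketch below (hypothetical names and parameter values) shows that 1 − e^{−λ} falls short of λ by slightly less than λ²/2, so F(∞) ≈ λ is tight for small λ, and that H(τ) approaches λ for very large τ.

```python
import math


def long_tail_H(tau: float, lam: float, alpha: float, beta: float) -> float:
    return lam * (1.0 - (tau / alpha + 1.0) ** (-beta))


def eventual_probability(lam: float) -> float:
    """F(infinity) = 1 - exp(-H(infinity)), with H(infinity) = lam."""
    return 1.0 - math.exp(-lam)


for lam in (0.001, 0.01, 0.1):
    # The gap lam - F(infinity) is just under lam**2 / 2.
    print(lam, eventual_probability(lam), lam - eventual_probability(lam))
```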

  28. Hazard Rate Illustration
[Figure: (a) retweeting rate over time (minutes, 0 to 60), with the time t_C marked; (b) empirical and estimated hazard rates over time (minutes, 0 to 60)]

  29. Dataset From a Singapore-based Twitter dataset, we extracted all retweets to construct retweeting cascades: 2,425,348 cascades in total.

  30. Probabilistic Model Fitting
 • TM (Threshold Model): $h_i(t) = \lambda \cdot s\bigl(|\mathrm{Followee}^{(i)}(t)|\bigr)$, where $s(x) = \frac{1}{1 + e^{-a(x - b)}}$
 • TCM-CH (Constant Hazard): $H(\tau) = \lambda \cdot \tau$, $h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda$
 • TCM-EH (Exponential Hazard): $H(\tau) = \lambda \cdot (1 - e^{-k\tau})$, $h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot k \cdot e^{-k\tau}$
 • TCM-LH (Long-tail Hazard, our proposed model): $H(\tau) = \lambda \cdot \bigl(1 - (\frac{\tau}{\alpha} + 1)^{-\beta}\bigr)$, $h(\tau) = \frac{dH(\tau)}{d\tau} = \lambda \cdot \frac{\beta}{\alpha} \cdot (\frac{\tau}{\alpha} + 1)^{-(\beta+1)}$
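For comparison, the four hazard rates can be written side by side. This is a sketch: the function names, argument order, and parameter values are hypothetical, and fitting the parameters (λ, a, b, k, α, β) to data is not shown.

```python
import math


def tm_hazard(n_exposed: int, lam: float, a: float, b: float) -> float:
    """TM: h_i(t) = lam * s(|Followee^(i)(t)|), s(x) = 1 / (1 + exp(-a (x - b)))."""
    return lam / (1.0 + math.exp(-a * (n_exposed - b)))


def ch_hazard(tau: float, lam: float) -> float:
    """TCM-CH: H(tau) = lam * tau, so h(tau) = lam."""
    return lam


def eh_hazard(tau: float, lam: float, k: float) -> float:
    """TCM-EH: H(tau) = lam * (1 - exp(-k tau)), so h(tau) = lam * k * exp(-k tau)."""
    return lam * k * math.exp(-k * tau)


def lh_hazard(tau: float, lam: float, alpha: float, beta: float) -> float:
    """TCM-LH: h(tau) = lam * (beta / alpha) * (tau / alpha + 1) ** (-(beta + 1))."""
    return lam * (beta / alpha) * (tau / alpha + 1.0) ** (-(beta + 1.0))


# Hypothetical parameters: compare how fast the two decaying hazards fall off.
for tau in (0.0, 60.0, 300.0):
    print(tau, eh_hazard(tau, 0.01, 0.1), lh_hazard(tau, 0.01, 10.0, 1.2))
```

The exponential hazard decays like e^{−kτ}, whereas the long-tail hazard decays only polynomially, like τ^{−(β+1)}, so it keeps a non-negligible rate long after the first exposure.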

  31. Probabilistic Model Fitting For each cascade, we observe its development during the first $T_0$ for training and during the next $\Delta T$ for testing.
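The evaluation protocol can be sketched as a split of each cascade's retweet times. Assumptions: times are measured from the original tweet, the training window is [0, T0], and the testing window is (T0, T0 + ΔT]; the slides do not specify boundary handling, and the function name is hypothetical.

```python
from typing import List, Tuple


def split_cascade(
    retweet_times: List[float], t0: float, delta_t: float
) -> Tuple[List[float], List[float]]:
    """Split retweet times into the training window [0, T0]
    and the testing window (T0, T0 + delta_t]."""
    observed = [t for t in retweet_times if 0.0 <= t <= t0]
    future = [t for t in retweet_times if t0 < t <= t0 + delta_t]
    return observed, future


# Hypothetical cascade, times in minutes; T0 = 30 and delta_t = 60 are illustrative.
observed, future = split_cascade([1.0, 5.0, 12.0, 30.0, 61.0, 200.0], t0=30.0, delta_t=60.0)
print(len(observed), len(future))  # observed size vs. growth to be predicted
```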

  32. Probabilistic Model Fitting

  33. Predicting Cascade Growth

  34. Virality Prediction

  35. Thanks

  36. Our work is based on previous cascade models:
 • J. Goldenberg, B. Libai, and E. Muller. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters, 12(3):211–223, 2001.
 • M. Gomez-Rodriguez, D. Balduzzi, and B. Schölkopf. Uncovering the temporal dynamics of diffusion networks. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28–July 2, 2011, pages 561–568, 2011.
 • S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks. In The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12–16, 2012, pages 33–41, 2012.
 • M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. Modeling information propagation with survival theory. In ICML (3), pages 666–674, 2013.
 • N. Du, L. Song, M. Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, December 5–8, 2013, Lake Tahoe, Nevada, United States, pages 3147–3155, 2013.
