SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity
Qingyuan Zhao¹, Murat A. Erdogdu¹, Hera Y. He¹, Anand Rajaraman², Jure Leskovec²
Department of Statistics¹ and Computer Science², Stanford University
KDD'15, Aug 12, 2015

Information cascade
An information cascade occurs when people engage in the same actions.
[Images omitted. Sources: wikimedia.org, adweek.com]

Twitter
Twitter provides the ideal playground to study information cascades.
Start: a Twitter user posts a 140-character message, which can be seen by his/her followers.
Spread: the tweet is forwarded (retweeted) by another user.

Predicting cascades in real time
Goal: given the tweet and its retweets up to time $T$, predict its final popularity.
Applications:
- Ranking content.
- Detecting viral/breakout tweets.
- Understanding human social behavior.

Mathematical definitions
Data:
- Relative retweet times: $t_0 = 0, t_1, t_2, \ldots$
- Number of retweets by time $t$: $R_t = \sum_{t_i \le t} 1$.
- Number of followers of each retweeter: $n_0, n_1, n_2, \ldots$
- Number of exposed users by time $t$: $N_t = \sum_{t_i \le t} n_i$.
Problem statement: given $(R_t, N_t)$ for $0 \le t \le T$, predict $R_\infty$.

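The two counting processes defined on this slide can be computed directly from the event list. The following is a minimal sketch, not the authors' code: the function name `cascade_counts` and the sample data are hypothetical, and it assumes that the original tweet (index 0) contributes its followers to $N_t$ but is not itself counted in $R_t$, which matches the index ranges used in the infectiousness estimator.

```python
from bisect import bisect_right


def cascade_counts(times, followers, t):
    """Return (R_t, N_t) for a cascade observed up to time t.

    times     -- sorted post times, with times[0] == 0 for the original tweet
    followers -- followers[i] is n_i, the follower count of the i-th poster
    Assumption: the original tweet is excluded from R_t but included in N_t.
    """
    k = bisect_right(times, t)      # number of posts with t_i <= t
    if k == 0:
        return 0, 0                 # nothing observed before the original tweet
    r_t = k - 1                     # retweets only (drop the original tweet)
    n_t = sum(followers[:k])        # everyone exposed so far
    return r_t, n_t


# Hypothetical cascade: times in seconds since the original tweet.
times = [0.0, 30.0, 45.0, 120.0]
followers = [1000, 50, 200, 10]
print(cascade_counts(times, followers, 60.0))  # (2, 1250)
```

Because the times are sorted, a binary search finds the cutoff in logarithmic time; the follower sum could be precomputed as a running total if many time points are queried.
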
Approaches to cascade prediction
Broadly categorized into two groups:
- Feature-based methods (the majority):
  - Feature engineering: temporal, network structure, content, user, ...
  - Supervised learning: linear regression, collaborative filtering, regression trees, topic modeling, ...
- Point-process-based methods:
  - Dynamic Poisson process, reinforced Poisson process.
  - Our model (SEISMIC): self-exciting point process.

Example
[Figure: top panel, histogram of retweet times (retweet count vs. time); bottom panel, prediction by SEISMIC (retweets vs. time since original tweet, in hours), with curves labeled Final, SEISMIC, and Cumulative.]

SEISMIC
SEISMIC (Self-Exciting Model of Information Cascades) is a flexible model of information cascades.
Highlights:
- Generative model.
- Easy interpretation.
- Scalable: prediction takes $O(\#\text{retweets})$ time.
- State-of-the-art performance.

Background: point processes
Point process models: $R_t$ is characterized by its intensity
$$\lambda_t = \lim_{\Delta \downarrow 0} \frac{P(R_{t+\Delta} - R_t = 1)}{\Delta}.$$
Examples:
- Poisson process: $\lambda_t = \lambda$;
- Reinforced Poisson process [1]: $\lambda_t = p \cdot \phi(t) \cdot g(R_t)$.
They are not suitable for modeling viral tweets.
[1] S. Gao, J. Ma, and Z. Chen. Modeling and predicting retweeting dynamics on microblogging platforms. In WSDM '15, 2015.

SEISMIC
Key steps of retweeting:
- How often does a user check Twitter? Memory kernel (power-law distribution).
- What is the user's probability of retweeting a given tweet? Tweet infectiousness.
Self-exciting point process:
$$\lambda_t = p \cdot \sum_{t_i \le t} n_i \, \phi(t - t_i), \quad t \ge t_0.$$
Here $p$ is the infectiousness (the "probability" of retweeting), and the sum $\sum_{t_i \le t} n_i \, \phi(t - t_i)$ is the self-exciting term (the "rate" of viewing).

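To make the intensity formula concrete, here is a minimal sketch that evaluates $\lambda_t$ for a small cascade. The slide only says the memory kernel follows a power law, so the specific form below (constant up to `s0`, then decaying as a power law, normalized to integrate to 1) and the values of `s0` and `theta` are illustrative assumptions, as are the function names.

```python
def memory_kernel(s, s0=300.0, theta=0.242):
    """Assumed power-law memory kernel phi(s), with s in seconds.

    Constant for 0 <= s <= s0, then proportional to (s / s0) ** -(1 + theta).
    The constant c is chosen so that phi integrates to 1 over [0, infinity).
    s0 and theta are illustrative values, not taken from the slides.
    """
    if s < 0:
        return 0.0
    c = theta / (s0 * (1.0 + theta))
    if s <= s0:
        return c
    return c * (s / s0) ** (-(1.0 + theta))


def intensity(t, times, followers, p, kernel=memory_kernel):
    """lambda_t = p * sum over posts with t_i <= t of n_i * phi(t - t_i)."""
    return p * sum(
        n * kernel(t - ti) for ti, n in zip(times, followers) if ti <= t
    )


# Hypothetical cascade: the original tweet plus two retweets.
times = [0.0, 60.0, 400.0]
followers = [1000, 200, 50]
for t in (100.0, 500.0, 3600.0):
    print(t, intensity(t, times, followers, p=0.01))
```

Each new retweet adds a fresh term $n_i \phi(t - t_i)$ to the sum, which is what makes the process self-exciting: a retweet by a user with many followers raises the rate of further retweets, and that contribution then fades according to the kernel.
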
Time-varying infectiousness
A fixed $p$ is not enough to model viral tweets.
[Figure: top panel, histogram of retweet times (retweet count vs. time); bottom panel, infectiousness estimated by SEISMIC (infectiousness vs. time).]
SEISMIC replaces $p$ by a smooth process $p_t$.

Estimate infectiousness
We estimate $p_t$ by locally smoothing the maximum likelihood estimator (MLE):
$$\hat{p}_t = \frac{\sum_{i=1}^{R_t} K_t(t - t_i)}{\sum_{i=0}^{R_t} n_i \int_{t_i}^{t} K_t(t - s)\, \phi(s - t_i)\, ds}.$$
The numerator plays the role of the "number of retweets" and the denominator the "number of views".

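The estimator can be sketched numerically as follows. The slide does not specify the smoothing kernel $K_t$ or the memory kernel $\phi$, so this example assumes a triangular window `tri_kernel` that weights only the most recent half of the observation period and a normalized power-law `phi`; both choices, the constants, and the midpoint-rule integration are assumptions made for illustration.

```python
def phi(s, s0=300.0, theta=0.242):
    """Assumed power-law memory kernel (illustrative constants), s in seconds."""
    if s < 0:
        return 0.0
    c = theta / (s0 * (1.0 + theta))
    return c if s <= s0 else c * (s / s0) ** (-(1.0 + theta))


def tri_kernel(s, t):
    """Assumed triangular smoothing kernel K_t(s), nonzero for 0 <= s < t / 2."""
    return max(1.0 - 2.0 * s / t, 0.0)


def estimate_infectiousness(t, times, followers, steps=2000):
    """Kernel-smoothed estimate of p_t from posts observed up to time t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    seen = [(ti, n) for ti, n in zip(times, followers) if ti <= t]
    # Numerator: kernel-weighted count of retweets (index 0 is the original).
    num = sum(tri_kernel(t - ti, t) for ti, _ in seen[1:])
    # Denominator: kernel-weighted expected views, via the midpoint rule.
    den = 0.0
    for ti, n in seen:
        h = (t - ti) / steps
        integral = 0.0
        for j in range(steps):
            u = (j + 0.5) * h                      # u = s - t_i
            integral += tri_kernel(t - ti - u, t) * phi(u)
        den += n * integral * h
    return num / den if den > 0 else 0.0


# Hypothetical cascade observed for one hour (times in seconds).
times = [0.0, 3000.0, 3300.0, 3500.0]
followers = [1000, 100, 100, 100]
print(estimate_infectiousness(3600.0, times, followers))
```

With no retweets in the window the numerator is zero, so the estimate drops to zero; a burst of recent retweets relative to the number of recent views pushes it up.
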
Predict popularity
SEISMIC prediction formula: assume the out-degrees in the network have mean $n_*$ and the infectiousness parameter $p_t \equiv p$ for $t \ge T$. Then
$$E[R_\infty \mid \mathcal{F}_T] = \begin{cases} R_T + \dfrac{p\,(N_T - N_T^e)}{1 - p\,n_*}, & \text{if } p < \dfrac{1}{n_*}, \\ \infty, & \text{if } p \ge \dfrac{1}{n_*}, \end{cases}$$
where $N_T^e = \sum_{i=0}^{R_T} n_i \int_{t_i}^{T} \phi(t - t_i)\, dt$.
See our paper for the derivation.

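The prediction formula is a direct computation once $R_T$, $N_T$, $N_T^e$, $p$, and $n_*$ are known. A minimal sketch follows; the function name and the input values are hypothetical, and $N_T^e$ is passed in rather than computed from a kernel.

```python
import math


def expected_final_retweets(r_T, n_T, n_e_T, p, n_star):
    """E[R_infinity | F_T] under constant infectiousness p after time T.

    r_T    -- retweets observed by time T
    n_T    -- exposed users by time T
    n_e_T  -- N_T^e, the kernel-integrated exposure term from the formula
    n_star -- mean out-degree (mean follower count) in the network
    """
    if p >= 1.0 / n_star:
        return math.inf             # supercritical: the expectation diverges
    return r_T + p * (n_T - n_e_T) / (1.0 - p * n_star)


# Hypothetical inputs: p * n_star = 0.4, so the cascade is subcritical.
print(expected_final_retweets(100, 50_000, 30_000, p=0.002, n_star=200))
# Hypothetical supercritical case: p >= 1 / n_star.
print(expected_final_retweets(100, 50_000, 30_000, p=0.01, n_star=200))
```

In the first call the correction term is $0.002 \times 20000 / 0.6 \approx 66.67$, giving a prediction of about 166.67 retweets; the second call returns infinity because $p \ge 1/n_*$.
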
Example
[Figure: top panel, histogram of retweet times (retweet count vs. time); bottom panel, prediction by SEISMIC (retweets vs. time since original tweet, in hours), with curves labeled Final, SEISMIC, and Cumulative.]

Experiments: dataset
Raw dataset: all tweet and retweet activities from October 7 to November 7, 2011.
Filter by:
- Posted in the first 15 days;
- English tweets;
- No hashtag;
- At least 50 retweets.
We end up with 166,076 cascades (over 34 million tweets/retweets in total).

Baselines
We compare SEISMIC to four baselines:
1. LR: linear regression
2. LR-D: linear regression with degree
3. DPM: dynamic Poisson model
4. RPM: reinforced Poisson model

Comparison: Absolute Percentage Error (APE)
$\mathrm{APE} = |\hat{R}_\infty - R_\infty| / R_\infty$.
Result: 15% (SEISMIC) vs. 25% (baselines) percentage error after observing a cascade for 1 hour.

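The APE metric is simple to compute per cascade. The sketch below uses hypothetical predictions and final counts, and summarizing with the median is an assumption made here, since the slide does not say how per-cascade errors are aggregated.

```python
from statistics import median


def absolute_percentage_error(predicted, actual):
    """APE = |predicted - actual| / actual, for a cascade with actual > 0."""
    return abs(predicted - actual) / actual


# Hypothetical (predicted, actual) final retweet counts for three cascades.
pairs = [(115, 100), (80, 100), (300, 250)]
errors = [absolute_percentage_error(p, a) for p, a in pairs]
print(errors)
print(median(errors))   # assumption: aggregate across cascades by the median
```

Because the error is divided by the true final count, a miss of 15 retweets counts the same for a cascade of 100 as a miss of 1,500 does for a cascade of 10,000.
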
Comparison: Coverage of breakouts
- A list of the true top 500 tweets with the most retweets.
- Lists of the predicted top 500 tweets at all time points.
Result: 70% (SEISMIC) vs. 55% (baselines) coverage after observing 25% of the retweets.

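Coverage here can be read as the fraction of the true top-k tweets that also appear in the predicted top-k list. A minimal sketch under that reading follows; the function name, the dictionary-based input format, and the toy data are hypothetical.

```python
def breakout_coverage(true_counts, predicted_counts, k=500):
    """Fraction of the true top-k tweets that appear in the predicted top-k.

    Both arguments map a tweet id to a retweet count (final or predicted).
    """
    true_top = set(sorted(true_counts, key=true_counts.get, reverse=True)[:k])
    pred_top = set(
        sorted(predicted_counts, key=predicted_counts.get, reverse=True)[:k]
    )
    if not true_top:
        return 0.0
    return len(true_top & pred_top) / len(true_top)


# Toy example with k = 2: the true top is {a, b}, the predicted top is {a, c}.
true_counts = {"a": 900, "b": 500, "c": 100, "d": 50}
predicted_counts = {"a": 800, "b": 90, "c": 300, "d": 10}
print(breakout_coverage(true_counts, predicted_counts, k=2))  # 0.5
```

Repeating the calculation with predictions made at successive observation times gives a coverage curve over time.
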
Summary
In conclusion, SEISMIC
- effectively models information cascades by self-exciting point processes;
- efficiently updates parameters and makes predictions;
- outperforms several baselines and the state of the art.
Code and data are available online at http://snap.stanford.edu/seismic.

Estimation of memory kernel $\phi(t)$
[Figure not reproduced.]