 
CSE 6240: Web Search and Text Mining. Spring 2020
Cascades and Contagion
Prof. Srijan Kumar
http://cc.gatech.edu/~srijan
1 Srijan Kumar, Georgia Tech, CSE6240 Spring 2020: Web Search and Text Mining
Today’s Lecture
• Introduction
• Decision-based models of diffusion
  – Single Adoption
  – Multiple Adoption
• Probabilistic models of diffusion
  – SEIR model
  – Independent cascade model
These slides are borrowed from Prof. Jure Leskovec’s CS224W class.
Epidemics vs. Cascade Spreading
• In decision-based models, nodes make decisions based on the pay-off benefits of adopting one strategy or the other.
• In epidemic spreading:
  – There is no decision making
  – The process of contagion is complex and unobservable
• In some cases it involves (or can be modeled as) randomness
Simple Model: Branching Process
• First wave: A person carrying a disease enters the population and transmits it to each person she meets with probability $q$. She meets $d$ people, a portion of whom will be infected.
• Second wave: Each of the $d$ people goes and meets $d$ different people. So we have a second wave of $d \cdot d = d^2$ people, a portion of whom will be infected.
• Subsequent waves: same process
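In the branching process, each infected person produces on average $q \cdot d$ new infections, so the expected size of wave $n$ is $(q d)^n$, and when $q d < 1$ the process dies out with probability 1. A minimal Python sketch, assuming a single initial carrier and that only infected people transmit; the function names are hypothetical.

```python
import random


def expected_wave_sizes(q: float, d: int, waves: int) -> list[float]:
    """Expected number of newly infected people in waves 1..waves.

    Assumes only infected people transmit, so each wave is q * d times
    the previous one, starting from a single carrier.
    """
    return [(q * d) ** n for n in range(1, waves + 1)]


def simulate_branching(q: float, d: int, waves: int, seed: int = 0) -> list[int]:
    """One random run of the branching process; returns infected count per wave."""
    rng = random.Random(seed)
    infected = 1  # the single carrier who enters the population
    sizes = []
    for _ in range(waves):
        # every infected person meets d new people and infects each with prob. q
        infected = sum(1 for _ in range(infected * d) if rng.random() < q)
        sizes.append(infected)
        if infected == 0:
            break  # no one left to transmit: the outbreak has died out
    return sizes


print(expected_wave_sizes(0.5, 3, 4))  # q*d = 1.5 > 1, expected waves grow
print(simulate_branching(0.2, 3, 10, seed=1))  # q*d = 0.6 < 1
```

Running `simulate_branching` with several seeds and counting how often the returned list ends in 0 gives a rough estimate of the extinction probability for a given $q$ and $d$.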
Example with k=3
Spreading Models of Viruses
Virus propagation has 2 parameters:
• (Virus) Birth rate β:
  – Probability that an infected neighbor attacks
• (Virus) Death rate δ:
  – Probability that an infected node heals
[Figure: infected node N with healthy neighbors N1, N2, N3; transitions labeled Prob. β (attack) and Prob. δ (heal)]
More Generally: S+E+I+R Models
• General scheme for epidemic models:
  – Each node can go through phases:
    • S…susceptible, E…exposed, I…infected, R…recovered, Z…immune
  – Transition probabilities are governed by the model parameters
SIR Model
• SIR model: a node goes through the phases Susceptible → Infected → Recovered, with rate β for S → I and rate δ for I → R
  – Models chickenpox or plague:
    • Once you heal, you can never get infected again
• Assuming perfect mixing (the network is a complete graph), the model dynamics are:
$$\frac{dS}{dt} = -\beta S I, \qquad \frac{dI}{dt} = \beta S I - \delta I, \qquad \frac{dR}{dt} = \delta I$$
[Figure: number of nodes S(t), I(t), R(t) vs. time]
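The SIR equations can be integrated numerically to reproduce curves like S(t), I(t), R(t). A minimal sketch using forward-Euler steps, assuming S, I, R are fractions of the population that sum to 1 and that β and δ are per-unit-time rates; the step size `dt` and the function name are illustrative choices.

```python
def simulate_sir(s0: float, i0: float, r0: float, beta: float, delta: float,
                 dt: float = 0.01, steps: int = 10_000) -> list[tuple[float, float, float]]:
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - delta*I,
    dR/dt = delta*I, with S, I, R given as fractions of the population."""
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt  # S -> I flow in this time step
        new_recoveries = delta * i * dt     # I -> R flow in this time step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


trajectory = simulate_sir(0.99, 0.01, 0.0, beta=0.5, delta=0.1)
peak_infected = max(i for _, i, _ in trajectory)
```

Because every unit leaving S enters I and every unit leaving I enters R, S + I + R stays constant, and R only grows: once healed, a node never returns.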
SIS Model
• Susceptible-Infective-Susceptible (SIS) model
• Cured nodes immediately become susceptible again
• Virus “strength”: $s = \beta / \delta$
• Node state transition diagram: Susceptible → Infective (infected by a neighbor with prob. β); Infective → Susceptible (cured with prob. δ)
SIS Model
• Models flu:
  – A susceptible node becomes infected
  – The node then heals and becomes susceptible again
• Assuming perfect mixing (a complete graph):
$$\frac{dS}{dt} = -\beta S I + \delta I, \qquad \frac{dI}{dt} = \beta S I - \delta I$$
[Figure: number of nodes S(t) and I(t) vs. time; states Susceptible ⇄ Infected]
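Setting $dI/dt = 0$ with $S = 1 - I$ (fractions) gives the nonzero equilibrium $I^* = 1 - \delta/\beta$, which exists only when $\beta > \delta$. A minimal sketch that integrates the perfect-mixing SIS equations with forward-Euler steps and compares the result with $I^*$; the normalization to fractions, the step size, and the function names are assumptions for illustration.

```python
def simulate_sis(i0: float, beta: float, delta: float,
                 dt: float = 0.01, steps: int = 20_000) -> tuple[float, float]:
    """Forward-Euler integration of the perfect-mixing SIS equations,
    dS/dt = -beta*S*I + delta*I and dI/dt = beta*S*I - delta*I,
    with S + I = 1 (fractions). Returns the final (S, I)."""
    s, i = 1.0 - i0, i0
    for _ in range(steps):
        net_flow = (beta * s * i - delta * i) * dt  # net S -> I flow this step
        s -= net_flow
        i += net_flow
    return s, i


def sis_equilibrium(beta: float, delta: float) -> float:
    """Setting dI/dt = 0 with S = 1 - I gives I* = 1 - delta/beta (or 0)."""
    return max(0.0, 1.0 - delta / beta)


_, i_high = simulate_sis(0.01, beta=0.5, delta=0.1)   # settles near I* = 0.8
_, i_low = simulate_sis(0.01, beta=0.05, delta=0.1)   # infection dies out
```

Unlike SIR, the infection does not burn out when β > δ: healed nodes rejoin S, so the infected fraction settles at a positive level.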
Question: Epidemic threshold τ
• SIS model: the epidemic threshold of an arbitrary graph G is τ such that:
  – If the virus “strength” s = β / δ < τ, the epidemic cannot happen (it eventually dies out)
• Given a graph, what is its epidemic threshold?
[Wang et al. 2003] Epidemic Threshold in SIS Model
• Fact: We have no epidemic if
$$\frac{\beta}{\delta} < \tau = \frac{1}{\lambda_{1,A}}$$
  where β is the (virus) birth rate, δ is the (virus) death rate, τ is the epidemic threshold, and $\lambda_{1,A}$ is the largest eigenvalue of the adjacency matrix A of G.
► $\lambda_{1,A}$ alone captures the property of the graph!
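The threshold condition can be checked directly: estimate $\lambda_{1,A}$, then compare $s = \beta/\delta$ with $\tau = 1/\lambda_{1,A}$. A minimal sketch, assuming an undirected graph stored as adjacency lists and using power iteration (with a +I shift) to estimate the largest eigenvalue; the function names are hypothetical, and for large graphs a sparse eigensolver would be the usual choice.

```python
import math


def largest_eigenvalue(adj: dict[int, list[int]], iters: int = 1000) -> float:
    """Estimate lambda_1 of the adjacency matrix of an undirected graph.

    Power iteration on A + I: the shift keeps the top eigenvalue strictly
    dominant even for bipartite graphs, where -lambda_1 is also an eigenvalue.
    """
    nodes = list(adj)
    x = {v: 1.0 / math.sqrt(len(nodes)) for v in nodes}  # unit-norm start vector
    rayleigh = 0.0
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}  # y = (A + I) x
        rayleigh = sum(x[v] * y[v] for v in nodes)  # x^T (A + I) x for unit x
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: y[v] / norm for v in nodes}
    return rayleigh - 1.0  # undo the +I shift


def below_threshold(beta: float, delta: float, adj: dict[int, list[int]]) -> bool:
    """True if s = beta/delta < tau = 1/lambda_1, i.e. the SIS epidemic dies out."""
    return beta / delta < 1.0 / largest_eigenvalue(adj)


complete_4 = {v: [u for u in range(4) if u != v] for v in range(4)}
print(largest_eigenvalue(complete_4))  # complete graph on 4 nodes
```

For a complete graph on n nodes, $\lambda_{1,A} = n - 1$, so the 4-node example has $\tau = 1/3$: denser, better-connected graphs have a larger $\lambda_{1,A}$ and therefore a lower epidemic threshold.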
[Wang et al. 2003] Experiments on a Small Graph
• Oregon Autonomous Systems graph: 10,900 nodes and 31,180 edges
• β = 0.001, δ ∈ {0.05, 0.06, 0.07}
[Figure: number of infected nodes vs. time, with curves for s = β/δ above the threshold τ, at the threshold, and below the threshold]
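Experiments like this one can be reproduced in miniature with a stochastic simulation on a graph. A minimal sketch, assuming one common discrete-time SIS variant (synchronous updates, healing with probability δ, and an independent attack with probability β from each infected neighbor); it is not necessarily the exact simulation setup of Wang et al., and the function name is hypothetical.

```python
import random


def sis_on_graph(adj: dict[int, list[int]], beta: float, delta: float,
                 initial_infected: set[int], steps: int, seed: int = 0) -> list[int]:
    """Discrete-time stochastic SIS; returns the number of infected nodes per step.

    Each step, an infected node heals with prob. delta, and each infected
    neighbor of a susceptible node independently attacks it with prob. beta.
    Updates are synchronous: all decisions use the previous step's state.
    """
    rng = random.Random(seed)
    infected = set(initial_infected)
    counts = [len(infected)]
    for _ in range(steps):
        next_infected = set()
        for v in adj:
            if v in infected:
                if rng.random() >= delta:  # did not heal this step
                    next_infected.add(v)
            elif any(u in infected and rng.random() < beta for u in adj[v]):
                next_infected.add(v)  # at least one attack succeeded
        infected = next_infected
        counts.append(len(infected))
    return counts
```

The `initial_infected` argument also makes it easy to vary how many nodes start out infected and compare the resulting curves.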
Experiments
• Does it matter how many people are initially infected?
[Gomes et al., 2014] Modeling Ebola with SEIR
[Gomes et al., Assessing the International Spreading Risk Associated with the 2014 West African Ebola Outbreak, PLOS Currents: Outbreaks, 2014]
Example: Ebola
• S: susceptible individuals
• E: exposed individuals
• I: infectious cases in the community
• H: hospitalized cases
• F: dead but not yet buried
• R: individuals no longer transmitting the disease
[Gomes et al., 2014]
Application: Rumor Spread Modeling Using the SEIZ Model
References:
1. Epidemiological Modeling of News and Rumors on Twitter. Jin et al., SNAKDD 2013
2. False Information on Web and Social Media: A Survey. Kumar et al., arXiv:1804.08559
SEIZ model: Extension of SIS model
Recap: SIS model
Details of the SEIZ model
Notation:
– S = Susceptible
– I = Infected
– E = Exposed
– Z = Skeptics
Dataset
• Tweets collected from eight stories: four rumors and four real events
[Table: the eight stories, grouped into real events and rumors]
Method: Fitting the SEIZ Model to Data
• The SEIZ model is fit to each cascade to minimize the difference $|I(t) - \mathrm{tweets}(t)|$:
  – $\mathrm{tweets}(t)$ = number of rumor tweets at time t
  – $I(t)$ = number of rumor tweets at time t estimated by the model
• Use grid search to find the parameters with the minimum error
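The fitting procedure (simulate the model for every parameter combination on a grid and keep the combination with the smallest error) can be sketched as follows. The SEIZ equations are not given on these slides, so this sketch plugs in the perfect-mixing SIS model as a stand-in; summing the absolute error over time points, the grid values, and all function names are assumptions. To fit real tweet counts, the model output would also need to be scaled from fractions to counts.

```python
import itertools


def sis_curve(beta: float, delta: float, i0: float, n_points: int,
              dt: float = 0.1, substeps: int = 10) -> list[float]:
    """Stand-in model: perfect-mixing SIS infected fraction, sampled every dt*substeps."""
    s, i = 1.0 - i0, i0
    curve = [i]
    for _ in range(n_points - 1):
        for _ in range(substeps):
            net_flow = (beta * s * i - delta * i) * dt  # forward-Euler S -> I flow
            s -= net_flow
            i += net_flow
        curve.append(i)
    return curve


def grid_search_fit(observed: list[float], betas: list[float], deltas: list[float],
                    i0: float) -> tuple[float, float, float]:
    """Return (error, beta, delta) minimizing sum_t |I(t) - observed(t)| over the grid."""
    best = None
    for beta, delta in itertools.product(betas, deltas):
        model = sis_curve(beta, delta, i0, len(observed))
        error = sum(abs(m - o) for m, o in zip(model, observed))
        if best is None or error < best[0]:
            best = (error, beta, delta)
    return best
```

Grid search is simple and needs no gradients, but its cost grows multiplicatively with each added parameter, which matters for a model with as many parameters as SEIZ.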
Fitting to “Boston Marathon Bombing” Data
• The SEIZ model fits the real data better than the SIS model, especially at the initial points
Fitting to “Pope Resignation” Data
• The SEIZ model fits the real data better than the SIS model, especially at the initial points
Rumor Detection with the SEIZ Model
Notation: S = Susceptible, I = Infected, E = Exposed, Z = Skeptics
• All parameters are learned by fitting the model to real data (from previous slides)
• New metric: $R_{SI}$
Rumor Detection by $R_{SI}$
• $R_{SI}$, computed from the parameters obtained by fitting the SEIZ model, efficiently distinguishes rumors from news
[Figure: $R_{SI}$ values for rumors and real news stories]
Today’s Lecture
• Introduction
• Decision-based models of diffusion
  – Single Adoption
  – Multiple Adoption
• Probabilistic models of diffusion
  – SEIZ model
  – Independent cascade model
Linear Threshold Model
• A decision-based model
• A node v has a random threshold $\theta_v \sim U[0,1]$
• A node v is influenced by each neighbor w according to a weight $b_{v,w}$ such that
$$\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$$
• A node v becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active:
$$\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$$
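A minimal sketch of the Linear Threshold dynamics, assuming the weights are stored per node as a map from each neighbor w to $b_{v,w}$; thresholds are drawn uniformly at random unless supplied, and the function name is hypothetical.

```python
import random
from typing import Optional


def linear_threshold(in_weights: dict[str, dict[str, float]], seeds: set[str],
                     thresholds: Optional[dict[str, float]] = None,
                     seed: int = 0) -> set[str]:
    """Run the Linear Threshold model to completion and return the active set.

    in_weights[v] maps each neighbor w of v to the weight b_{v,w}; the weights
    into each node are assumed to sum to at most 1.
    """
    rng = random.Random(seed)
    if thresholds is None:
        thresholds = {v: rng.random() for v in in_weights}  # theta_v ~ U[0, 1]
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node can be activated
        changed = False
        for v in in_weights:
            if v in active:
                continue
            influence = sum(b for w, b in in_weights[v].items() if w in active)
            if influence >= thresholds[v]:
                # activation is permanent, so update order does not change the final set
                active.add(v)
                changed = True
    return active
```

Once the thresholds are fixed the process is deterministic; the randomness of the model lies entirely in drawing $\theta_v$.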
Linear Threshold Model: Example
[Figure: step-by-step activation on a small graph with edge weights and node thresholds; legend: inactive node, active node, threshold, active neighbors; the process stops once no further node can be activated]
Probabilistic Contagion
• Independent Cascade Model
  – Directed finite graph $G = (V, E)$
  – Set $S$ starts out with the new behavior
    • Say nodes with this behavior are “active”
  – Each edge $(v, w)$ has a probability $p_{vw}$
  – If node $v$ is active, it gets one chance to make $w$ active, with probability $p_{vw}$
    • Each edge fires at most once
• Does scheduling matter? No
  – If $u, v$ are both active at the same time, it doesn’t matter which tries to activate $w$ first
  – But time moves in discrete steps
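A minimal simulation of the Independent Cascade model, assuming the directed graph is stored as out-edge maps carrying the probabilities $p_{vw}$; the function name and seeding are illustrative. Each newly active node tries its out-edges exactly once, in the step after it becomes active, so every edge fires at most once.

```python
import random


def independent_cascade(out_edges: dict[str, dict[str, float]], seeds: set[str],
                        seed: int = 0) -> set[str]:
    """Run the Independent Cascade model and return the final active set.

    out_edges[v] maps each out-neighbor w of v to the probability p_vw.
    """
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)  # nodes that became active in the previous step
    while frontier:  # time moves in discrete steps
        newly_active = []
        for v in frontier:
            for w, p_vw in out_edges.get(v, {}).items():
                # v is in the frontier only once, so edge (v, w) gets at most one try
                if w not in active and rng.random() < p_vw:
                    active.add(w)
                    newly_active.append(w)
        frontier = newly_active
    return active
```

Averaging the size of the returned set over many random seeds estimates the expected spread from the starting set $S$.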