Comparing Compartment and Agent-based Models
Shannon Gallagher
JSM, Baltimore, MD
August 2, 2017
Thesis work with: William F. Eddy (Chair), Joel Greenhouse, Howard Seltman, Cosma Shalizi, Samuel L. Ventura
Goal: Combine two good models into a better one
Studying infectious disease is important
Compartment vs. Agent-based Models
Compartment models (CMs) describe how individuals evolve over time
Assumptions (Anderson and May 1992):
1. Homogeneity of individuals
2. Law of mass action: $I(t+1) \propto I(t)$
Agent-based models (AMs) simulate the spread of disease
Assumptions (Helbing 2002):
1. Heterogeneity of agents
2. Model adequately reflects reality
CMs and AMs: a side-by-side comparison
CMs                          AMs
∙ Equation-based             ∙ Simulation-based
∙ Computationally fast       ∙ Computationally slow
∙ Homogeneous individuals    ∙ Heterogeneous individuals
∙ No individual properties   ∙ Individual properties
Combining the two together (Bobashev 2007, Banos 2015, Wallentin 2017)
∙ ad hoc approaches
∙ perspective from non-statisticians
Goal: Create a statistically justified hybrid model
Current Work
There are two main avenues of improvement
1. Quantifying how similar CMs and AMs are
2. Speeding up AM run-time
The SIR model: a detailed look (Kermack and McKendrick 1927)
$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I$$
∙ β – rate of infection
∙ γ – rate of recovery
∙ N – total population size
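The SIR model: a detailed look (Kermack and McKendrick 1927), in discrete time
$$\frac{\Delta S}{\Delta t} = -\frac{\beta S I}{N}, \qquad \frac{\Delta I}{\Delta t} = \frac{\beta S I}{N} - \gamma I, \qquad \frac{\Delta R}{\Delta t} = \gamma I$$
∙ β – rate of infection
∙ γ – rate of recovery
∙ N – total population size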
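The difference equations above can be iterated directly. The following is a minimal sketch, assuming a unit time step (Δt = 1); the function name `sir_discrete` and the initial conditions (950 susceptible, 50 infectious) are illustrative choices, not values from the talk.

```python
def sir_discrete(s0, i0, r0, beta, gamma, steps):
    """Iterate the discrete-time SIR difference equations with a unit time step."""
    n = s0 + i0 + r0                      # total population size N (constant)
    s, i, r = float(s0), float(i0), float(r0)
    path = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n  # beta * S * I / N
        new_recoveries = gamma * i         # gamma * I
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        path.append((s, i, r))
    return path


# Hypothetical initial conditions; beta and gamma match the values on the slides.
path = sir_discrete(s0=950, i0=50, r0=0, beta=0.10, gamma=0.03, steps=100)
```

Each entry of `path` is an `(S, I, R)` triple, and the three values always sum to N because every person leaving one compartment enters the next.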
Our stochastic CM approach
$$\hat{S}(t+1) = \hat{S}(t) - s_{t+1}$$
$$\hat{R}(t+1) = \hat{R}(t) + r_{t+1}$$
$$\hat{I}(t+1) = N - \hat{S}(t+1) - \hat{R}(t+1),$$
with
$$s_{t+1} \sim \text{Binomial}\left(\hat{S}(t), \frac{\beta I(t)}{N}\right), \qquad r_{t+1} \sim \text{Binomial}\left(\hat{I}(t), \gamma\right).$$
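One way to simulate this stochastic CM is sketched below, using only the Python standard library. The helper `binomial`, the function `simulate_cm`, and the initial conditions are assumptions made for illustration; the binomial draw is implemented as a sum of Bernoulli trials, which is simple rather than fast.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_cm(s0, i0, r0, beta, gamma, steps, seed=0):
    """Simulate compartment totals with binomial transitions at each step."""
    rng = random.Random(seed)
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    path = [(s, i, r)]
    for _ in range(steps):
        new_infections = binomial(rng, s, beta * i / n)  # S -> I
        new_recoveries = binomial(rng, i, gamma)         # I -> R
        s = s - new_infections
        r = r + new_recoveries
        i = n - s - r                                    # I is determined by S and R
        path.append((s, i, r))
    return path


# Hypothetical initial conditions; beta and gamma match the values on the slides.
cm_path = simulate_cm(s0=95, i0=5, r0=0, beta=0.10, gamma=0.03, steps=100, seed=1)
```

Because recoveries are drawn from the current infectious count, the infectious total never goes negative, and the three compartments always sum to N.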
Our stochastic AM approach
For an agent $x_n(t)$, $n = 1, 2, \dots, N$, the forward operator for $t > 0$ is
$$x_n(t+1) = \begin{cases} x_n(t) + \text{Bernoulli}\left(\frac{\beta I(t)}{N}\right) & \text{if } x_n(t) = 1 \\ x_n(t) + \text{Bernoulli}(\gamma) & \text{if } x_n(t) = 2 \\ x_n(t) & \text{otherwise,} \end{cases}$$
where $x_n(t) = k$, $k \in \{1, 2, 3\}$, corresponds to state S, I, and R, respectively.
Let the aggregate total in each compartment be
$$\hat{X}_k(t) = \sum_{n=1}^{N} \mathbb{I}\{x_n(t) = k\}.$$
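The forward operator can be sketched agent by agent as follows. The names `step_agents` and `simulate_am` and the initial number of infectious agents are hypothetical; the state codes 1, 2, 3 follow the slide.

```python
import random

S, I, R = 1, 2, 3  # state codes, as on the slide


def step_agents(agents, beta, gamma, rng):
    """Apply the forward operator once to every agent."""
    n = len(agents)
    p_infect = beta * agents.count(I) / n  # beta * I(t) / N, from the current state
    updated = []
    for x in agents:
        if x == S:
            updated.append(x + (1 if rng.random() < p_infect else 0))
        elif x == I:
            updated.append(x + (1 if rng.random() < gamma else 0))
        else:
            updated.append(x)              # recovered agents stay recovered
    return updated


def simulate_am(n_agents, n_infected, beta, gamma, steps, seed=0):
    """Return the aggregate totals (X_S, X_I, X_R) at each time step."""
    rng = random.Random(seed)
    agents = [I] * n_infected + [S] * (n_agents - n_infected)
    totals = [tuple(agents.count(k) for k in (S, I, R))]
    for _ in range(steps):
        agents = step_agents(agents, beta, gamma, rng)
        totals.append(tuple(agents.count(k) for k in (S, I, R)))
    return totals


# Hypothetical initial conditions; beta and gamma match the values on the slides.
am_totals = simulate_am(n_agents=100, n_infected=5, beta=0.10, gamma=0.03, steps=100, seed=1)
```

Storing one state per agent is what makes it possible to attach individual properties later, at the cost of looping over every agent at every step.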
The means overlap
[Figure: mean proportion of compartment values (% of population against time, 0 to 100) for S, I, and R under the CM and the AM; 1000 agents; 5000 runs; β = 0.10; γ = 0.03.]
The distributions look the same
These approaches are equivalent
Theorem. Let the CM and AM be as previously described. Then for all $t \in \{1, 2, \dots, T\}$,
$$\hat{S}(t) \overset{d}{=} \hat{X}_S(t), \qquad \hat{I}(t) \overset{d}{=} \hat{X}_I(t), \qquad \hat{R}(t) \overset{d}{=} \hat{X}_R(t). \tag{1}$$
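The equivalence can be illustrated (not proved) empirically by averaging many runs of each model and comparing the results. The sketch below uses compact, hypothetical versions of both simulators; the population size, number of runs, initial conditions, and seeds are arbitrary choices kept small so the example runs quickly.

```python
import random
from statistics import mean


def cm_run(rng, n, i0, beta, gamma, steps):
    """Stochastic CM: binomial draws on the compartment totals."""
    s, i, r = n - i0, i0, 0
    for _ in range(steps):
        p = beta * i / n
        new_inf = sum(rng.random() < p for _ in range(s))      # Binomial(s, p)
        new_rec = sum(rng.random() < gamma for _ in range(i))  # Binomial(i, gamma)
        s, r = s - new_inf, r + new_rec
        i = n - s - r
    return s / n, i / n, r / n


def am_run(rng, n, i0, beta, gamma, steps):
    """Stochastic AM: one Bernoulli draw per susceptible or infectious agent."""
    agents = [2] * i0 + [1] * (n - i0)  # 1 = S, 2 = I, 3 = R
    for _ in range(steps):
        p = beta * agents.count(2) / n
        agents = [
            x + (rng.random() < p) if x == 1
            else x + (rng.random() < gamma) if x == 2
            else x
            for x in agents
        ]
    return tuple(agents.count(k) / n for k in (1, 2, 3))


def mean_proportions(run, runs, seed, **params):
    """Average the final (S, I, R) proportions over independent runs."""
    rng = random.Random(seed)
    results = [run(rng, **params) for _ in range(runs)]
    return tuple(mean(column) for column in zip(*results))


params = dict(n=100, i0=10, beta=0.10, gamma=0.03, steps=50)
cm_means = mean_proportions(cm_run, runs=200, seed=1, **params)
am_means = mean_proportions(am_run, runs=200, seed=2, **params)
```

With these settings the two tuples of mean proportions should agree up to Monte Carlo error; comparing full distributions rather than means would be a stronger check.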
We can compare CM/AM pairs and AM/AM pairs by fitting the underlying model
[Figure: distribution of fitted SIR parameters, γ against β, by simulation type (CM, AM); 1000 agents; 5000 runs; β = 0.10; γ = 0.03.]
AMs are appealing because they can be run multiple times
∙ Simulate an epidemic en masse!
∙ A run: same initial parameters, different random numbers
∙ Runs (L) are independent of one another ⟹ parallelization
∙ Roughly, the variance of compartments ↓ when N, L ↑
Goal: Improve computation time without sacrificing statistical details
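The idea of a run (same parameters, different random numbers) can be sketched by giving each run its own seed. The functions `run_once` and `run_many`, the defaults, and the `mapper` argument are hypothetical. For brevity the sketch tracks compartment totals rather than individual agents.

```python
import random


def run_once(seed, n_agents=100, n_infected=10, beta=0.10, gamma=0.03, steps=50):
    """One run with its own random stream; returns the final proportion susceptible."""
    rng = random.Random(seed)
    s, i = n_agents - n_infected, n_infected
    for _ in range(steps):
        p = beta * i / n_agents
        new_inf = sum(rng.random() < p for _ in range(s))
        new_rec = sum(rng.random() < gamma for _ in range(i))
        s, i = s - new_inf, i + new_inf - new_rec
    return s / n_agents


def run_many(n_runs, mapper=map, base_seed=0):
    """Execute independent runs; `mapper` decides how the runs are dispatched."""
    seeds = range(base_seed, base_seed + n_runs)
    return list(mapper(run_once, seeds))


results = run_many(20)  # sequential here; the runs do not share any state
```

Because no run depends on another, the built-in `map` could in principle be swapped for a parallel equivalent such as the `map` method of a `concurrent.futures.ProcessPoolExecutor`, without changing `run_once`.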
There is a tradeoff between the number of agents and number of runs
[Figure: three panels showing the ratio of variances V(S_1)/V(S_2), V(I_1)/V(I_2), and V(R_1)/V(R_2) against time (0 to 100); 5000 runs; β = 0.10; γ = 0.03; Model 1: 1000 agents, Model 2: 100 agents.]
The calculations show that the variance scales
∙ Note that for a given β and γ, if $\frac{S_1(0)}{N_1} = \frac{S_2(0)}{N_2}$, then $\frac{S_1(t)}{N_1} = \frac{S_2(t)}{N_2}$
∙ $\mathbb{V}[\hat{S}(t+1)] = S(t)(1 - p_t)\,p_t + (1 - p_t)^2\, \mathbb{V}[\hat{S}(t)]$
∙ $\mathbb{V}[\hat{S}_2(t)] = \frac{N_2}{N_1} \mathbb{V}[\hat{S}_1(t)]$
$$\frac{\mathbb{V}\left[\frac{1}{L_1} \sum_{\text{runs } \ell} \frac{\hat{S}_1(t)}{N_1}\right]}{\mathbb{V}\left[\frac{1}{L_2} \sum_{\text{runs } \ell} \frac{\hat{S}_2(t)}{N_2}\right]} = \frac{L_2 N_2^2 \cdot \mathbb{V}[\hat{S}_1(t)]}{L_1 N_1^2\, \mathbb{V}[\hat{S}_2(t)]} = \frac{L_2 N_2}{L_1 N_1}.$$
We can replace agents with runs!
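The last ratio follows from two steps that the slide leaves implicit, sketched here under the assumption that the $L$ runs are independent and identically distributed. The superscript $(\ell)$ indexing individual runs is notation introduced for this sketch.

```latex
\begin{align*}
\mathbb{V}\!\left[\frac{1}{L}\sum_{\ell=1}^{L}\frac{\hat{S}^{(\ell)}(t)}{N}\right]
  &= \frac{1}{L^{2}N^{2}}\sum_{\ell=1}^{L}\mathbb{V}\!\left[\hat{S}^{(\ell)}(t)\right]
   = \frac{\mathbb{V}[\hat{S}(t)]}{L N^{2}}, \\[4pt]
\frac{\mathbb{V}[\hat{S}_1(t)] / (L_1 N_1^{2})}{\mathbb{V}[\hat{S}_2(t)] / (L_2 N_2^{2})}
  &= \frac{L_2 N_2^{2}}{L_1 N_1^{2}} \cdot \frac{\mathbb{V}[\hat{S}_1(t)]}{\mathbb{V}[\hat{S}_2(t)]}
   = \frac{L_2 N_2^{2}}{L_1 N_1^{2}} \cdot \frac{N_1}{N_2}
   = \frac{L_2 N_2}{L_1 N_1}.
\end{align*}
```

The first line uses independence across runs; the second substitutes $\mathbb{V}[\hat{S}_2(t)] = \frac{N_2}{N_1}\mathbb{V}[\hat{S}_1(t)]$. Only the product $NL$ matters for the variance of the averaged proportion.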
Through parallelization, we can get a speed-up without losing statistical information
[Figure: three panels showing the variance of $\sum_\ell \hat{S}(t)/(NL)$, $\sum_\ell \hat{I}(t)/(NL)$, and $\sum_\ell \hat{R}(t)/(NL)$ (% susceptible, infected, and recovered, averaged over the number of runs) against time (0 to 100), comparing 100 agents on 4 cores with 400 agents on 1 core.]
Simulation 1 (100 agents, 4 cores, 100 times): 3:30 minutes
Simulation 2 (400 agents, 1 core, 100 times): 4:05 minutes
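As a worked check, the two timed configurations can be plugged into the variance ratio $L_2 N_2 / (L_1 N_1)$. Treating the 400-agent, 1-core setup as model 1 with $L_1 = 1$ and the 100-agent, 4-core setup as model 2 with $L_2 = 4$ (one run per core) is an assumption made here for illustration.

```latex
\[
\frac{L_2 N_2}{L_1 N_1} = \frac{4 \times 100}{1 \times 400} = 1,
\qquad
\frac{4{:}05}{3{:}30} = \frac{245\ \text{s}}{210\ \text{s}} \approx 1.17 .
\]
```

Under that reading, both configurations are expected to give the same variance for the averaged proportions, while the parallel configuration finished in roughly 14% less wall-clock time.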
Future work
There is more work to be done: short-term
∙ Implementation of current methods in FRED
  ∙ FRED: an open source, supported, flexible AM
∙ Incorporate different levels of homogeneity
  1. Independent agents
  2. Agents go to one other activity (school, work, neighborhood)
  3. Multiple activities
∙ Compare CM and AM parameters empirically
∙ Empirically determine when different regions can be combined
Thank you! Questions?