Simulation-based testing in an approximate Bayesian framework


  1. Simulation-based testing in an approximate Bayesian framework. Jessica W. Leigh and David Bryant, 5 November 2010

  2. Simulation-Based Test Methodology: Does my method work well?
     - Figure out all/most reasonable parameters
     - Choose a few parameters to test
     - Run many, many simulations (discard undesirable simulations)

  4. Example: Success of Phylogenetic Methods (Huelsenbeck and Hillis, Syst Biol 1993)

  6. Example: Heterotachy [Figure with labels A, B, C, D, E, F]

  7. Example: Heterotachy (Kolaczkowski and Thornton, Nature 2004)

  8. Ideology Wars +

  9. Ideology Wars (15 combinations)

  10. Example: Heterotachy (revisited) (Spencer, Susko, Roger, Mol Biol Evol 2004) [Figure: f plotted against r]

  12. Is There a Better Way?
      - Simulation-based method assessment is inefficient: grid search requires too many different combinations of values for relevant parameters
      - Not very rigorous if only a few select parameter values are tested
      - Potentially dishonest
      - We can do better!
      - Needed: a method to explore parameters where the test does well and where it does poorly
      - MCMC can do this

  13. Recipe: MCMC-Based Simulation Test
      - Let $\phi(X)$ denote a specific question addressing the performance of the method using simulated data $X$
        - Does one method outperform another?
        - Does a method produce a false positive?
      - Sample from the probability distribution of parameter $\theta$ given a "true" answer to the question we asked, $P(\theta \mid \phi(X) = 1)$
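The target distribution in the recipe can be written out with Bayes' rule. This is a sketch that assumes a prior $P(\theta)$ over the simulation parameters, which the slides do not state explicitly; a flat prior over a bounded range is one natural choice.

```latex
P(\theta \mid \phi(X) = 1)
  = \frac{P(\phi(X) = 1 \mid \theta)\, P(\theta)}
         {\int P(\phi(X) = 1 \mid \theta')\, P(\theta')\, d\theta'}
```

Here $P(\phi(X) = 1 \mid \theta)$ is the probability that a dataset simulated under $\theta$ answers the question positively, so under a flat prior the target is high exactly where the method tends to "succeed".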

  14. Markov chain Monte Carlo
      - Suppose we wish to sample from some distribution $\pi(X)$
      - Generate a Markov chain $X_1, X_2, \ldots, X_k, \ldots$ by repeatedly accepting or rejecting states drawn from a proposal distribution
      - The chain is set up such that its stationary distribution is the distribution of interest
      - Moves satisfy the detailed balance condition: $f(X_i, X_{i+1})\,\pi(X_i) = f(X_{i+1}, X_i)\,\pi(X_{i+1})$, where $f(X_i, X_{i+1})$ is the probability of moving from state $X_i$ to $X_{i+1}$
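The detailed balance condition can be checked numerically on a small discrete chain. The sketch below is illustrative only: the four-state target `pi` and the uniform proposal over the other states are assumptions chosen for the example, not taken from the slides.

```python
def metropolis_matrix(target):
    """Transition matrix of a Metropolis chain on states 0..n-1 (illustrative)."""
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Propose any other state uniformly; accept with min(1, pi_j / pi_i).
                P[i][j] = (1.0 / (n - 1)) * min(1.0, target[j] / target[i])
        # Rejected proposals leave the chain at state i.
        P[i][i] = 1.0 - sum(P[i])
    return P


def satisfies_detailed_balance(target, P, tol=1e-12):
    """Check pi_i * P[i][j] == pi_j * P[j][i] for every pair of states."""
    n = len(target)
    return all(abs(target[i] * P[i][j] - target[j] * P[j][i]) <= tol
               for i in range(n) for j in range(n))


def is_stationary(target, P, tol=1e-12):
    """Check that one step of the chain leaves the target unchanged."""
    n = len(target)
    return all(abs(sum(target[i] * P[i][j] for i in range(n)) - target[j]) <= tol
               for j in range(n))


pi = [0.1, 0.2, 0.3, 0.4]  # assumed toy target distribution
P = metropolis_matrix(pi)
print(satisfies_detailed_balance(pi, P), is_stationary(pi, P))
```

Summing the detailed balance equation over $X_i$ gives stationarity, which is why the second check passes whenever the first does.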

  15. MCMC: Metropolis-Hastings Algorithm
      - Uses likelihoods to accept or reject moves, samples from the distribution $P(\theta \mid D)$
      - Let $q(\theta_{i+1} \mid \theta_i)$ be the probability of proposing state $\theta_{i+1}$ given the current state $\theta_i$, and let $\pi(\theta_i) = P(D \mid \theta_i)$
      - Consider the $k$th iteration of the chain:
        - $\theta_{k+1} \sim q(\cdot \mid \theta_k)$
        - $\alpha \leftarrow \min\left(1, \frac{\pi(\theta_{k+1})\, q(\theta_k \mid \theta_{k+1})}{\pi(\theta_k)\, q(\theta_{k+1} \mid \theta_k)}\right)$
        - $u \sim U(0, 1)$
        - if $u > \alpha$ then $\theta_{k+1} \leftarrow \theta_k$ end if
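A minimal Python sketch of one way to implement the iteration above. It assumes a symmetric Gaussian random-walk proposal, so the two $q$ terms cancel in $\alpha$, and it works with log densities for numerical stability. The function names and the toy target (an unnormalised normal density with mean 3) are hypothetical choices for illustration.

```python
import math
import random


def metropolis_hastings(log_pi, theta0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings; returns the list of visited states."""
    rng = random.Random(seed)
    theta = theta0
    chain = []
    for _ in range(n_steps):
        # Symmetric proposal: q(a | b) == q(b | a), so the q ratio is 1.
        proposal = rng.gauss(theta, step)
        log_alpha = min(0.0, log_pi(proposal) - log_pi(theta))
        # Accept when u < alpha; otherwise the chain stays at the current state.
        if rng.random() < math.exp(log_alpha):
            theta = proposal
        chain.append(theta)
    return chain


def log_target(theta):
    # Hypothetical target: unnormalised normal density with mean 3 and sd 1.
    return -0.5 * (theta - 3.0) ** 2


chain = metropolis_hastings(log_target, theta0=0.0, n_steps=20000)
kept = chain[2000:]  # discard an initial burn-in portion
print(sum(kept) / len(kept))
```

With an asymmetric proposal the ratio $q(\theta_k \mid \theta_{k+1}) / q(\theta_{k+1} \mid \theta_k)$ would have to be added back into `log_alpha`.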

  18. MCMC: Exact Approximate Bayesian Simulation Framework
      - Recall: $\phi(X)$ is a question that can be asked using data $X$ and $P(\theta \mid \phi(X) = 1)$ is the distribution of interest
      - Our algorithm:
        - $\theta_{k+1} \sim q(\cdot \mid \theta_k)$
        - $X \leftarrow$ simulate using $\theta_{k+1}$
        - if $\phi(X) = 0$ then $\theta_{k+1} \leftarrow \theta_k$ end if
      - . . . and it satisfies the detailed balance condition
      - An application of Approximate Bayesian Computation (Marjoram et al, PNAS 2003) that samples exactly from the distribution of interest
      - ABC, for comparison:
        - $\theta_{k+1} \sim q(\cdot \mid \theta_k)$
        - $X^* \leftarrow$ simulate using $\theta_{k+1}$
        - if $\rho(X, X^*) > \varepsilon$ then $\theta_{k+1} \leftarrow \theta_k$ end if
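The "Our algorithm" loop can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: a single parameter with a flat prior on [0, 1], a symmetric Gaussian proposal, a toy simulator that counts successes in 20 Bernoulli trials, and a question `phi` asking whether at least 15 succeeded. Under these assumptions the chain targets a distribution proportional to $P(\phi(X) = 1 \mid \theta)$.

```python
import random


def simulation_test_mcmc(simulate, phi, theta0, n_steps, step=0.1,
                         lower=0.0, upper=1.0, seed=0):
    """Accept a proposed theta only if data simulated from it gives phi(X) == 1."""
    rng = random.Random(seed)
    theta = theta0
    chain = []
    for _ in range(n_steps):
        proposal = rng.gauss(theta, step)  # symmetric proposal
        # Flat prior on [lower, upper]: proposals outside it are rejected.
        if lower <= proposal <= upper:
            x = simulate(proposal, rng)
            if phi(x) == 1:
                theta = proposal
        chain.append(theta)
    return chain


def simulate(theta, rng, n=20):
    # Hypothetical simulator: number of successes in n Bernoulli(theta) trials.
    return sum(rng.random() < theta for _ in range(n))


def phi(x):
    # Hypothetical question: were there at least 15 successes?
    return 1 if x >= 15 else 0


chain = simulation_test_mcmc(simulate, phi, theta0=0.9, n_steps=5000)
kept = chain[500:]  # discard an initial burn-in portion
print(sum(kept) / len(kept))
```

No likelihood is ever evaluated: the simulate-then-test step plays the role of the acceptance probability, which is what makes the approach usable with black-box simulators.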

  19. Example: UPGMA vs. NJ
      - UPGMA and Neighbour-Joining are methods that produce phylogenetic trees given a matrix of pairwise distances between biological sequences representing the tips of a true tree
      - Neighbour-Joining (Saitou and Nei, MBE 1987) remains a popular phylogenetic inference method and has been cited over 22,000 times (according to Google Scholar)
        - Earned Masatoshi Nei an award presented by Emperor Akihito, who stated that he himself had used NJ!
      - UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is average linkage hierarchical clustering applied to phylogenetic data; it is generally no longer used for phylogenetic analysis because it is very sensitive to variation in evolutionary rate across lineages
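To make "average linkage hierarchical clustering" concrete, here is a small Python sketch of UPGMA that returns only the tree topology as nested tuples (no branch lengths). The function name, the input format, and the four-taxon distance matrix are hypothetical; this is not the implementation used in the talk.

```python
def upgma(names, dist):
    """Cluster taxa by average linkage; dist is a symmetric matrix (list of lists)."""
    n = len(names)
    # Each cluster id maps to (subtree, number of tips in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        (tree_a, size_a) = clusters.pop(a)
        (tree_b, size_b) = clusters.pop(b)
        new = next_id
        next_id += 1
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average equals the mean over all tip pairs.
            d[(c, new)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        clusters[new] = ((tree_a, tree_b), size_a + size_b)
    return next(iter(clusters.values()))[0]


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 8, 8],
    [2, 0, 8, 8],
    [8, 8, 0, 4],
    [8, 8, 4, 0],
]
tree = upgma(names, dist)
print(tree)
```

Because UPGMA always joins the pair with the smallest average distance, it implicitly assumes clock-like rates, which is the sensitivity to rate variation noted above.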

  20. Example: UPGMA vs. NJ (cont'd)
      - Let $T$ be a true phylogenetic tree, and $\hat{T}^{X}_{\mathrm{UPGMA}}$ and $\hat{T}^{X}_{\mathrm{NJ}}$ be trees inferred from dataset $X$ by UPGMA and NJ, respectively
      - Let $\theta = (s, \gamma)$ be a pair of parameters describing edge length scale (tree height) and skewness (non-clocklikeness)
      - At each iteration, a new value for either $s$ or $\gamma$ is proposed, and a sequence alignment $X$ is simulated from $T$ with edge lengths described by $\theta$
      - If $\hat{T}^{X}_{\mathrm{UPGMA}}$ is at least as close to $T$ as $\hat{T}^{X}_{\mathrm{NJ}}$, the new value is accepted

  21. UPGMA vs. NJ: Skew and Scale Explained [Figure: trees with tips A to H, illustrating increasing scale and increasing skew]

  23. UPGMA vs. NJ [Figure: two panels, "Grid Search" and "MCMC", each plotting scale against skew]

  24. FluTE Simulator (Chao, Halloran et al, PLoS Comp Biol 2010)
      - An influenza outbreak simulator run in half-day intervals
      - Uses census data to simulate individuals, with contact probabilities based on age and type of relationship (tuned to produce results similar to historical epidemics)

      | Contact group | Preschool child | Child | Young adult | Adult | Older adult |
      |---|---|---|---|---|---|
      | Family, infectious is child | 0.8 | 0.8 | 0.35 | 0.35 | 0.35 |
      | Family, infectious is adult | 0.25 | 0.25 | 0.4 | 0.4 | 0.4 |
      | Household cluster, infectious is child | 0.08 | 0.08 | 0.035 | 0.035 | 0.035 |
      | Household cluster, infectious is adult | 0.025 | 0.025 | 0.04 | 0.04 | 0.04 |
      | Neighborhood | 0.0000435 | 0.0001305 | 0.000348 | 0.000348 | 0.000696 |
      | Community | 0.0000109 | 0.0000326 | 0.000087 | 0.000087 | 0.000174 |

      Setting-specific contact probabilities: Workplace 0.05, 0.05; Playgroup 0.28; Daycare 0.12; Elementary school 0.0348; Middle school 0.03; High school 0.0252

  25. The FluTE Influenza Simulator (Chao, Halloran et al, PLoS Comp Biol 2010)
      - Various parameters, including basic reproductive number ($R_0$) and prevaccinated fraction of the population

  27. School Closure and Influenza
      - School closure might help prevent epidemics because children have very high contact probability within a school
      - In reality, if communities tend to organise social groups of children that mimic schools,
      - School closure can be expensive in terms of parental absence from work
      - Published simulation studies suggest that school closure might reduce the peak number of infected individuals and delay epidemics
        - Delay could be useful: often matched vaccines are unavailable at the onset of a pandemic
      - A different question: given that school closure is effective, what is the distribution of $R_0$ and prevaccinated fraction?

  28. FluTE MCMC Results [Figure: two panels, "School Closure Reduces Peak Infection" and "Antivirals Reduce Peak Infection", each plotting vaccination fraction against R0]

  29. FluTE MCMC Results (Part 2) [Figure: two panels, "School Closure Reduces Cumulative Infection" and "Antivirals Reduce Cumulative Infection", each plotting vaccination fraction against R0]

  30. FluTE MCMC Discussion
      - For combinations of high $R_0$ and low vaccination, school closure reduced the peak but not the cumulative infection level
      - School closure reduced the cumulative infection level only for combinations of low $R_0$ and high vaccination
