Phylogenies
Phylogenies describe history
Haeckel. 1879.
Pace. 1997. Science.
Phylogenies are the result of branching processes
Timeseries and phylogeny are dual outcomes of an infectious process
Epidemic process
Can ask for the probability of observing this timeseries given epidemiological parameters β and γ.
[Figure: count against time]
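To make the idea of a timeseries generated by β and γ concrete, here is a minimal sketch of a stochastic SIR epidemic simulated with the Gillespie algorithm. The SIR structure, the population size, the initial number of infecteds, and the function name `simulate_sir` are assumptions for illustration; the slides name only β and γ.

```python
import random

def simulate_sir(beta, gamma, n_total, i0, t_max, seed=0):
    """Gillespie simulation of a stochastic SIR epidemic (illustrative sketch).

    Returns a list of (time, infected_count) pairs, one per event.
    """
    rng = random.Random(seed)
    s, i = n_total - i0, i0
    t = 0.0
    series = [(t, i)]
    while i > 0 and t < t_max:
        rate_infect = beta * s * i / n_total   # S + I -> 2I
        rate_recover = gamma * i               # I -> R
        total = rate_infect + rate_recover
        t += rng.expovariate(total)            # waiting time to the next event
        if rng.random() < rate_infect / total:
            s, i = s - 1, i + 1                # infection event
        else:
            i -= 1                             # recovery event
        series.append((t, i))
    return series

# Hypothetical parameter values chosen only for the demonstration.
series = simulate_sir(beta=0.45, gamma=0.25, n_total=1000, i0=5, t_max=365.0)
peak_time, peak_count = max(series, key=lambda p: p[1])
print(f"events: {len(series) - 1}, peak infected: {peak_count} at t = {peak_time:.1f}")
```

Each run with a different seed gives a different timeseries, which is why it makes sense to ask how probable one observed series is under given β and γ.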
Sample some individuals.
Epidemic branching process
Can ask for the probability of observing this tree given epidemiological parameters β and γ.
[Figure: branching process against time]
The coalescent
Assume an equilibrium number of infecteds. Call this equilibrium N.
Sample some individuals.
Each generation, there is a small chance of coalescence for each pair of lineages: $\Pr(\text{coal} \mid i = 2) = \frac{1}{N}$
The probability of coalescence scales quadratically with the lineage count $i$: $\Pr(\text{coal}) = \binom{i}{2} \frac{1}{N} = \frac{i(i-1)}{2N}$
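The waiting time while $i$ lineages remain is exponentially distributed with mean $\frac{2N}{i(i-1)}$: $T_i \sim \text{Exponential}\left(\frac{2N}{i(i-1)}\right)$
[Figure: tree with coalescent intervals $T_2$ and $T_3$]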
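The waiting-time distribution can be checked by simulation. The sketch below draws the intervals for a sample of lineages and compares the simulated mean time to the most recent common ancestor with the sum of the interval means, 2N(1 − 1/n). The sample size, population size, replicate count, and the function name are assumptions chosen for illustration.

```python
import random

def simulate_coalescent_times(n_samples, pop_size, rng):
    """Draw waiting times T_n, ..., T_2 for a sample of n lineages."""
    times = {}
    for i in range(n_samples, 1, -1):
        rate = i * (i - 1) / (2.0 * pop_size)   # C(i, 2) pairs, each coalescing at rate 1/N
        times[i] = rng.expovariate(rate)        # exponential with mean 2N / (i (i - 1))
    return times

rng = random.Random(1)
n_samples, pop_size, reps = 10, 1000, 2000      # hypothetical values

# Time to the most recent common ancestor is the sum of all intervals.
mean_tmrca = sum(
    sum(simulate_coalescent_times(n_samples, pop_size, rng).values())
    for _ in range(reps)
) / reps
expected_tmrca = 2 * pop_size * (1 - 1 / n_samples)   # sum of the interval means

print(f"simulated mean TMRCA: {mean_tmrca:.0f} generations, expected: {expected_tmrca:.0f}")
```

Because every interval mean is proportional to N, doubling the population size doubles the expected depth of the tree.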
Demo
Population size affects tree shape
The rate of coalescence is inversely proportional to the population size N.
[Figure: example trees for N = 500, 1000, 2000, 5000, 10000, and 20000]
Changing population size
[Figure: trees under constant size and under a growing population]
Given a phylogeny, how can we learn about the evolutionary process that underlies it?
Generally, we want to know: $p(\text{model} \mid \text{data})$
Bayes rule: $p(\text{model} \mid \text{data}) \propto p(\text{data} \mid \text{model}) \, p(\text{model})$
Often referred to as: posterior ∝ likelihood × prior
λ – coalescent model
µ – mutation model
D – sequence data
τ – phylogeny
In this case, we have: $p(\lambda \mid \tau) \propto p(\tau \mid \lambda) \, p(\lambda)$
However, we don't observe the tree directly: $p(\tau, \mu \mid D) \propto p(D \mid \tau, \mu) \, p(\tau) \, p(\mu)$
We integrate over uncertainty: $p(\lambda \mid D) \propto \int p(D \mid \tau, \mu) \, p(\tau \mid \lambda) \, p(\lambda) \, p(\mu) \, d\tau \, d\mu$
BEAST: Bayesian Evolutionary Analysis by Sampling Trees
Integration through Markov chain Monte Carlo
[Figure: plot with axes x1 and x2]
Metropolis-Hastings algorithm
Starting from state θ, propose a new state θ*. For the following, this proposal must be symmetric, i.e. Q(θ → θ*) = Q(θ* → θ).
If the new state is more likely, always accept.
If the new state is less likely, accept with probability equal to the ratio of the new state's probability to the old state's.
Acceptance probability: $\min\left(1, \frac{p(\theta^*)}{p(\theta)}\right)$
Simple example: p(x) = 0.2, p(y) = 0.8
A(x → y) = min(1, 0.8/0.2) = 1
A(y → x) = min(1, 0.2/0.8) = 0.25
Mass moving from x to y: p(x) A(x → y) = 0.2 × 1 = 0.2
Mass moving from y to x: p(y) A(y → x) = 0.8 × 0.25 = 0.2
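The two-state example can be run directly. The sketch below uses the acceptance rule for a symmetric proposal, checks that the mass flowing in each direction balances, and runs a chain to confirm that it spends about 80% of its time in y. The proposal (always propose the other state), the chain length, and the function names are assumptions for illustration.

```python
import random

target = {"x": 0.2, "y": 0.8}   # target probabilities from the slide

def acceptance(current, proposed):
    """Metropolis acceptance probability for a symmetric proposal."""
    return min(1.0, target[proposed] / target[current])

def run_chain(n_steps, seed=0):
    """Run the chain and return the fraction of steps spent in each state."""
    rng = random.Random(seed)
    state = "x"
    visits = {"x": 0, "y": 0}
    for _ in range(n_steps):
        proposed = "y" if state == "x" else "x"   # symmetric: always propose the other state
        if rng.random() < acceptance(state, proposed):
            state = proposed
        visits[state] += 1
    return {k: v / n_steps for k, v in visits.items()}

flow_xy = target["x"] * acceptance("x", "y")   # mass moving from x to y
flow_yx = target["y"] * acceptance("y", "x")   # mass moving from y to x
freqs = run_chain(100_000)

print(f"flow x->y = {flow_xy:.2f}, flow y->x = {flow_yx:.2f}")
print(f"time in x = {freqs['x']:.3f}, time in y = {freqs['y']:.3f}")
```

Equal flows in both directions are what keep the target distribution unchanged from one step to the next.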
BEAST will produce samples from:
λ – coalescent model
µ – mutation model
τ – phylogeny
Use a ‘skyline’ demographic model
[Figure: skyline with population sizes N_1, N_2, N_3, N_4]
Practical part 1
Estimating $R_0$ from timeseries data
[Figure: individuals (log scale) against days]
r(0) = β − γ
r = 0.20 per day for 1918 influenza
We know the approximate recovery rate γ ≈ 0.25
We can solve for β and hence $R_0$:
$\beta = r + \gamma \approx 0.45$
$R_0 = \frac{\beta}{\gamma} \approx \frac{0.45}{0.25} \approx 1.8$
Growth rate of pandemic H1N1
r = 0.11 per day
β = 0.11 + 0.33 = 0.44 per day
$R_0$ = 0.44 / 0.33 = 1.33
[Figure: laboratory confirmed cases (log scale), March to May]
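The same two-step calculation applies to both examples, so it can be written once. This is a small sketch using the growth rates and recovery rates quoted on the slides; the function name `r0_from_growth_rate` is hypothetical.

```python
def r0_from_growth_rate(r, gamma):
    """Return (beta, R0) given early growth rate r = beta - gamma and recovery rate gamma."""
    beta = r + gamma
    return beta, beta / gamma

# (label, r per day, gamma per day) as quoted on the slides
examples = [("1918 influenza", 0.20, 0.25), ("pandemic H1N1", 0.11, 0.33)]

results = {}
for label, r, gamma in examples:
    beta, r0 = r0_from_growth_rate(r, gamma)
    results[label] = (beta, r0)
    print(f"{label}: beta = {beta:.2f} per day, R0 = {r0:.2f}")
```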
Generation time τ of infection
At the beginning of the epidemic, new infections emerge at rate β: $\tau = \frac{1}{2 \beta S(0)} = \frac{1}{2 \times 0.36} = 1.39$
Final susceptible fraction: $S(\infty) = e^{-R_0 (1 - S(\infty))}$
At the end of the epidemic: $\tau = \frac{1}{2 \beta S(\infty)} = \frac{1}{2 \times 0.36 \times 0.84} = 1.65$
[Figure: individuals against days; τ against time]
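The final susceptible fraction is defined only implicitly, so it has to be solved numerically. The sketch below finds the root below 1 by bisection and evaluates the generation-time formula with the slide's values β = 0.36 and S(∞) = 0.84. The choice of bisection, the value R0 = 2 in the demonstration, and the function names are assumptions for illustration.

```python
import math

def final_susceptible_fraction(r0, tol=1e-12):
    """Solve S = exp(-R0 * (1 - S)) for the root with S < 1 by bisection (needs R0 > 1)."""
    if r0 <= 1.0:
        raise ValueError("the nontrivial root exists only for R0 > 1")

    def g(s):
        return s - math.exp(-r0 * (1.0 - s))

    lo, hi = 0.0, 1.0 - 1e-9   # g(lo) < 0 < g(hi); S = 1 itself is the trivial root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def generation_time(beta, s):
    """Generation time tau = 1 / (2 * beta * S), as written on the slide."""
    return 1.0 / (2.0 * beta * s)

beta = 0.36
tau_start = generation_time(beta, 1.0)    # S(0) taken as 1
tau_end = generation_time(beta, 0.84)     # S(inf) = 0.84 from the slide
s_inf_demo = final_susceptible_fraction(2.0)   # hypothetical R0 = 2

print(f"tau at start: {tau_start:.2f}")
print(f"tau at end:   {tau_end:.2f}")
print(f"S(inf) for R0 = 2: {s_inf_demo:.4f}")
```

As susceptibles are depleted, each infection takes longer to produce the next one, which is why τ rises over the course of the epidemic.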
Effective population sizes of flu vs measles
Influenza A (H3N2), 1970 to 2010:
$N_e \tau$ = 7.2 years
$N_e$ = 1050 infections (duration of infection of 5 days)
N = 70 million infections (prevalence)
Off by a factor of 6,700
Measles, 1950 to 2010:
$N_e \tau$ = 124.6 years
$N_e$ = 8270 infections (duration of infection of 11 days)
N = 0.9 million infections (prevalence)
Off by a factor of 110
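The conversion from $N_e \tau$ to $N_e$ can be reproduced as a short calculation. This sketch assumes that the generation time τ is half the duration of infection and that a year has 365 days; the slide does not state either, but together they recover the quoted $N_e$ values to within rounding. The function name is hypothetical.

```python
DAYS_PER_YEAR = 365.0   # assumption: not stated on the slide

def effective_size(ne_tau_years, infection_days):
    """Convert N_e * tau (in years) to N_e, assuming tau = half the duration of infection."""
    tau_days = infection_days / 2.0
    return ne_tau_years * DAYS_PER_YEAR / tau_days

ne_flu = effective_size(7.2, 5.0)         # influenza A (H3N2)
ne_measles = effective_size(124.6, 11.0)  # measles

print(f"influenza N_e: {ne_flu:.0f} infections")
print(f"measles N_e:   {ne_measles:.0f} infections")
```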
Practical part 2
Continuous time Markov chains (CTMCs)
Two states, A and B, with transition rates μ_AB = 3 (A to B) and μ_BA = 1 (B to A)
q(A) = 0.25, q(B) = 0.75
$p_{t \to \infty}(A) = \frac{\mu_{BA}}{\mu_{AB} + \mu_{BA}}$
$p_{t \to \infty}(B) = \frac{\mu_{AB}}{\mu_{AB} + \mu_{BA}}$
[Figure: probability in state X against time]
CTMCs on trees
Transition matrix with μ_AB = 3, μ_BA = 1, t = 0.2 (rows: starting state; columns: ending state):
      A     B
A   0.59  0.41
B   0.14  0.86
[Figure: tree with tips B, A, A]
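For a two-state chain the transition probabilities have a closed form: each entry relaxes from its starting value towards the equilibrium frequency at rate μ_AB + μ_BA. The sketch below uses that closed form to reproduce the matrix for t = 0.2; the function name and the dictionary layout are assumptions for illustration.

```python
import math

def two_state_transition_matrix(mu_ab, mu_ba, t):
    """Transition probabilities P[i][j] = Pr(state j at time t | state i at time 0)."""
    total = mu_ab + mu_ba
    pi_a, pi_b = mu_ba / total, mu_ab / total   # equilibrium frequencies
    decay = math.exp(-total * t)                # how much of the starting state is remembered
    return {
        "A": {"A": pi_a + pi_b * decay, "B": pi_b * (1.0 - decay)},
        "B": {"A": pi_a * (1.0 - decay), "B": pi_b + pi_a * decay},
    }

P = two_state_transition_matrix(mu_ab=3.0, mu_ba=1.0, t=0.2)
for start in "AB":
    print(start, {end: round(p, 2) for end, p in P[start].items()})
```

As t grows the decay term vanishes and both rows converge to the equilibrium frequencies 0.25 and 0.75.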
Integrate over internal states
Transition matrix with μ_AB = 3, μ_BA = 1, t = 0.2
Each assignment of states to the two internal nodes has a probability equal to the root frequency times the transition probabilities along the four branches:
Root A, internal node A: 0.25 × 0.59 × 0.59 × 0.41 × 0.59, Pr = 0.0211
Root A, internal node B: 0.25 × 0.41 × 0.59 × 0.86 × 0.14, Pr = 0.0073
Root B, internal node A: 0.75 × 0.14 × 0.14 × 0.41 × 0.59, Pr = 0.0036
Root B, internal node B: 0.75 × 0.86 × 0.14 × 0.86 × 0.14, Pr = 0.0109
$p(D \mid \tau, \mu) = 0.0211 + 0.0073 + 0.0036 + 0.0109 = 0.0429$
[Figure: the four internal-state configurations, with shares of the total of 49%, 17%, 8%, and 25%]
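The sum over internal states can be written as a short enumeration. This sketch assumes the three-tip tree ((B, A), A) with every branch of length t = 0.2, which is the arrangement that reproduces the four products on the slides, and it uses the rounded transition matrix. The total comes out at about 0.0428, which agrees with the slide's 0.0429 up to rounding. Function and variable names are hypothetical.

```python
from itertools import product

# Rounded transition matrix for t = 0.2 and root state frequencies, as on the slides.
P = {"A": {"A": 0.59, "B": 0.41}, "B": {"A": 0.14, "B": 0.86}}
q = {"A": 0.25, "B": 0.75}

def joint_probability(root, internal):
    """Probability of tips (B, A, A) together with the given internal states.

    Assumed topology: ((B, A), A), i.e. the internal node leads to tips B and A,
    and the root leads to the internal node and to the remaining tip A.
    """
    return (q[root]
            * P[root][internal]    # root -> internal node
            * P[root]["A"]         # root -> outer tip A
            * P[internal]["B"]     # internal node -> tip B
            * P[internal]["A"])    # internal node -> tip A

joint = {states: joint_probability(*states) for states in product("AB", repeat=2)}
likelihood = sum(joint.values())                       # p(D | tree, mu)
share = {states: p / likelihood for states, p in joint.items()}

for (root, internal), p in joint.items():
    print(f"root={root} internal={internal}: Pr = {p:.4f}, share = {share[(root, internal)]:.0%}")
print(f"p(D | tree, mu) = {likelihood:.4f}")
```

Enumeration costs 2 to the power of the number of internal nodes, so it is only practical for tiny trees like this one.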
Practical part 3