

  1. Bayesian Phylogenetics Mark Holder (with big thanks to Paul Lewis)

  2. Outline • Intro – What is Bayesian Analysis? – Why be a Bayesian? • What is required to do a Bayesian Analysis? (Priors) • How can the required calculations be done? (MCMC) • Prospects and Warnings

  3. Simple Example: Vesicoureteral Reflux (VUR) - valves between the ureters and bladder do not shut fully. • leads to urinary tract infections • if not corrected, can cause serious kidney damage • effective diagnostic tests are available, but they are expensive and invasive

  4. • ≈ 1% of children will have VUR • ≈ 80% of children with VUR will see a doctor about an infection • ≈ 2% of all children will see doctor about an infection Should a child with 1 infection be screened for VUR?

  5. 1% of the population has VUR: Pr(V) = 0.01 [Diagram: population grid; each 'v' symbol marks 0.1% of the population]

  6. 80% of kids with VUR get an infection Pr(I|V) = 0.8 Pr(I|V) is a conditional probability

  7. So, 0.8% of the population has VUR and will get an infection: Pr(V) Pr(I|V) = 0.01 × 0.8 = 0.008, so Pr(I,V) = 0.008. Pr(I,V) is a joint probability. [Diagram: population grid with the infected VUR cases marked]

  8. 2% of the population gets an infection: Pr(I) = 0.02 [Diagram: population grid; 2% marked 'I', their VUR status shown as '?']

  9. We just calculated that 0.8% of kids have VUR and get an infection. [Diagram: population grid; the VUR cases among the infected marked 'v']

  10. The other 1.2% must not have VUR. So, 40% of kids with infections have VUR: Pr(V|I) = 0.4

  11. Pr(V|I) = Pr(V) Pr(I|V) / Pr(I) = (0.01 × 0.8) / 0.02 = 0.40

  12. Pr(I) is higher for females: Pr(I|female) = 0.03, Pr(I|male) = 0.01. So Pr(V|I, female) = (0.01 × 0.8) / 0.03 ≈ 0.267 and Pr(V|I, male) = (0.01 × 0.8) / 0.01 = 0.8

  13. Bayes’ Rule: Pr(A|B) = Pr(A) Pr(B|A) / Pr(B). Equivalently, Pr(Hypothesis | Data) = Pr(Hypothesis) Pr(Data | Hypothesis) / Pr(Data)
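
As a quick numerical check of slides 9-12, the VUR numbers can be run straight through Bayes’ rule; a minimal Python sketch (the function name is just illustrative, not from the talk):

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' rule: Pr(H|D) = Pr(H) * Pr(D|H) / Pr(D)."""
    return prior * likelihood / evidence

p_V = 0.01         # Pr(V): 1% of children have VUR
p_I_given_V = 0.8  # Pr(I|V): 80% of children with VUR see a doctor about an infection

print(posterior(p_V, p_I_given_V, 0.02))  # all children, Pr(I) = 0.02 -> 0.40
print(posterior(p_V, p_I_given_V, 0.03))  # females, Pr(I|female) = 0.03 -> ~0.267
print(posterior(p_V, p_I_given_V, 0.01))  # males,   Pr(I|male)   = 0.01 -> 0.80
```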

  14. Pr(Tree | Data) = Pr(Tree) Pr(Data | Tree) / Pr(Data). We can ignore Pr(Data) (2nd half of this lecture)

  15. Pr (Tree | Data) ∝ Pr ( Tree ) Pr (Data | Tree) Pr (Tree) is the prior probability of the tree.

  16. Pr (Tree | Data) ∝ Pr (Tree) Pr ( Data | Tree ) Pr (Tree) is the prior probability of the tree. Pr ( Data | Tree ) is the likelihood of the tree. Pr (Tree | Data) ∝ Pr (Tree) L ( Tree )

  17. Pr ( Tree | Data ) ∝ Pr (Tree) L (Tree) Pr (Tree) is the prior probability of the tree. L (Tree) is the likelihood of the tree. Pr ( Tree | Data ) is the posterior probability of the tree.

  18. The posterior probability is a great way to evaluate trees: • Ranks trees • Intuitive measure of confidence • Is the ideal “weight” for a tree in secondary analyses • Closely tied to the likelihood

  19. Our models don’t give us L (Tree). They give us things like L (Tree, κ, α, ν1, ν2, ν3, ν4, ν5) [Diagram: unrooted four-taxon tree with tips A, B, C, D and branch lengths ν1 through ν5]

  20. “Nuisance Parameters” Aspects of the evolutionary model that we don’t care about, but are in the likelihood equation.

  21. [Plot: ln-likelihood profile as a function of κ]

  22. [Plot: the same ln-likelihood profile, with the maximum ln-likelihood and the MLE of κ marked]

  23. Marginalizing over (integrating out) nuisance parameters: L (Tree) = ∫ L (Tree, κ) Pr(κ) dκ • Removes the nuisance parameter • Takes the entire likelihood function into account

  24. • Avoids estimation errors • Requires a prior for the parameter
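
A minimal numerical sketch of the marginalization above, with assumed toy shapes for both the likelihood curve over κ and the prior (neither comes from the slides): the marginal likelihood of the tree is the integral of likelihood × prior over κ, approximated here on a grid.

```python
import numpy as np

# Toy, assumed shapes (not from the slides): a likelihood curve over kappa
# and an Exponential(mean = 10) prior on kappa.
kappa = np.linspace(0.01, 60.0, 5000)
likelihood = np.exp(-0.5 * ((kappa - 8.0) / 3.0) ** 2)  # hypothetical L(Tree, kappa)
prior = (1.0 / 10.0) * np.exp(-kappa / 10.0)            # hypothetical Pr(kappa)

# L(Tree) = integral of L(Tree, kappa) * Pr(kappa) d kappa, via the trapezoid rule.
integrand = likelihood * prior
marginal_L = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(kappa)))
print(marginal_L)
```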

  25. When there is substantial uncertainty in a parameter’s value, marginalizing can give qualitatively different answers than using the MLE. [Plot: likelihood as a function of the nuisance parameter]

  26. [Plot: joint posterior probability density for trees and ω]

  27. [Plot: joint posterior over trees and ω; marginalize over ω by summing probability along the ω axis to get the posterior probability of each tree]

  28. [Plot: marginalize over trees by summing probability along the tree axis to get the posterior probability density of ω]
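
Slides 26-28 in miniature: if the joint posterior over (tree, ω) is tabulated on a grid, marginalizing is just summing along one axis. The grid values below are random placeholders, purely to show the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((15, 50))   # 15 trees x 50 omega bins (placeholder values)
joint /= joint.sum()           # normalize: the whole grid sums to 1

post_trees = joint.sum(axis=1)  # marginalize over omega -> Pr(Tree | Data)
post_omega = joint.sum(axis=0)  # marginalize over trees -> posterior mass per omega bin
print(post_trees.sum(), post_omega.sum())  # both 1.0
```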

  29. The Bayesian Perspective. Pros: • Posterior probability is the ideal measure of support • Focus of inference is flexible • Marginalizes over nuisance parameters. Cons: • Is it robust? • Requires a prior

  30. Priors • Probability distributions • Specified before analyzing the data • Needed for – Hypotheses (trees) – Parameters

  31. Probability Distributions Reflect the action of random forces

  32. Probability Distributions Reflect the action of random forces OR (if you’re a Bayesian) Reflect your uncertainty

  33. [Graphic: posterior ∝ prior × likelihood] slide courtesy of Derrick Zwickl

  34. [Graphic: posterior ∝ prior × likelihood] slide courtesy of Derrick Zwickl

  35. [Graphic: posterior ∝ prior × likelihood, two examples] slide courtesy of Derrick Zwickl

  36. Considerations when choosing a prior for a parameter • What values are most likely?

  37. [Plot: a subjective prior density on p = Pr(Heads), over the interval 0 to 1]

  38. Considerations when choosing a prior for a parameter • What values are most likely? • How do you express ignorance? – vague distributions

  39. [Plot: flat prior on p = Pr(Heads), constant density over the interval 0 to 1]

  40. “Non-informative” priors • Misleading term • Used by many Bayesians to mean “prior that is expected to have the smallest effect on the posterior” • Not always a uniform prior

  41. Considerations when choosing a prior for a parameter • What values are most likely? • How do you express ignorance? – vague distributions – How easily can the likelihood discriminate between parameter values?

  42. [Plot: Jeffreys (default) prior on p = Pr(Heads); the density rises toward 0 and 1]
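
The flat prior of slide 39 and the Jeffreys prior of slide 42 can both be written as Beta densities on p = Pr(Heads): Beta(1, 1) is flat, and the Jeffreys prior for a Bernoulli probability is Beta(1/2, 1/2), which puts extra density near 0 and 1. A small sketch using SciPy (not code from the slides):

```python
from scipy.stats import beta

flat = beta(1, 1)          # slide 39: uniform prior on p
jeffreys = beta(0.5, 0.5)  # slide 42: Jeffreys prior for a Bernoulli p

for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"p={p:.2f}  flat={flat.pdf(p):.2f}  jeffreys={jeffreys.pdf(p):.2f}")
```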

  43. Example: The Kimura model. Ratio of rates, on (0, ∞): κ = r_ti / r_tv. Proportion of transitions, on (0, 1): φ = r_ti / (r_ti + 2 r_tv). [Diagram: transition and transversion rates among A, C, G, T] Slide by Zwickl

  44. κ and φ map onto the predictions of K80 very differently. [Plots: model predictions against the ratio of rates (κ) and against the proportion of transitions (φ)] Slide by Zwickl

  45. K80: κ and φ • The likelihood surface is tied to the model predictions • The ML estimates are equivalent • The curve shapes (and integrals) are quite different. Slide by Zwickl
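
The two K80 parameterizations on slide 43 are linked by φ = κ / (κ + 2), and a change of variables shows why a prior that is flat on one scale is not flat on the other. A small sketch (function names are just illustrative):

```python
# phi = r_ti / (r_ti + 2 r_tv) and kappa = r_ti / r_tv  =>  phi = kappa / (kappa + 2)

def phi_from_kappa(kappa: float) -> float:
    return kappa / (kappa + 2.0)

def kappa_density_if_phi_flat(kappa: float) -> float:
    # If phi ~ Uniform(0, 1), then p(kappa) = |d phi / d kappa| = 2 / (kappa + 2)^2,
    # which is far from flat on the kappa scale.
    return 2.0 / (kappa + 2.0) ** 2

for k in (0.5, 1.0, 2.0, 4.0, 10.0):
    print(k, phi_from_kappa(k), kappa_density_if_phi_flat(k))
```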

  46. Effects of the Prior in the GTR model. [Plot: posterior density of a GTR rate parameter (C<->T vs. G<->T), MLE = 45.2, under a Dirichlet prior and under a U(0, 200) prior]

  47. Minimizing the effect of priors • Flat ≠ non-informative • Familiar model parameterizations may perform poorly in a Bayesian analysis with flat priors.

  48. Considerations when choosing a prior for a parameter • What values are most likely? • How do you express ignorance? (minimally informative priors) • Are some errors better than others?

  49. [Plot: log-likelihood for 3 trees as a function of the internal branch length (0 to 0.4)]

  50. [Plot: the tree’s posterior probability as a function of the internal branch length’s prior mean (0.01 to 100)]

  51. We might make analyses more conservative by • Favoring short internal branch lengths • Placing some prior probability on “star” trees (Lewis et al.)

  52. We need to worry about sensitivity of our conclusions to all “inputs” • Data • Model • Priors Often priors will be the least of our concerns

  53. [Graphic: posterior ∝ prior × likelihood, two examples] slide courtesy of Derrick Zwickl

  54. The prior can be a benefit (not just a necessity) of Bayesian analysis • Incorporate previous information • Make the analysis more conservative But...

  55. It can be hard to say “I don’t know” Priors can strongly affect the analysis if ... • The prior strongly favors some parameter values, OR • The data (via the likelihood) are not very informative (little data or complex model) Because Bayesian inference relies on marginalization, the priors for all parameters can affect the posterior probabilities of the hypotheses of interest.

  56. How do we calculate a posterior probability? Pr(Tree | Data) = Pr(Tree) L (Tree) / Pr(Data). In particular, how do we calculate Pr(Data)?

  57. Pr(Data) is the marginal probability of the data, so Pr(Data) = Σ_i Pr(Tree_i) L (Tree_i). But this is a sum over all trees (there are lots of trees). Recall that even L (Tree_i) involves multiple integrals.
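
“There are lots of trees” can be made concrete: the number of distinct unrooted, fully resolved trees for n taxa is (2n − 5)!! = 3 × 5 × 7 × ⋯ × (2n − 5), which grows so fast that the sum in Pr(Data) cannot be enumerated for realistic data sets. A small sketch:

```python
def n_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, fully resolved trees for n_taxa tips: (2n - 5)!!"""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (5, 10, 20, 50):
    print(n, n_unrooted_trees(n))
```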

  58. Pr(D) = Σ_i ∫∫∫∫∫∫∫ L (Tree_i, κ, α, ν1, ν2, ν3, ν4, ν5) Pr(Tree_i) Pr(κ) Pr(α) Pr(ν1) ⋯ Pr(ν5) dκ dα dν1 ⋯ dν5, i.e. the denominator of the posterior probability density is a sum over trees and an integral over every continuous parameter.
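
This intractable denominator is exactly what MCMC (the next item in the outline) sidesteps: the Metropolis acceptance ratio compares prior × likelihood at two parameter values, so Pr(Data) cancels. Below is a minimal Metropolis sampler for a single continuous parameter, with made-up prior and likelihood; it is a sketch of the idea, not the phylogenetic implementation.

```python
import math
import random

def log_unnormalized_posterior(theta: float) -> float:
    # Hypothetical target: Exponential(mean = 10) prior times a bell-shaped
    # likelihood in theta; both are placeholders, not the phylogenetic model.
    if theta <= 0:
        return float("-inf")
    log_prior = -theta / 10.0
    log_likelihood = -0.5 * ((theta - 8.0) / 3.0) ** 2
    return log_prior + log_likelihood

random.seed(1)
theta = 1.0
samples = []
for _ in range(50_000):
    proposal = theta + random.gauss(0.0, 1.0)  # symmetric random-walk proposal
    log_ratio = log_unnormalized_posterior(proposal) - log_unnormalized_posterior(theta)
    if random.random() < math.exp(min(0.0, log_ratio)):  # Metropolis acceptance rule
        theta = proposal
    samples.append(theta)

print(sum(samples) / len(samples))  # estimate of the posterior mean
```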
