Bayesian Phylogenetics Mark Holder (with big thanks to Paul Lewis)
Outline • Intro – What is Bayesian Analysis? – Why be a Bayesian? • What is required to do a Bayesian Analysis? (Priors) • How can the required calculations be done? (MCMC) • Prospects and Warnings
Simple Example: Vesicoureteral Reflux (VUR) - valves between the ureters and bladder do not shut fully. • leads to urinary tract infections • if not corrected, can cause serious kidney damage • effective diagnostic tests are available, but they are expensive and invasive
• ≈ 1% of children will have VUR • ≈ 80% of children with VUR will see a doctor about an infection • ≈ 2% of all children will see a doctor about an infection Should a child with 1 infection be screened for VUR?
1% of the population has VUR: Pr(V) = 0.01. [Figure: population grid in which each v marks 0.1% of the population]
80% of kids with VUR get an infection Pr(I|V) = 0.8 Pr(I|V) is a conditional probability
So, 0.8% of the population has VUR and will get an infection: Pr(V) Pr(I|V) = 0.01 × 0.8 = 0.008, so Pr(I,V) = 0.008. Pr(I,V) is a joint probability. [Figure: population grid showing the children who are both infected (I) and have VUR (v)]
2% of the population gets an infection: Pr(I) = 0.02. [Figure: population grid showing the infected children (I); their VUR status is still unknown (?)]
We just calculated that 0.8% of kids have VUR and get an infection. [Figure: the infected children, with the 0.8% who also have VUR marked v]
The other 1.2% must not have VUR. So, 40% of kids with infections have VUR: Pr(V|I) = 0.4
Pr(V | I) = Pr(V) Pr(I | V) / Pr(I) = (0.01 × 0.8) / 0.02 = 0.40
Pr(I) is higher for females: Pr(I | female) = 0.03 and Pr(I | male) = 0.01, so
Pr(V | I, female) = (0.01 × 0.8) / 0.03 ≈ 0.267
Pr(V | I, male) = (0.01 × 0.8) / 0.01 = 0.8
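The arithmetic on the last few slides is easy to check in code. A minimal sketch (the function name is just illustrative; the probabilities are the ones quoted above):

```python
def posterior_vur(prior_v, p_inf_given_v, p_inf):
    """Pr(V | I) = Pr(V) * Pr(I | V) / Pr(I)."""
    return prior_v * p_inf_given_v / p_inf

print(posterior_vur(0.01, 0.8, 0.02))  # all children: 0.40
print(posterior_vur(0.01, 0.8, 0.03))  # females: ~0.267
print(posterior_vur(0.01, 0.8, 0.01))  # males: 0.80
```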
Bayes’ Rule: Pr(A | B) = Pr(A) Pr(B | A) / Pr(B)
Pr(Hypothesis | Data) = Pr(Hypothesis) Pr(Data | Hypothesis) / Pr(Data)
Pr(Tree | Data) = Pr(Tree) Pr(Data | Tree) / Pr(Data). We can ignore Pr(Data) (2nd half of this lecture).
Pr(Tree | Data) ∝ Pr(Tree) Pr(Data | Tree). Pr(Tree) is the prior probability of the tree.
Pr(Tree | Data) ∝ Pr(Tree) Pr(Data | Tree). Pr(Tree) is the prior probability of the tree; Pr(Data | Tree) is the likelihood of the tree. So Pr(Tree | Data) ∝ Pr(Tree) L(Tree).
Pr(Tree | Data) ∝ Pr(Tree) L(Tree). Pr(Tree) is the prior probability of the tree, L(Tree) is the likelihood of the tree, and Pr(Tree | Data) is the posterior probability of the tree.
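A small worked sketch of how this proportionality turns into posterior probabilities once you normalize; the three trees, their priors, and their likelihoods here are made-up numbers, not output of a real analysis:

```python
import numpy as np

# Toy example: three candidate trees with made-up priors and likelihoods.
prior = np.array([1/3, 1/3, 1/3])              # Pr(Tree)
likelihood = np.array([1e-10, 4e-10, 5e-10])   # L(Tree) = Pr(Data | Tree)

unnormalized = prior * likelihood   # Pr(Tree) * L(Tree)
p_data = unnormalized.sum()         # Pr(Data), the normalizing constant
posterior = unnormalized / p_data   # Pr(Tree | Data)

print(posterior)  # [0.1  0.4  0.5]
```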
The posterior probability is a great way to evaluate trees: • Ranks trees • Intuitive measure of confidence • Is the ideal “weight” for a tree in secondary analyses • Closely tied to the likelihood
Our models don’t give us L(Tree). They give us things like L(Tree, κ, α, ν1, ν2, ν3, ν4, ν5). [Figure: unrooted four-taxon tree (A, B, C, D) with branch lengths ν1–ν5]
“Nuisance Parameters” Aspects of the evolutionary model that we don’t care about, but are in the likelihood equation.
[Figure: Ln likelihood profile — Ln likelihood plotted against κ; the maximum LnL is reached at the MLE of κ]
Marginalizing over (integrating out) nuisance parameters: L(Tree) = ∫ L(Tree, κ) Pr(κ) dκ (a numerical sketch follows below) • Removes the nuisance parameter • Takes the entire likelihood function into account
• Avoids estimation errors • Requires a prior for the parameter
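A numerical sketch of the marginalization over κ, assuming a made-up likelihood curve and an exponential prior; neither is the real phylogenetic likelihood nor a recommended prior, they only illustrate the integral:

```python
import numpy as np

def likelihood(kappa):
    # Illustrative stand-in for L(Tree, kappa): a bump peaked near kappa = 8.
    return np.exp(-0.5 * ((kappa - 8.0) / 2.0) ** 2)

def prior(kappa, mean=5.0):
    # Exponential prior density on kappa with the given mean.
    return np.exp(-kappa / mean) / mean

# L(Tree) = integral of L(Tree, kappa) * Pr(kappa) d kappa, done by quadrature.
kappa_grid = np.linspace(0.0, 50.0, 5001)
marginal_L = np.trapz(likelihood(kappa_grid) * prior(kappa_grid), kappa_grid)
print(marginal_L)
```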
When there is substantial uncertainty in a parameter’s value, marginalizing can give qualitatively different answers than using the MLE. [Figure: likelihood plotted against the nuisance parameter]
[Figure: joint posterior probability density for trees and ω]
[Figure: marginalize over ω by summing probability across ω for each of the 15 trees, giving the posterior probability of each tree]
[Figure: marginalize over trees by summing probability across trees at each value of ω (0 to 2), giving the posterior probability density of ω]
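Once the joint posterior is laid out on a grid, the two marginalizations above are just row and column sums. A minimal sketch, with a random placeholder table standing in for the real joint posterior:

```python
import numpy as np

# Joint posterior over (tree, omega): rows are the 15 trees, columns are
# omega values.  Entries are random placeholders, normalized to sum to 1.
n_trees, n_omega = 15, 200
rng = np.random.default_rng(0)
joint = rng.random((n_trees, n_omega))
joint /= joint.sum()

# Marginalize over omega (sum each row): posterior probability of each tree.
tree_posterior = joint.sum(axis=1)

# Marginalize over trees (sum each column): posterior distribution of omega.
omega_posterior = joint.sum(axis=0)

print(tree_posterior.sum(), omega_posterior.sum())  # both 1.0
```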
The Bayesian Perspective
Pros: posterior probability is the ideal measure of support; the focus of inference is flexible; nuisance parameters are marginalized over.
Cons: is it robust?; requires a prior.
Priors • Probability distributions • Specified before analyzing the data • Needed for – Hypotheses (trees) – Parameters
Probability Distributions Reflect the action of random forces
Probability Distributions Reflect the action of random forces OR (if you’re a Bayesian) Reflect your uncertainty
[Figures: posterior density ∝ prior × likelihood (slides courtesy of Derrick Zwickl)]
Considerations when choosing a prior for a parameter • What values are most likely?
[Figure: a subjective prior on p = Pr(Heads)]
Considerations when choosing a prior for a parameter • What values are most likely? • How do you express ignorance? – vague distributions
[Figure: a flat (uniform) prior on p = Pr(Heads)]
“Non-informative” priors • Misleading term • Used by many Bayesians to mean “prior that is expected to have the smallest effect on the posterior” • Not always a uniform prior
Considerations when choosing a prior for a parameter • What values are most likely? • How do you express ignorance? – vague distributions – How easily can the likelihood discriminate between parameter values?
[Figure: the Jeffreys (default) prior on p = Pr(Heads)]
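These three priors on p can all be written as Beta densities. The Beta(50, 50) choice for the subjective prior below is only an illustrative assumption; Beta(1, 1) (flat) and Beta(1/2, 1/2) (the Jeffreys prior for a binomial proportion) are standard:

```python
import numpy as np
from scipy.stats import beta

p = np.linspace(0.001, 0.999, 500)

subjective = beta.pdf(p, 50, 50)    # concentrated near 0.5 ("probably a fair coin")
flat       = beta.pdf(p, 1, 1)      # uniform on (0, 1)
jeffreys   = beta.pdf(p, 0.5, 0.5)  # Jeffreys prior for a binomial proportion

# The Jeffreys prior rises near 0 and 1, where the binomial likelihood
# discriminates most sharply between nearby values of p.
print(jeffreys[[0, 250, -1]])
```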
Example: the Kimura (K80) model. Ratio of rates: κ = r_ti / r_tv ∈ (0, ∞). Proportion of transitions: φ = r_ti / (r_ti + 2 r_tv) ∈ (0, 1). [Figure: substitution diagram among A, C, G, T] (Slide by Zwickl)
κ and φ map onto the predictions of K80 very differently. [Figure: model predictions plotted against the ratio of rates (κ) and against the proportion of transitions (φ)] (Slide by Zwickl)
K80: κ and φ • The likelihood surface is tied to the model predictions • The ML estimates are equivalent • The curve shapes (and integrals) are quite different (see the change-of-variables sketch below) (Slide by Zwickl)
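One way to see why the curves and integrals differ: under K80, φ = κ / (κ + 2), so a prior that is flat on one parameterization is not flat on the other. A sketch of that change of variables, assuming a U(0, 1) prior on φ:

```python
import numpy as np

# Under K80, phi = kappa / (kappa + 2).  If phi gets a flat U(0, 1) prior,
# the change of variables gives the implied density on kappa:
#     p(kappa) = |d phi / d kappa| = 2 / (kappa + 2)**2

def kappa_to_phi(kappa):
    return kappa / (kappa + 2.0)

def implied_kappa_density(kappa):
    return 2.0 / (kappa + 2.0) ** 2

kappa = np.array([0.5, 2.0, 8.0, 20.0])
print(kappa_to_phi(kappa))           # [0.2  0.5  0.8  ~0.909]
print(implied_kappa_density(kappa))  # far from flat: most mass at small kappa
```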
Effects of the Prior in the GTR model. [Figure: posterior density of the C<->T rate relative to the G<->T rate, using a Dirichlet prior and using a U(0,200) prior; MLE = 45.2]
Minimizing the effect of priors • Flat ≠ non-informative • Familiar model parameterizations may perform poorly in a Bayesian analysis with flat priors.
Considerations when choosing a prior for a parameter • What values are most likely? • How do you express ignorance? (minimally informative priors) • Are some errors better than others?
[Figure: log-likelihood curves for 3 trees plotted against the internal branch length (0 to 0.4)]
[Figure: the tree’s posterior probability plotted against the internal branch length’s prior mean (0.01 to 100)]
We might make analyses more conservative by • Favoring short internal branch lengths • Placing some prior probability on “star” trees (Lewis et al.)
We need to worry about sensitivity of our conclusions to all “inputs” • Data • Model • Priors Often priors will be the least of our concerns
[Figure: posterior density ∝ prior × likelihood (slide courtesy of Derrick Zwickl)]
The prior can be a benefit (not just a necessity) of Bayesian analysis • Incorporate previous information • Make the analysis more conservative But...
It can be hard to say “I don’t know.” Priors can strongly affect the analysis if ... • The prior strongly favors some parameter values, OR • The data (via the likelihood) are not very informative (little data or a complex model). Because Bayesian inference relies on marginalization, the priors for all parameters can affect the posterior probabilities of the hypotheses of interest.
How do we calculate a posterior probability? Pr(Tree | Data) = Pr(Tree) L(Tree) / Pr(Data). In particular, how do we calculate Pr(Data)?
Pr(Data) is the marginal probability of the data, so Pr(Data) = Σ_i Pr(Tree_i) L(Tree_i). But this is a sum over all trees (and there are lots of trees). Recall that even L(Tree_i) involves multiple integrals.
Pr(D) = Σ_i ∫∫∫∫∫∫∫ L(Tree_i, κ, α, ν1, ν2, ν3, ν4, ν5) Pr(Tree_i) Pr(κ) Pr(α) Pr(ν1) ··· Pr(ν5) dκ dα dν1 ··· dν5
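To get a feel for why this sum is intractable, here is a small sketch that counts the number of unrooted, fully resolved tree topologies, (2n − 5)!!, for a few taxon counts; every one of those topologies contributes its own multidimensional integral to Pr(D):

```python
def num_unrooted_topologies(n_taxa):
    """(2n - 5)!!: the number of unrooted, fully resolved tree topologies."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # 3 * 5 * ... * (2n - 5)
        count *= k
    return count

for n in (5, 10, 20, 50):
    print(n, num_unrooted_topologies(n))
# 50 taxa already gives roughly 2.8e74 topologies, each carrying the
# integral over kappa, alpha, and the branch lengths.
```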