
Phylogenetics: Bayesian Phylogenetic Analysis (COMP 571, Luay Nakhleh) - PowerPoint PPT Presentation



  1. Slide 1: Phylogenetics: Bayesian Phylogenetic Analysis. COMP 571, Luay Nakhleh, Rice University. Slide 2: Bayes Rule: P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y) = P(X=x) P(Y=y | X=x) / Σ_{x'} P(X=x') P(Y=y | X=x'). Slide 3: Bayes Rule Example (from "Machine Learning: A Probabilistic Perspective"): Consider a woman in her 40s who decides to have a mammogram. Question: if the test is positive, what is the probability that she has cancer? The answer depends on how reliable the test is! Phylogenetics-Bayesian - March 30, 2017

  2. Slide 4: Suppose the test has a sensitivity of 80%; that is, if a person has cancer, the test will be positive with probability 0.8. If we denote by x=1 the event that the mammogram is positive, and by y=1 the event that the person has breast cancer, then P(x=1 | y=1) = 0.8. Slide 5: Does the probability that the woman in our example (who tested positive) has cancer equal 0.8? Slide 6: No! That ignores the prior probability of having breast cancer, which, fortunately, is quite low: P(y=1) = 0.004.

  3. Slide 7: Further, we need to take into account the fact that the test may be a false positive. Mammograms have a false-positive probability of P(x=1 | y=0) = 0.1. Slide 8: Combining all these facts using Bayes rule (with P(y=0) = 1 - P(y=1)), we get: P(y=1 | x=1) = P(x=1 | y=1) P(y=1) / [P(x=1 | y=1) P(y=1) + P(x=1 | y=0) P(y=0)] = (0.8 × 0.004) / (0.8 × 0.004 + 0.1 × 0.996) ≈ 0.031. Slide 9: How does Bayesian reasoning apply to phylogenetic inference?
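The arithmetic on slide 8 can be checked with a few lines of Python (a sketch; the variable names are my own, the numbers are the slides'):

```python
# Bayes rule for the mammogram example.
sens = 0.8     # sensitivity: P(x=1 | y=1)
fpr = 0.1      # false-positive rate: P(x=1 | y=0)
prior = 0.004  # prior probability of cancer: P(y=1)

# P(y=1 | x=1) = P(x=1|y=1) P(y=1) / [P(x=1|y=1) P(y=1) + P(x=1|y=0) P(y=0)]
posterior = sens * prior / (sens * prior + fpr * (1 - prior))
print(round(posterior, 3))  # 0.031
```

Despite the 80% sensitivity, the low prior drags the posterior down to about 3%.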

  4. Slide 10: Assume we are interested in the relationships between human, gorilla, and chimpanzee (with orangutan as an outgroup). There are clearly three possible relationships. Slide 11: [Figure: the three possible trees, labeled A, B, and C.] Slide 12: Before the analysis, we need to specify our prior beliefs about the relationships. For example, in the absence of background data, a simple solution would be to assign equal probability to the possible trees.

  5. Slide 13: [Figure: prior distribution assigning equal probability to trees A, B, and C. This is an uninformative prior.] Slide 14: To update the prior, we need some data, typically in the form of a molecular sequence alignment, and a stochastic model of the process generating the data on the tree. Slide 15: In principle, Bayes rule is then used to obtain the posterior probability distribution, which is the result of the analysis. The posterior specifies the probability of each tree given the model, the prior, and the data.

  6. Slide 16: When the data are informative, most of the posterior probability is typically concentrated on one tree (or on a small subset of trees in a large tree space). Slide 17: [Figure: the flat prior distribution over trees A, B, and C is updated with the data (observations) to give a posterior distribution concentrated on one tree.] Slide 18: To describe the analysis mathematically, consider: the matrix of aligned sequences X; the tree topology parameter τ; and the branch lengths of the tree ν (typically, substitution model parameters are also included). Let θ = (τ, ν).

  7. Slide 19: Bayes theorem allows us to derive the posterior distribution as f(θ | X) = f(θ) f(X | θ) / f(X), where f(X) = ∫ f(θ) f(X | θ) dθ = Σ_τ ∫_ν f(ν) f(X | τ, ν) dν. Slide 20: [Figure: the marginal probability distribution on topologies, with posterior probabilities of 48%, 32%, and 20% for the three topologies.] Slide 21: Why are they called marginal probabilities? The joint probabilities over topologies and branch-length vectors, with their marginals:

                              Topologies
                          τ_A     τ_B     τ_C    (row sums)
     Branch      ν_A     0.10    0.07    0.12     0.29
     length      ν_B     0.05    0.22    0.06     0.33
     vectors     ν_C     0.05    0.19    0.14     0.38
     (marginals)         0.20    0.48    0.32

     Summing each column over the branch-length vectors ν gives the marginal probability of each topology.
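The marginalization in slide 21's table can be reproduced in a short Python sketch (the dictionary layout and names are my own; the numbers are the slide's joint probabilities):

```python
# Joint probabilities P(tau, nu): rows are branch-length vectors nu_A..nu_C,
# columns are topologies tau_A..tau_C (values from the slide's table).
joint = {
    "nu_A": {"tau_A": 0.10, "tau_B": 0.07, "tau_C": 0.12},
    "nu_B": {"tau_A": 0.05, "tau_B": 0.22, "tau_C": 0.06},
    "nu_C": {"tau_A": 0.05, "tau_B": 0.19, "tau_C": 0.14},
}

# Marginal over topologies: sum out nu (i.e., sum each column).
topo_marginal = {t: round(sum(row[t] for row in joint.values()), 2)
                 for t in ("tau_A", "tau_B", "tau_C")}
print(topo_marginal)  # {'tau_A': 0.2, 'tau_B': 0.48, 'tau_C': 0.32}
```

The column sums recover the marginal topology probabilities 0.20, 0.48, and 0.32 from the slide.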

  8. Slide 22: Markov chain Monte Carlo Sampling. Slide 23: In most cases, it is impossible to derive the posterior probability distribution analytically. Even worse, we can't even estimate it by drawing random samples from it. The reason is that most of the posterior probability is likely to be concentrated in a small part of a vast parameter space. Slide 24: The solution is to estimate the posterior probability distribution using Markov chain Monte Carlo sampling, or MCMC for short. Monte Carlo = random simulation; Markov chain = the next state of the simulator depends only on the current state.
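As an illustration of the MCMC idea on slides 23-24 (not the phylogenetic sampler itself), here is a toy Metropolis sampler for a three-state distribution known only up to a normalizing constant; the states, weights, and names are invented for the example:

```python
import random

random.seed(0)

# Unnormalized target weights; the true probabilities are 0.2, 0.5, 0.3.
weights = [2.0, 5.0, 3.0]

def metropolis(n_steps):
    """Toy Metropolis sampler over states {0, 1, 2}."""
    counts = [0, 0, 0]
    state = 0
    for _ in range(n_steps):
        proposal = random.randrange(3)  # symmetric (uniform) proposal
        # Accept with probability min(1, target(proposal) / target(state));
        # the unknown normalizing constant cancels, which is the point of MCMC.
        if random.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

freqs = metropolis(100_000)
print([round(p, 2) for p in freqs])  # close to [0.2, 0.5, 0.3]
```

The visit frequencies approximate the target distribution even though we never computed its normalizing constant, which is exactly the situation with the intractable f(X) in the phylogenetic posterior.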

  9. Slide 25: Irreducible Markov chains (those whose transition graph is strongly connected) have the property that they converge towards an equilibrium (stationary) distribution regardless of the starting point. We just need to set up a Markov chain that converges onto our posterior probability distribution! Slide 26: Stationary Distribution of a Markov Chain. Consider a two-state chain with transition probabilities P(x_{i+1}=0 | x_i=0) = 0.4, P(x_{i+1}=1 | x_i=0) = 0.6, P(x_{i+1}=0 | x_i=1) = 0.9, and P(x_{i+1}=1 | x_i=1) = 0.1. What are P(x_i=0 | x_0=0), P(x_i=1 | x_0=0), P(x_i=0 | x_0=1), and P(x_i=1 | x_0=1)?

  10. Slide 27: These can be computed with the recursion P(x_i = k | x_0 = ℓ) = P(x_i = k | x_{i-1} = 0) P(x_{i-1} = 0 | x_0 = ℓ) + P(x_i = k | x_{i-1} = 1) P(x_{i-1} = 1 | x_0 = ℓ), where the first factor in each term is a transition probability. Slide 28: [Figure: plots of P(x_i = 0 | x_0 = 0) and P(x_i = 0 | x_0 = 1) against i; both converge to 0.6 within a few steps.]
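The recursion on slide 27 can be iterated numerically; a minimal Python sketch (the matrix layout and function names are my own, the transition probabilities are the slides') shows both starting states converging to the same limit:

```python
# Transition matrix from the slides:
# P(0->0)=0.4, P(0->1)=0.6, P(1->0)=0.9, P(1->1)=0.1.
T = [[0.4, 0.6],
     [0.9, 0.1]]

def evolve(start, n):
    """Apply P(x_i = k) = sum_j P(x_{i-1} = j) * T[j][k] for n steps."""
    dist = start
    for _ in range(n):
        dist = [dist[0] * T[0][k] + dist[1] * T[1][k] for k in (0, 1)]
    return dist

from_0 = evolve([1.0, 0.0], 30)  # chain started in state 0
from_1 = evolve([0.0, 1.0], 30)  # chain started in state 1
print([round(p, 4) for p in from_0])  # [0.6, 0.4]
print([round(p, 4) for p in from_1])  # [0.6, 0.4]
```

Both starting points reach the same distribution, matching the plots on the slide.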

  11. Slide 28 (cont.): The chain converges to the same probability regardless of the starting state! Slide 29: [Figure: the same convergence plots for P(x_i = 0 | x_0 = 0) and P(x_i = 0 | x_0 = 1), annotated with the limiting value 0.6.] Where does the 0.6 come from?

  12. The limit is the stationary distribution: π_0 = 0.6, π_1 = 0.4. Slide 30: Imagine infinitely many chains. At equilibrium (steady state), the "flux out" of each state must equal the "flux into" that state.

  13. Writing this flux-balance condition for the two-state chain: π_0 P(x_{i+1}=1 | x_i=0) = π_1 P(x_{i+1}=0 | x_i=1), i.e., π_0 / π_1 = P(x_{i+1}=0 | x_i=1) / P(x_{i+1}=1 | x_i=0), where π_0 = P(x_i = 0) and π_1 = P(x_i = 1) at stationarity.

  14. Slide 31: Combining the balance condition with the constraint π_0 + π_1 = 1: π_0 / π_1 = 0.9 / 0.6 = 1.5, so π_0 = 1.5 π_1. Then 1.5 π_1 + π_1 = 1, giving π_1 = 0.4 and π_0 = 0.6.
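The algebra on slide 31 can be double-checked in a few lines of Python (a sketch; the variable names are mine, the transition probabilities are the slides'):

```python
# Flux balance: pi0 * P(0->1) = pi1 * P(1->0), together with pi0 + pi1 = 1.
p01, p10 = 0.6, 0.9
ratio = p10 / p01          # pi0 / pi1 = 0.9 / 0.6 = 1.5
pi1 = 1.0 / (1.0 + ratio)  # from pi0 + pi1 = 1 with pi0 = ratio * pi1
pi0 = ratio * pi1
print(round(pi0, 4), round(pi1, 4))  # 0.6 0.4

# Sanity check: (pi0, pi1) is unchanged by one step of the chain.
assert abs(pi0 * 0.4 + pi1 * p10 - pi0) < 1e-12
```

The check at the end confirms that (0.6, 0.4) is indeed a fixed point of the transition dynamics, which is what "stationary" means.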
