© N. Salamin, Sept 2007

Phylogenetics and bioinformatics for evolution

Maximum Likelihood

September, 2007

Lecture outline

1 Maximum likelihood in phylogenetics
   Definition
   Maximum likelihood and models
   Likelihood of a tree
   Computational complexity
2 Statistical properties
   Maximum parsimony
   Maximum likelihood
   Experimental design
3 Hypothesis testing
   Tree support
   Tests of topology
   Tests of models
Description

Given a hypothesis $H$ and some data $D$, the likelihood of $H$ is

$$L(H) = \mathrm{Prob}(D \mid H) = \mathrm{Prob}(D_1 \mid H)\,\mathrm{Prob}(D_2 \mid H) \cdots \mathrm{Prob}(D_n \mid H)$$

if $D$ can be split into $n$ independent parts.

Note that $L(H)$ is not the probability of the hypothesis, but the probability of the data, given the hypothesis.

Maximum likelihood properties (Fisher, 1922)
• consistency: the estimate converges to the correct value of the parameter as the amount of data grows
• efficiency: the estimate has the smallest possible variance around the true parameter value
Toss a coin

Suppose we toss a coin 11 times and obtain 5 heads and 6 tails. All tosses are independent and share the same unknown probability of heads $p$. What is the probability of these data?

$$L(p) = \mathrm{Prob}(D \mid p) = p^5 (1 - p)^6$$

The maximum likelihood estimate of $p$ can be found by setting the derivative of $L(p)$ with respect to $p$ to zero and solving:

$$\frac{dL(p)}{dp} = 5 p^4 (1 - p)^6 - 6 p^5 (1 - p)^5 = 0$$

Dividing by $p^4 (1 - p)^5$ leaves $5(1 - p) - 6p = 0$, which yields $\hat{p} = 5/11 = 0.454545$.

Likelihood and models

Maximum likelihood relies on explicit probabilistic models of evolution.

But the process of evolution is so complex and multifaceted that even basic models involve assumptions built upon assumptions.

This reliance is often seen as a weakness of the likelihood framework, but
• the need to make explicit assumptions is a strength
• it enables both inferences about evolutionary history and assessments of the accuracy of the assumptions made
• it has led to a better understanding of evolution

“The purpose of models is not to fit the data, but to sharpen the questions” (S. Karlin)
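As a numerical check of the coin-toss example (5 heads, 6 tails), the sketch below evaluates the likelihood on a grid of candidate values of p and keeps the best one. The function names and the grid resolution are illustrative choices, not part of the lecture; a grid search is used only because it makes the idea of "maximizing L(p)" visible without any calculus.

```python
def likelihood(p, heads=5, tails=6):
    """L(p) = p^heads * (1 - p)^tails for independent tosses."""
    return p ** heads * (1.0 - p) ** tails


def grid_mle(heads=5, tails=6, steps=100_000):
    """Return the grid point in (0, 1) with the highest likelihood.

    The grid resolution (steps) is an arbitrary illustrative choice.
    """
    best_p, best_l = 0.0, -1.0
    for k in range(1, steps):
        p = k / steps
        value = likelihood(p, heads, tails)
        if value > best_l:
            best_p, best_l = p, value
    return best_p


p_hat = grid_mle()
analytic = 5 / 11  # closed-form solution from setting dL/dp = 0
print(f"grid estimate = {p_hat:.5f}, analytic = {analytic:.5f}")
```

The grid estimate agrees with the analytic value 5/11 up to the grid spacing, which illustrates that the derivative condition does locate the maximum of the likelihood.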
Basic settings for models of evolution

In order to discuss models of DNA evolution, we need to make some basic assumptions. Some can be relaxed in more complex models.

• the DNA sequences are alignable
• substitutions through time follow a Poisson distribution
• the sites of a DNA sequence evolve independently
• all the sites have the same rate of substitution

With these assumptions, we can easily model substitutions with a Markov chain.

Markov chain

Let the state (one of A, C, G or T) of the chain be $X(t)$ at time $t$.

The Markov chain is characterized by its generator matrix $Q = \{q_{ij}\}$, where $q_{ij}$ is the instantaneous rate of change from $i$ to $j$ when $\Delta t \to 0$, that is

$$\Pr\{X(t + \Delta t) = j \mid X(t) = i\} = q_{ij}\,\Delta t + o(\Delta t), \quad i \neq j$$

The diagonal elements $q_{ii}$ are specified by the requirement that each row of $Q$ sums to zero, that is

$$q_{ii} = -\sum_{j \neq i} q_{ij}$$

Thus $-q_{ii}$ is the substitution rate of state $i$, i.e. the rate at which the Markov chain leaves $i$.
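The row-sum requirement on the generator matrix can be sketched as follows. The off-diagonal rates below are made-up values used only for illustration, and the helper names are hypothetical; the point is that the diagonal is fully determined by the off-diagonal rates, and that I + Q Δt gives valid probabilities for a small Δt.

```python
STATES = "ACGT"  # state order used for rows and columns (an assumption)


def make_generator(rates):
    """Build Q from off-diagonal rates; diagonal entries of `rates` are ignored.

    Each diagonal element is set to minus the sum of the other entries
    in its row, so that every row of Q sums to zero.
    """
    n = len(rates)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = float(rates[i][j])
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def short_time_probs(q, dt):
    """First-order approximation P(dt) ~ I + Q * dt, valid for small dt."""
    n = len(q)
    return [[(1.0 if i == j else 0.0) + q[i][j] * dt for j in range(n)]
            for i in range(n)]


# Hypothetical instantaneous rates, chosen only for illustration.
rates = [[0.0, 0.1, 0.4, 0.1],
         [0.1, 0.0, 0.1, 0.4],
         [0.4, 0.1, 0.0, 0.1],
         [0.1, 0.4, 0.1, 0.0]]

Q = make_generator(rates)
P_dt = short_time_probs(Q, 0.01)

for state, row in zip(STATES, Q):
    print(state, "leaving rate:", -row[STATES.index(state)])
```

Each row of `P_dt` sums to one because each row of `Q` sums to zero, which is exactly why the diagonal is defined the way it is.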
Transition-probability matrix

The $Q$ matrix fully determines the dynamics of the Markov chain.

It specifies, in particular, the transition-probability matrix over any time $t > 0$, $P(t) = \{p_{ij}(t)\}$, where

$$p_{ij}(t) = \Pr\{X(t) = j \mid X(0) = i\}$$

We can further show that

$$\lim_{h \to 0^+} \frac{P(h) - I}{h} = Q$$

Transition-probability matrix

Thus (by the Chapman-Kolmogorov relation)

$$P(t + \Delta t) = P(t)\,P(\Delta t)$$

$$P(t + \Delta t) - P(t) = P(t)\,\bigl(P(\Delta t) - I\bigr)$$

$$\frac{P(t + \Delta t) - P(t)}{\Delta t} = P(t)\,\frac{P(\Delta t) - I}{\Delta t}$$

$$P'(t) = P(t)\,Q$$

As $P(0) = I$, we finally get

$$P(t) = e^{Qt}$$
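The relation P(t) = e^{Qt} can be checked numerically. The sketch below approximates the matrix exponential with a truncated Taylor series combined with scaling and squaring, written in plain Python so that it has no dependencies. This is one simple way to compute a matrix exponential, not necessarily the method used by phylogenetic software; the matrix Q and the parameters `squarings` and `terms` are illustrative assumptions.

```python
def identity(n):
    """Return the n x n identity matrix as a list of lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=20):
    """Approximate exp(Q * t).

    Q * t is first divided by 2**squarings, the exponential of the small
    matrix is computed with a truncated Taylor series, and the result is
    then squared `squarings` times.
    """
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        term = mat_mul(term, a)
        term = [[x / k for x in row] for row in term]  # now holds a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Hypothetical generator matrix (rows sum to zero), for illustration only.
Q = [[-0.6, 0.1, 0.4, 0.1],
     [0.1, -0.6, 0.1, 0.4],
     [0.4, 0.1, -0.6, 0.1],
     [0.1, 0.4, 0.1, -0.6]]

P_a = mat_exp(Q, 0.3)
P_b = mat_exp(Q, 0.5)
P_ab = mat_exp(Q, 0.8)
product = mat_mul(P_a, P_b)  # Chapman-Kolmogorov: should match P(0.8)

h = 1e-6
P_h = mat_exp(Q, h)
approx_Q = [[(P_h[i][j] - (1.0 if i == j else 0.0)) / h for j in range(4)]
            for i in range(4)]  # (P(h) - I) / h, should be close to Q
```

Comparing `product` with `P_ab` illustrates the Chapman-Kolmogorov relation, and `approx_Q` illustrates the limit that defines Q from P(h).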
Rate of mutation

As $Q$ and $t$ occur only in the form of a product, it is conventional to scale $Q$ so that the average rate is 1.

In phylogenetics, branch length is therefore measured in expected substitutions per site.

A long branch can therefore be due to
• a long evolutionary time
• a rapid rate of substitution
• a combination of both

Kimura 2-parameter, 1980

$$Q = \begin{pmatrix}
\cdot & \beta/4 & \alpha/4 & \beta/4 \\
\beta/4 & \cdot & \beta/4 & \alpha/4 \\
\alpha/4 & \beta/4 & \cdot & \beta/4 \\
\beta/4 & \alpha/4 & \beta/4 & \cdot
\end{pmatrix}$$

where
• $\alpha$ is the transition rate
• $\beta$ is the transversion rate

Rows and columns are ordered A, C, G, T, and each diagonal entry (shown as $\cdot$) is set so that its row sums to zero, i.e. $-(\alpha + 2\beta)/4$.

It simplifies to Jukes-Cantor, 1969 if $\alpha = \beta$.
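A minimal sketch of the Kimura 2-parameter generator and of the rescaling to an average rate of 1 is given below. It assumes the state order A, C, G, T and equal base frequencies of 1/4 (which hold under this model because Q is symmetric); the function names and the example values of α and β are hypothetical.

```python
def k2p_generator(alpha, beta):
    """Kimura 2-parameter generator, states ordered A, C, G, T (assumed).

    Transitions (A<->G, C<->T) occur at rate alpha / 4, transversions at
    rate beta / 4; the diagonal makes each row sum to zero.
    """
    a, b = alpha / 4.0, beta / 4.0
    d = -(a + 2.0 * b)
    return [[d, b, a, b],
            [b, d, b, a],
            [a, b, d, b],
            [b, a, b, d]]


def average_rate(q, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Average substitution rate: sum over states of freq_i * (-q_ii)."""
    return sum(-q[i][i] * freqs[i] for i in range(len(q)))


def rescale(q):
    """Divide Q by its average rate so that the rescaled average rate is 1."""
    mu = average_rate(q)
    return [[x / mu for x in row] for row in q]


Q_k2p = k2p_generator(alpha=2.0, beta=0.5)  # illustrative parameter values
Q_unit = rescale(Q_k2p)
Q_jc = k2p_generator(alpha=1.0, beta=1.0)   # alpha == beta: Jukes-Cantor case

print("average rate before rescaling:", average_rate(Q_k2p))
print("average rate after rescaling: ", average_rate(Q_unit))
```

After rescaling, a branch of length t corresponds to t expected substitutions per site, which is why only the product of rate and time can be estimated from the data. With α = β all off-diagonal entries of `Q_jc` are equal, which is the Jukes-Cantor special case.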