Phylogenetic trees IV: Maximum Likelihood
Gerhard Jäger
Words, Bones, Genes, Tools
February 28, 2018
Theory
Recap: Continuous time Markov model

P(t) = \begin{pmatrix} s + re^{-t} & r - re^{-t} \\ s - se^{-t} & r + se^{-t} \end{pmatrix}, \qquad \pi = (s, r)

[Figure: example tree with branch lengths l_1, ..., l_8]
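As a quick numerical check, here is a minimal sketch (not from the slides) of the two-state transition matrix above; the function name transition_matrix and the values t = 0.7, s = 0.4, r = 0.6 are made up for illustration.

```python
import numpy as np

def transition_matrix(t, s, r):
    """P(t) for the two-state CTMC with equilibrium distribution pi = (s, r)."""
    e = np.exp(-t)
    return np.array([[s + r * e, r - r * e],
                     [s - s * e, r + s * e]])

P = transition_matrix(t=0.7, s=0.4, r=0.6)
print(P.sum(axis=1))             # each row sums to 1
print(np.array([0.4, 0.6]) @ P)  # pi @ P(t) == pi: the chain stays in equilibrium
```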
Likelihood of a tree
- background reading: Ewens and Grant (2005), 15.7
- simplifying assumption: evolution at different branches is independent
- suppose we know probability distributions v_t and v_b over the states at the top and the bottom of branch l_k
- L(l_k) = v_t^T P(l_k) v_b

[Figure: example tree with branch lengths l_1, ..., l_8]
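A minimal sketch of the branch formula L(l_k) = v_t^T P(l_k) v_b, reusing the transition_matrix helper from the sketch above; the distributions and the branch length are made-up illustration values.

```python
import numpy as np  # assumes transition_matrix() from the sketch above

v_top    = np.array([0.3, 0.7])  # distribution over states (0, 1) at the top of branch l_k
v_bottom = np.array([0.9, 0.1])  # distribution at the bottom of the branch
l_k = 0.5

L_branch = v_top @ transition_matrix(l_k, s=0.4, r=0.6) @ v_bottom  # v_t^T P(l_k) v_b
print(L_branch)
```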
Likelihood of a tree
- likelihoods of the states (0, 1) at the root are v_1^T P(l_1) · v_2^T P(l_2) (multiplied element by element)
- log-likelihoods: log(v_1^T P(l_1)) + log(v_2^T P(l_2))
- log-likelihood of a larger tree: recursively apply this method from tips to root

[Figure: two tips with state distributions v_1 and v_2, joined to the root by branches l_1 and l_2]
Likelihood of a tree

L(\text{mother})_i = \prod_{d \in \text{daughters}} \Big( \sum_{1 \le j \le n} P(t)_{i,j}\, L(d)_j \Big)
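A sketch of this recursion (pruning from the tips to the root) under assumptions not fixed by the slide: binary characters, a tree encoded as nested tuples where a leaf is ("name", branch_length) and an internal node is ((child, ...), branch_length), and the transition_matrix helper from the earlier sketch.

```python
import numpy as np  # assumes transition_matrix() from the earlier sketch

def conditional_likelihoods(node, tip_states, s, r):
    """Return (L, branch_length), where L[i] is the likelihood of the data below
    `node`, given state i at the top of `node`."""
    children, length = node
    if isinstance(children, str):                 # leaf: tip_states gives its observed state (0 or 1)
        L = np.zeros(2)
        L[tip_states[children]] = 1.0
        return L, length
    L = np.ones(2)
    for child in children:
        L_child, l_child = conditional_likelihoods(child, tip_states, s, r)
        L *= transition_matrix(l_child, s, r) @ L_child   # sum over the child's states
    return L, length

# three-taxon example with made-up branch lengths and states
cherry = ((("A", 0.2), ("B", 0.3)), 0.4)          # internal node above A and B
root   = ((cherry, ("C", 0.9)), 0.0)              # root has no branch above it
L_root, _ = conditional_likelihoods(root, {"A": 1, "B": 1, "C": 0}, s=0.4, r=0.6)
```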
(Log-)Likelihood of a tree
- overall likelihood of the entire tree depends on the probability distribution at the root
- if we assume that the root node is in equilibrium: L(tree) = (s, r)^T L(root)
- does not depend on the location of the root (→ time reversibility)
- this is essentially identical to the Sankoff algorithm for parsimony: weight(i, j) = log P(l_k)_{ij}
- weight matrix depends on branch length → needs to be recomputed for each branch
- this is for one character; the likelihood for all data is the product of the likelihoods for each character
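Continuing the sketch above: the tree likelihood weights the root vector by the equilibrium distribution (s, r), and per-character log-likelihoods are added. The data format (a list of dictionaries mapping tip names to 0/1) is an assumption for illustration.

```python
import numpy as np  # assumes conditional_likelihoods() from the sketch above

def tree_log_likelihood(root, characters, s, r):
    """Sum of log L(tree) over characters, with L(tree) = (s, r)^T L(root)."""
    pi = np.array([s, r])
    total = 0.0
    for tip_states in characters:         # one dict {tip name: 0/1} per character
        L_root, _ = conditional_likelihoods(root, tip_states, s, r)
        total += np.log(pi @ L_root)
    return total
```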
(Log-)Likelihood of a tree
- likelihood of a tree depends on:
  - branch lengths
  - rates for each character
- likelihood for a tree topology:

L(\text{topology}) = \max_{l_k \,:\, k \text{ is a branch}} L(\text{tree} \mid \vec{l})
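A sketch of this maximization over branch lengths for a fixed topology, using a generic numerical optimizer; the wrapper that turns a vector of branch lengths into a tree log-likelihood (here simply called log_likelihood) is assumed glue code, not something specified on the slide.

```python
import numpy as np
from scipy.optimize import minimize

def topology_log_likelihood(log_likelihood, n_branches):
    """Return max over branch lengths of log L(tree | l); lengths are kept positive
    by optimizing their logarithms."""
    neg_ll = lambda x: -log_likelihood(np.exp(x))
    result = minimize(neg_ll, x0=np.zeros(n_branches), method="L-BFGS-B")
    return -result.fun, np.exp(result.x)   # best log-likelihood and branch lengths
```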
(Log-)Likelihood of a tree
Where do we get the rates from? Different options, in increasing order of complexity:
1. s = r = 0.5 for all characters
2. r = empirical relative frequency of state 1 in the data (identical for all characters)
3. a certain proportion p_inv (value to be estimated) of characters are invariant
4. rates are gamma distributed
Gamma-distributed rates
- we want to allow rates to vary, but not too much
- common method (no real justification except for mathematical convenience):
- rate matrix is multiplied with a coefficient λ_i for character i
- equilibrium distribution is identical for all characters
- λ_i is a random variable drawn from a Gamma distribution:

L(\lambda_i = x) = \frac{\beta^\beta x^{\beta - 1} e^{-\beta x}}{\Gamma(\beta)}
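A minimal check (not from the slides) that the density above is the Gamma distribution with shape β and rate β, which has mean 1; SciPy parameterizes the Gamma by scale = 1/rate.

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

beta, x = 2.0, 1.3
manual = beta**beta * x**(beta - 1) * np.exp(-beta * x) / gamma_fn(beta)
print(np.isclose(manual, gamma(a=beta, scale=1 / beta).pdf(x)))  # True
print(gamma(a=beta, scale=1 / beta).mean())                      # 1.0: average rate is unchanged
```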
Gamma-distributed rates
- overall likelihood of a tree topology: integrate over all λ_i, weighted by the Gamma likelihood
- computationally impractical
- in practice: split the Gamma distribution into n discrete bins (usually n = 4) and approximate the integration via a Hidden Markov Model
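A sketch of the discrete approximation: split the Gamma into n equal-probability bins and represent each bin by a single rate. Using the bin medians is one common choice; the slide does not fix the exact discretization, so that part is an assumption.

```python
import numpy as np
from scipy.stats import gamma

def discrete_gamma_rates(beta, n=4):
    """Median rate of each of n equal-probability bins of a Gamma(beta, beta) distribution."""
    bin_medians = (np.arange(n) + 0.5) / n       # 1/8, 3/8, 5/8, 7/8 for n = 4
    return gamma(a=beta, scale=1 / beta).ppf(bin_medians)

print(discrete_gamma_rates(beta=0.5))
# per-character likelihoods are then averaged over these rate categories
```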
Modeling decisions to make

aspect of model             possible choices     number of parameters to estimate
branch lengths              unconstrained        2n − 3 (n is the number of taxa)
                            ultrametric          n − 1
equilibrium probabilities   uniform              0
                            empirical            0
                            ML estimate          1
rate variation              none                 0
                            Gamma distributed    1
invariant characters        none                 0
                            p_inv                1

This could be continued: you can build in rate variation across branches, you can fit the number of Gamma categories, ...
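The parameter counts in the table can be summarized in a small helper (hypothetical, and counting empirical equilibrium probabilities as 0 free parameters, as in the table):

```python
def n_free_parameters(n_taxa, branch_lengths, eq_probs, rate_variation, invariant_chars):
    """Number of parameters to estimate for one model specification from the table."""
    count  = {"unconstrained": 2 * n_taxa - 3, "ultrametric": n_taxa - 1}[branch_lengths]
    count += {"uniform": 0, "empirical": 0, "ML estimate": 1}[eq_probs]
    count += {"none": 0, "Gamma": 1}[rate_variation]
    count += {"none": 0, "p_inv": 1}[invariant_chars]
    return count

# richest model for the 25-taxon running example: 47 + 1 + 1 + 1 = 50 parameters
print(n_free_parameters(25, "unconstrained", "ML estimate", "Gamma", "p_inv"))
```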
Model selection tradeoff
- rich models are better at detecting patterns in the data, but are prone to over-fitting
- parsimonious models are less vulnerable to overfitting, but may miss important information
- standard issue in statistical inference
- one possible heuristic: Akaike Information Criterion (AIC)

AIC = −2 × log-likelihood + 2 × number of free parameters

- the model minimizing AIC is to be preferred
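The criterion itself is a one-liner; the two candidate models and their log-likelihoods below are invented purely to show the comparison.

```python
def aic(log_likelihood, n_free_parameters):
    return -2 * log_likelihood + 2 * n_free_parameters

candidates = {                      # made-up numbers, just to illustrate the selection step
    "ultrametric, empirical, no rate variation": (-7980.0, 24),
    "unconstrained, ML, Gamma + p_inv":          (-7910.0, 50),
}
best = min(candidates, key=lambda name: aic(*candidates[name]))
print(best)
```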
Example: Model selection for cognacy data

[Table: AIC values for all 24 model combinations (branch lengths: ultrametric vs. unconstrained; equilibrium probabilities: uniform, empirical, ML estimate; rate variation: none vs. Gamma; invariant characters: none vs. p_inv), evaluated on the UPGMA tree. AIC values range from 15981.94 to 17519.75; the models with uniform equilibrium probabilities fare clearly worse than those with empirical or ML-estimated probabilities.]
Tree search
- ML computation gives us the likelihood of a tree topology, given data and a model
- ML tree:
  - heuristic search to find the topology maximizing the likelihood
  - optimize branch lengths to maximize the likelihood for that topology
- computationally very demanding!
- for the 25 taxa in our running example, ML tree search for the full model requires several hours on a single processor; parallelization helps
- ideally, one would want to do 24 heuristic tree searches, one for each model specification, and pick the tree + model with the lowest AIC
- in practice one has to make compromises
Running example
Running example: cognacy data
- ultrametric: AIC = 7972
- unconstrained branch lengths: AIC = 7929

[Figures: ML trees for the cognacy data under the ultrametric and the unconstrained model, with the 25 Indo-European languages of the running example at the tips]
Running example: WALS data
- ultrametric: AIC = 2828
- unconstrained branch lengths: AIC = 2752

[Figures: ML trees for the WALS data under the ultrametric and the unconstrained model, with the 25 Indo-European languages of the running example at the tips]
Running example: phonetic data
- ultrametric: AIC = 90575
- unconstrained branch lengths: AIC = 89871

[Figures: ML trees for the phonetic data under the ultrametric and the unconstrained model, with the 25 Indo-European languages of the running example at the tips]
Wrapping up
- ML is conceptually superior to MP (let alone distance methods):
  - different mutation rates for different characters are inferred from the data
  - the possibility of multiple mutations is taken into account, depending on branch lengths
  - side effect of the likelihood computation: the probability distribution over character states at each internal node can be read off
- disadvantages:
  - computationally demanding
  - many parameter settings make model selection difficult
  - the ultrametric constraint makes branch-length optimization even more demanding (note that the ultrametric trees in our example are sometimes better even though they have higher AIC)
- computationally more expensive ⇒ not feasible for larger data sets
Ewens, W. and G. Grant (2005). Statistical Methods in Bioinformatics: An Introduction. Springer, New York.