using phylogenetics to estimate species divergence times


  1. Using phylogenetics to estimate species divergence times ... More accurately ... Basics and basic issues for Bayesian inference of divergence times (plus some digression) "A comparison of the structures of homologous proteins ... from different species is important, therefore, for two reasons. First, the similarities found give a measure of the minimum structure for biological function. Second, the differences found may give us important clues to the rate at which successful mutations have occurred throughout evolutionary time and may also serve as an additional basis for establishing phylogenetic relationships." From p. 143 of The Molecular Basis of Evolution by Dr. Christian B. Anfinsen (Wiley, 1959)

  2. [Figure: phylogenetic tree with branches labeled by percent sequence divergence (0.5%, 0.5%, 10%, 5%, 20%, 4.5%, 5%, 10%) and a 200 Million Year Old Fossil marked on it.]

  3. The "Clock Idea": 20% sequence divergence in 200 million years means 1% divergence per 10 million years. [Figure: the same tree with the 200 Million Year Old Fossil and time labels of 10 Million, 100 Million, 200 Million, and 400 Million.] “Ernst Mayr recalled at this meeting that there are two distinct aspects to phylogeny: the splitting of lines, and what happens to the lines subsequently by divergence. He emphasized that, after splitting, the resulting lines may evolve at very different rates... How can one then expect a given type of protein to display constant rates of evolutionary modification along different lines of descent?” (Evolving Genes and Proteins. Zuckerkandl and Pauling, 1965, p. 138).
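
The clock arithmetic on this slide can be sketched in a few lines. This is a minimal illustration under a strict (constant-rate) clock; the function names are hypothetical, and the divergences of 1%, 10%, and 40% are illustrative values chosen only because they reproduce the 10, 100, and 400 million year labels in the figure.

```python
def clock_rate(divergence_pct, calibration_myr):
    """Percent sequence divergence accumulated per million years."""
    return divergence_pct / calibration_myr


def divergence_time_myr(divergence_pct, rate_pct_per_myr):
    """Age in million years implied by a divergence under a constant rate."""
    return divergence_pct / rate_pct_per_myr


# Calibration from the slide: 20% divergence and a 200-million-year-old fossil.
rate = clock_rate(20.0, 200.0)  # 0.1% per Myr, i.e. 1% per 10 Myr

# Illustrative divergences (assumed, not read off specific nodes of the tree).
for pct in (1.0, 10.0, 40.0):
    age = divergence_time_myr(pct, rate)
    print(f"{pct:5.1f}% divergence -> about {age:.0f} million years")
```

Every date produced this way inherits the constant-rate assumption, which is exactly what the Zuckerkandl and Pauling quotation questions.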

  4. A problem with the "Clock Idea": Rates of Molecular Evolution Change Over Time!! Another problem with the "Clock Idea": Fossils are unlikely to represent the same organism as the genetic common ancestor. If the mammal head is a derived character and the fossil is 200 million years old, then the bird-mammal split must be at least 200 million years old. This is a constraint on a divergence time. [Figure: the tree from the previous slides, shown twice, with the 200 Million Year Old Fossil and time labels of 10 Million, 100 Million, 200 Million, and 400 Million.]

  5. Bayesian Idea: (Prior Information) × (Information from data) = Posterior Information. Basic Idea for Bayesian Divergence Time Inference. R: rates; T: node times; C: fossil evidence (constraints); S: sequence data. $$P(R,T \mid S,C) = \frac{P(S,R,T \mid C)}{P(S \mid C)} = \frac{P(S \mid R,T,C)\,P(R \mid T,C)\,P(T \mid C)}{P(S \mid C)} = \frac{P(S \mid R,T)\,P(R \mid T)\,P(T \mid C)}{P(S \mid C)}$$

  6. (Relaxed Clock) Bayesian Divergence Time Components: 1. DNA or protein sequence data; 2. Model of Sequence Change; 3. Model of Rate Change; 4. Prior Distributions for Rates, Times, etc.; 5. Fossil or other information. Branch Length = Rate x Time (the information from molecular sequence data). [Figure: plot of Rate (1 to 5) against Time (1 to 5).]
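
A small sketch can make "Branch Length = Rate x Time" concrete. It assumes the Jukes-Cantor (JC69) substitution model and a binomial count of differing sites, neither of which is named on the slide; the function name and the numbers (90 differences in 1000 sites) are hypothetical. The point it illustrates is that the sequence likelihood sees only the product of rate and time.

```python
import math


def jc69_log_likelihood(rate, time, n_diff, n_sites):
    """Log-likelihood (up to a constant) of n_diff differing sites out of n_sites.

    Assumes JC69, under which a branch of length d = rate * time gives a
    per-site difference probability p = 3/4 * (1 - exp(-4d/3)).
    """
    d = rate * time  # branch length in expected substitutions per site
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


# Three (rate, time) pairs with the same product, 0.1 substitutions per site.
same_product = [(0.01, 10.0), (0.001, 100.0), (0.1, 1.0)]
for rate, time in same_product:
    print(rate, time, jc69_log_likelihood(rate, time, 90, 1000))

# A pair with a different product gives a different likelihood.
print(0.02, 10.0, jc69_log_likelihood(0.02, 10.0, 90, 1000))
```

Because the first three likelihoods agree, any separation of rate from time has to come from the other components in the list: the model of rate change, the priors, and the fossil information.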

  7. Prior Distribution. [Figure: two plots of Rate (1 to 5) against Time (1 to 5).]

  8. Posterior with constraints: the region between the green vertical lines is the constraint on node time. Yang-Rannala “Soft” Constraints: the dashed green lines are treated as imperfect fossil evidence. [Figure: two plots of Rate (1 to 5) against Time (1 to 5), one per kind of constraint.]
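
The contrast between hard and soft constraints can be sketched on a grid over the Rate and Time axes. Everything specific here is an assumption made for illustration: a normal likelihood on the branch length (rate x time) with estimate 4.0 and standard deviation 0.3, an exponential prior on rate, bounds of 2 and 3 on node time, and exponentially decaying tails for the soft version. The soft prior is a simplified stand-in, not the density Yang and Rannala define.

```python
import math


def hard_prior(t, lo, hi):
    """Uniform between the bounds, exactly zero outside them."""
    return 1.0 if lo <= t <= hi else 0.0


def soft_prior(t, lo, hi, tail_scale=0.2):
    """Flat between the bounds, with decaying (non-zero) tails outside.

    Simplified stand-in for soft bounds; the tail shape is an assumption.
    """
    if t < lo:
        return math.exp(-(lo - t) / tail_scale)
    if t > hi:
        return math.exp(-(t - hi) / tail_scale)
    return 1.0


def time_posterior(time_prior, bl_hat=4.0, bl_sd=0.3, lo=2.0, hi=3.0,
                   n=100, upper=5.0):
    """Marginal posterior of node time on a grid, summing over rate."""
    grid = [upper * (i + 0.5) / n for i in range(n)]
    weights = []
    for t in grid:
        total = 0.0
        for r in grid:
            likelihood = math.exp(-0.5 * ((r * t - bl_hat) / bl_sd) ** 2)
            rate_prior = math.exp(-r)  # assumed exponential prior on rate
            total += likelihood * rate_prior
        weights.append(total * time_prior(t, lo, hi))
    z = sum(weights)
    return grid, [w / z for w in weights]


def mass_outside(grid, post, lo=2.0, hi=3.0):
    """Posterior probability that node time falls outside the bounds."""
    return sum(p for t, p in zip(grid, post) if t < lo or t > hi)


grid, hard_post = time_posterior(hard_prior)
_, soft_post = time_posterior(soft_prior)
print("hard bounds, mass outside:", mass_outside(grid, hard_post))
print("soft bounds, mass outside:", mass_outside(grid, soft_post))
```

With hard bounds the posterior can never leave the interval, however strongly the sequence data disagree with the fossil evidence; with soft bounds some posterior mass can move outside it.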

  9. A digression: What are we really estimating when we estimate “divergence” times? History of gene copies in a population. [Figure: genealogy of gene copies along a Time axis running from “Then” to “Now”.]

  10. [Figure: the history of gene copies, with lineages labeled “Phylogenetic lineage”, “Dead”, and “Maybe”.]

  11. Species Divergence Time versus divergence time of gene copies. [Figure: the history of gene copies, with lineages labeled “Phylogenetic lineage”, “Dead”, and “Maybe”.] How much time does the difference between gene copy and species tree represent?

  12. How much time does the difference between gene copy and species tree represent? For a coalescent process with diploid organisms, the average time difference is 2N_e generations and the standard deviation is also 2N_e generations (N_e is effective population size). When the time needed for 2N_e generations is large relative to species divergence times, be careful ... and try *BEAST or BEST software? See: Heled & Drummond. 2012. MBE 27: 570-580. Liu. 2008. Bioinformatics 24: 2542-2543. Recombination is another divergence time (and phylogenetic) challenge! [Figure: gene-copy history along a Time axis with recombination events marked and the GMRCA (Grand Most Recent Common Ancestor) labeled.]
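
To get a feel for the 2N_e figure, the sketch below converts it to years and compares it with a species divergence time. It assumes the standard pairwise coalescent, in which the waiting time is exponential with mean 2N_e generations (so the standard deviation equals the mean). The population size, generation time, and divergence times are hypothetical values chosen only for illustration.

```python
import random
import statistics


def coalescent_excess_years(n_e, generation_time_years):
    """Mean (and SD) of the gene-copy coalescence time before the species split."""
    return 2.0 * n_e * generation_time_years


# Hypothetical values: N_e = 10,000 and a 20-year generation time.
excess = coalescent_excess_years(10_000, 20)
for species_split_years in (5_000_000, 500_000):
    ratio = excess / species_split_years
    print(f"split {species_split_years:>9,} yr: excess is {ratio:.2f} of the split age")

# Seeded simulation of the assumed exponential waiting time, in generations.
rng = random.Random(0)
draws = [rng.expovariate(1.0 / (2 * 10_000)) for _ in range(50_000)]
sim_mean = statistics.mean(draws)
sim_sd = statistics.pstdev(draws)
print(f"simulated mean {sim_mean:.0f} and SD {sim_sd:.0f} generations")
```

For the older hypothetical split the excess is a small fraction of the divergence time; for the recent one it is of the same order, which is the situation the slide warns about.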

  13. End of digression on ... What are we really estimating when we estimate “divergence” times? Bayesian Divergence Time Components 1. DNA or protein sequence data: Sequence data are needed for branch length (rate x time) estimation. Sequence data do not separate rates and times. Better to invest in improving other time estimation components?

  14. Bayesian Divergence Time Components 2. Model of Sequence Change: Branch Length (BL) Errors → Divergence Time Errors. Posterior distributions for times are a compromise between branch length information from sequence data, prior information, and fossil information. Branch length estimation error can affect divergence time estimates ... [Figure: plot of Rate (0 to 5) against Time (0 to 5).]

  15. Bayesian Divergence Time Components 2. Model of Sequence Change: Branch Length (BL) Errors and Errors in BL uncertainty → Divergence Time Errors. Posterior distributions for times are a compromise between branch length information from sequence data, prior information, and fossil information. Red line represents “best” branch length estimate. How good are yellow and green estimates? Point: Rate and time estimates are a compromise between branch length uncertainty and prior information... Errors in assessing branch length uncertainty could have a big effect on divergence time inferences ... [Figure: plot of Rate (0 to 5) against Time (0 to 5) with red, yellow, and green branch length curves.]

  16. Errors in BL uncertainty have more serious consequences for divergence time estimation than for phylogeny inference. Sources of these errors include failure to account for dependent change among sequence positions: context-dependent mutation; codons; protein tertiary structure; RNA secondary structure; other genotype-phenotype connections. Bayesian Divergence Time Components 3. Model of Rate Change: How much of what appears to be rate change really is rate change? See Cutler, D.J. (2000) Estimating divergence times in the presence of an overdispersed molecular clock. Mol. Biol. Evol. 17:1647-1660.

  17. A point made well by Cutler (2000): rejection of the constant rate hypothesis may not be due to variation of rates over time as much as to poor models of sequence evolution that may mislead us about how confident we can be regarding branch length estimates. (My viewpoint: "first principles" of evolutionary biology mean the constant rate hypothesis must be formally wrong even though it may sometimes be nearly right.) Why might rates of molecular evolution change over time? Candidates include changes in: mutation rate per generation; generation time; natural selection (including effects due to duplication); population size (higher rates for small population size).
