assessing phylogenetic hypotheses and phylogenetic data


  1. Assessing Phylogenetic Hypotheses and Phylogenetic Data
  • We use numerical phylogenetic methods because most data sets include potentially misleading evidence of relationships
  • We should not be content with constructing phylogenetic hypotheses but should also assess what ‘confidence’ we can place in our hypotheses
  • This is not always simple! (but do not despair!)

  2. Assessing Data Quality
  • We expect (or hope) our data will be well structured and contain strong phylogenetic signal
  • We can test this using randomization tests of explicit null hypotheses
  • The behaviour of some measure of the quality of our real data is contrasted with that of comparable but phylogenetically uninformative data generated by randomizing the real data

  3. Random Permutation
  • Random permutation reduces any correlation among characters to that expected by chance alone
  • It preserves the number of taxa, the number of characters, and the character states (with their frequencies) in each character, and hence the theoretical minimum and maximum tree lengths

  Original structured data, with strong correlations among characters (rows are ‘TAXA’, columns are ‘CHARACTERS’):

         1 2 3 4 5 6 7 8
    R-P  R P R P R P R P
    A-E  A E A E A E A E
    N-R  N R N R N R N R
    D-M  D M D M D M D M
    O-U  O U O U O U O U
    M-T  M T M T M T M T
    L-E  L E L E L E L E
    Y-D  Y D Y D Y D Y D

  Randomly permuted data, with any correlation among characters due to chance:

         1 2 3 4 5 6 7 8
    R-P  N U D E R T O U
    A-E  R E A P L E A D
    N-R  M R M M A D N P
    D-M  L T R E Y M D R
    O-U  D E Y U D E Y M
    M-T  O M O T O U L T
    L-E  Y D N D M P M E
    Y-D  A P L R N R R E
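
A minimal Python sketch of random permutation, assuming the matrix is stored as a list of taxon rows. `permute_characters` is a hypothetical helper, not a PAUP* command. It rebuilds the structured matrix of this slide (column 1 spells RANDOMLY, column 2 PERMUTED) and shuffles the states of each character independently among taxa, so each column keeps exactly the same states while correlations between columns are broken.

```python
import random

def permute_characters(matrix, rng=None):
    """Return a copy of `matrix` (a list of taxon rows) in which the states
    of each character (column) are shuffled independently among taxa."""
    rng = rng or random.Random()
    n_taxa, n_chars = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_chars)]
    for col in columns:
        rng.shuffle(col)  # shuffle within a single character only
    return [[columns[j][i] for j in range(n_chars)] for i in range(n_taxa)]

# The structured matrix from the slide: taxon "R-P" has states R P R P R P R P.
taxa = ["R-P", "A-E", "N-R", "D-M", "O-U", "M-T", "L-E", "Y-D"]
original = [list(t[0] + t[2]) * 4 for t in taxa]

shuffled = permute_characters(original, random.Random(1))
for name, row in zip(taxa, shuffled):
    print(name, " ".join(row))

# Every character keeps the same multiset of states after permutation.
assert all(sorted(a) == sorted(b) for a, b in zip(zip(*original), zip(*shuffled)))
```

Because only the arrangement of states within each column changes, the minimum and maximum possible tree lengths of the permuted matrix are the same as for the original.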

  4. Matrix Randomization Tests
  • Compare some measure of data quality/hierarchical structure for the real data and for many randomly permuted data sets
  • This provides a test of the null hypothesis that the real data are no better structured than randomly permuted, phylogenetically uninformative data
  • A permutation tail probability (PTP) is the proportion of data sets with a measure of quality as good as or better than that of the real data

  5. Structure of Randomization Tests
  • Reject the null hypothesis if, for example, 5% or fewer of the random permutations have a measure as good as or better than that of the real data
  [Figure: frequency distribution of a measure of data quality (e.g. tree length, ML, pairwise incompatibilities) for randomly permuted data, plotted from GOOD to BAD; real data beyond the 95% cutoff on the GOOD side PASS TEST (reject null hypothesis), otherwise FAIL TEST]
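
The PTP and the decision rule of the two slides above can be sketched in Python as follows. The function names are hypothetical, and counting the real data set as one of the data sets (so that with 99 permutations the smallest attainable PTP is 0.01) is a common convention assumed here rather than something the slides specify. The permuted scores are made up for illustration.

```python
import random

def ptp(real_score, permuted_scores, lower_is_better=True):
    """Permutation tail probability: the proportion of data sets (the real
    data set included) whose score is as good as or better than the real one."""
    if lower_is_better:  # e.g. tree length, number of incompatibilities
        as_good = sum(s <= real_score for s in permuted_scores)
    else:                # e.g. log-likelihood, where higher is better
        as_good = sum(s >= real_score for s in permuted_scores)
    # The real data set counts as one of the data sets, so add it to both terms.
    return (as_good + 1) / (len(permuted_scores) + 1)

def randomization_test(real_score, permuted_scores, alpha=0.05, lower_is_better=True):
    """Return (PTP, reject_null); the null is rejected when PTP <= alpha."""
    p = ptp(real_score, permuted_scores, lower_is_better)
    return p, p <= alpha

# Made-up tree lengths standing in for 99 randomly permuted data sets.
rng = random.Random(0)
null_lengths = [rng.randint(760, 820) for _ in range(99)]
p, reject = randomization_test(618, null_lengths)
print(p, reject)  # 0.01 True
```

The `lower_is_better` flag lets the same test handle measures where smaller is better (tree length, incompatibilities) and those where larger is better (likelihood).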

  6. Matrix Randomization Tests
  • Measures of data quality include:
    1. Tree length for most parsimonious trees: the shorter the tree, the better the data (PAUP*)
    2. Number of pairwise incompatibilities between characters (pairs of incongruent characters): the fewer character conflicts, the better the data
    3. Skewness of the distribution of tree lengths (PAUP*)
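
A sketch of the second measure, assuming binary characters with no missing data: two such characters are incompatible when all four state combinations (0/0, 0/1, 1/0, 1/1) occur among the taxa, the so-called four-gamete test. The helper names and the small matrix are hypothetical; multistate characters or missing data would need a more general compatibility test.

```python
from itertools import combinations

def incompatible(char_a, char_b):
    """Four-gamete test for two binary characters without missing data:
    they are incompatible if all four state combinations occur."""
    return len(set(zip(char_a, char_b))) == 4

def count_incompatibilities(matrix):
    """Number of incompatible character pairs; rows are taxa, columns characters."""
    columns = list(zip(*matrix))
    return sum(incompatible(a, b) for a, b in combinations(columns, 2))

# Hypothetical binary matrix: 4 taxa (rows) by 4 characters (columns).
matrix = [
    "0000",
    "1100",
    "1110",
    "0011",
]
print(count_incompatibilities(matrix))  # 2 (character 3 conflicts with 1 and 2)
```

The resulting count (fewer is better) could then serve as the measure of data quality in a matrix randomization test.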

  7. Matrix Randomization Tests
  Ciliate SSUrDNA (Min = 430, Max = 927)
  [Figure: the most parsimonious tree for the real data and the strict consensus of the most parsimonious trees for the randomly permuted data; taxa Ochromonas, Symbiodinium, Prorocentrum, Loxodes, Tracheloraphis, Spirostomum, Gruberia, Euplotes, Tetrahymena]
  • Real data: 1 most parsimonious tree (MPT), L = 618, CI = 0.696, RI = 0.714, PTP = 0.01, PC-PTP = 0.001; significantly non-random
  • Randomly permuted data: 3 MPTs, L = 792, CI = 0.543, RI = 0.272, PTP = 0.68, PC-PTP = 0.737; not significantly different from random
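
The CI and RI values for the permuted data follow the usual ensemble definitions, CI = Min / L and RI = (Max - L) / (Max - Min). Because random permutation preserves the minimum and maximum tree lengths, the real and permuted data share Min = 430 and Max = 927, and only L differs. A minimal sketch (function names hypothetical) reproducing the slide's CI values and the permuted data's RI:

```python
def consistency_index(min_length, tree_length):
    """Ensemble consistency index: minimum possible length / observed length."""
    return min_length / tree_length

def retention_index(min_length, max_length, tree_length):
    """Ensemble retention index: (max - L) / (max - min)."""
    return (max_length - tree_length) / (max_length - min_length)

MIN_LEN, MAX_LEN = 430, 927  # shared by the real and permuted ciliate data

print(round(consistency_index(MIN_LEN, 618), 3))           # 0.696 (real data)
print(round(consistency_index(MIN_LEN, 792), 3))           # 0.543 (permuted data)
print(round(retention_index(MIN_LEN, MAX_LEN, 792), 3))    # 0.272 (permuted data)
```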

  8. Skewness of Tree Length Distributions
  • Studies with random (and phylogenetically uninformative) data showed that the distribution of tree lengths tends to be normal
  • In contrast, phylogenetically informative data are expected to have a strongly skewed distribution, with few shortest trees and few trees nearly as short
  [Figure: two histograms of NUMBER OF TREES against tree length, each marking the shortest tree: a roughly normal distribution for random data and a strongly skewed one for informative data]

  9. Skewness of Tree Length Distributions
  • Skewness of tree length distributions can be used as a measure of data quality in randomization tests
  • It is measured with the g1 statistic in PAUP*
  • Significance cut-offs for data sets of up to eight taxa have been published based on randomly generated data (rather than randomly permuted data)
  • PAUP* does not perform the more direct randomization test
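
A minimal sketch of a moment-based skewness statistic, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments of the tree lengths. PAUP*'s exact estimator may differ (for example in bias correction), and both length samples are made up. Data with few shortest trees and few trees nearly as short produce a long tail toward short trees, that is, a left-skewed distribution and a negative g1, whereas a symmetric distribution gives g1 near zero.

```python
def g1(lengths):
    """Moment coefficient of skewness for a list of tree lengths."""
    t = len(lengths)
    mean = sum(lengths) / t
    m2 = sum((x - mean) ** 2 for x in lengths) / t  # second central moment
    m3 = sum((x - mean) ** 3 for x in lengths) / t  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical tree-length samples.
symmetric = [10, 11, 11, 12, 12, 12, 13, 13, 14]  # roughly normal
skewed = [10, 14, 15, 15, 16, 16, 16, 16]        # one short tree, many long ones

print(g1(symmetric))   # 0.0
print(g1(skewed) < 0)  # True: left skew, as expected for informative data
```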
