16 Information Criterion
• Akaike Information Criterion (AIC), penalising the number of parameters:
• AIC = −2 ln(L) + 2K
• However, this penalises all high-K models even when the sample size is large.
• Corrected Akaike Information Criterion (AICc):
• AICc = AIC + 2K(K + 1) / (n − K − 1)
• Alternatively, there is the Bayesian Information Criterion (BIC):
• BIC = −2 ln(L) + K ln(n)
• Decision Theory (DT) risk minimisation approach.
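As a quick illustration of how these criteria trade fit against complexity, here is a minimal sketch; the log-likelihoods, K values and n are invented for the example, not taken from any real analysis.

```python
import math

def aic(lnL, K):
    """Akaike Information Criterion: -2 ln(L) + 2K."""
    return -2 * lnL + 2 * K

def aicc(lnL, K, n):
    """Small-sample corrected AIC."""
    return aic(lnL, K) + (2 * K * (K + 1)) / (n - K - 1)

def bic(lnL, K, n):
    """Bayesian Information Criterion: -2 ln(L) + K ln(n)."""
    return -2 * lnL + K * math.log(n)

# Invented example: two substitution models fitted to the same n-site alignment.
n = 500
models = {"simple": (-5210.4, 10), "rich": (-5150.2, 19)}  # name: (ln L, K), made up

for name, (lnL, K) in models.items():
    print(f"{name:7s} AIC={aic(lnL, K):9.1f} AICc={aicc(lnL, K, n):9.1f} BIC={bic(lnL, K, n):9.1f}")
```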
17 Limitations
• What if everything fits poorly?
• Information criteria test relative goodness of fit rather than absolute fit.
• Parametric Bootstrapping / Posterior Predictive Simulation:
• If the model is reasonable, then data simulated under it should resemble the empirical data.
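The last point lends itself to a simple simulation recipe. The sketch below is generic; simulate_alignment and test_statistic are hypothetical placeholders standing in for "simulate data under the fitted model" and "summarise a data set", not functions from any real package.

```python
import numpy as np

def parametric_fit_check(empirical_stat, simulate_alignment, test_statistic,
                         fitted_model, n_sims=200, rng=None):
    """Absolute goodness-of-fit check by simulation: if the model is reasonable,
    data simulated under it should resemble the empirical data, so the empirical
    statistic should fall inside the simulated distribution."""
    rng = rng or np.random.default_rng()
    sim_stats = np.array([test_statistic(simulate_alignment(fitted_model, rng))
                          for _ in range(n_sims)])
    # proportion of simulations at least as extreme as the observed statistic
    p = (np.sum(sim_stats >= empirical_stat) + 1) / (n_sims + 1)
    return sim_stats, p

# Toy stand-ins: "sites" are Bernoulli draws, the statistic is the variable-site proportion.
sims, p = parametric_fit_check(
    empirical_stat=0.55,
    simulate_alignment=lambda model, rng: rng.random(200) < model,
    test_statistic=lambda data: data.mean(),
    fitted_model=0.30,  # hypothetical fitted proportion of variable sites
)
print(p)  # a small p suggests the empirical data do not resemble the simulations
```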
Branch Support Testing
18 Bootstrapping in General
[Figure: the bootstrap in general. The empirical distribution of the sample stands in for the (unknown) true distribution; bootstrap replicates give a distribution of parameter estimates around the (unknown) true value. Slide from Joe Felsenstein]
19 Bootstrapping Phylogenies
[Figure: the bootstrap for phylogenies. From the original data of aligned sites (estimate T̂), sample the same number of sites with replacement to give bootstrap sample #1 (estimate T(1)), bootstrap sample #2 (estimate T(2)), and so on. Slide from Joe Felsenstein]
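The resampling step shown in the figure is easy to write down. A minimal numpy sketch, assuming the alignment is held as a taxa-by-sites character matrix; estimating a tree from each replicate would still be done by your phylogenetics program of choice.

```python
import numpy as np

def bootstrap_alignment(alignment, rng):
    """Sample the same number of sites, with replacement, from the original columns."""
    n_sites = alignment.shape[1]
    cols = rng.integers(0, n_sites, size=n_sites)  # column indices, with replacement
    return alignment[:, cols]

rng = np.random.default_rng(1)
aln = np.array([list("ACGTACGT"),   # toy 3-taxon, 8-site alignment (made up)
                list("ACGTACGA"),
                list("ACCTACGA")])
replicate_1 = bootstrap_alignment(aln, rng)  # would be the input for estimating T(1)
```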
20 Bootstrapping Phylogenies
21 Bootstrapping Phylogenies
[Figure: the majority-rule consensus tree of the bootstrap trees. How many times each partition of species is found: AE|BCDF 4, ACE|BDF 3, ACEF|BD 1, AC|BDEF 1, AEF|BCD 1, ADEF|BC 2, ABCE|DF 3. Partitions recovered in a majority of replicates are drawn on the consensus tree with their support values (0.8, 0.6, 0.6). Slide from Joe Felsenstein]
22 Combining the results
23 What is the bootstrap doing?
• Randomly reweighting the sites in an alignment
• Probability of a site being excluded: (1 − 1/n)^n
• Asymptotically approximately 0.36
• Goal: to simulate an infinite population (of alignment columns)
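A quick numerical check of the exclusion probability above, showing it approaching 1/e ≈ 0.368 as the number of columns grows:

```python
import math

for n in (10, 100, 1000, 10_000):
    p_excluded = (1 - 1 / n) ** n  # probability a given site is never drawn
    print(n, round(p_excluded, 4))
print("limit 1/e =", round(1 / math.e, 4))
# 10 -> 0.3487, 100 -> 0.366, 1000 -> 0.3677, 10000 -> 0.3679
```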
24 Limitations
• Typically underestimates the true probabilities
• i.e. biased but conservative
• Computationally demanding
• Assumes independence of sites
• Relies on good input data
• Only answers to what extent the input data support a given part of the tree
25 Parametric Bootstraps
• Simulate data sets of this size assuming the estimate of the tree is the truth
• Key for many more sophisticated tests.
• Can be used to generate p-values, but this is non-trivial
26 Alternative Approaches
• Resampling estimated log-likelihoods (RELL)
• Instead of re-doing the full ML inference, just re-sample the site ln(L) values and sum
• Rapid Bootstraps (RBS)
• Ultrafast Bootstraps (UFBoot)
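A minimal sketch of the RELL idea, assuming you already have a vector of per-site log-likelihoods for a tree (most ML programs can export these); each bootstrap replicate then just resamples and sums those values instead of re-optimising anything.

```python
import numpy as np

def rell_replicates(site_lnL, n_reps=1000, rng=None):
    """Resampling Estimated Log-Likelihoods: resample the site ln(L) values with
    replacement and sum them, avoiding a full ML re-inference per replicate."""
    rng = rng or np.random.default_rng()
    site_lnL = np.asarray(site_lnL)
    n_sites = site_lnL.size
    idx = rng.integers(0, n_sites, size=(n_reps, n_sites))
    return site_lnL[idx].sum(axis=1)  # one total ln(L) per bootstrap replicate
```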
27 Likelihood Tests
• Comparing the 3 nearest NNIs to a given branch: branch vs. closest NNIs
• Parametric aLRT: χ² of δ for the branch vs. its closest NNIs
• Non-parametric SH-aLRT, based on RELL
• aBayes: P(T_c | X) = P(X | T_c) P(T_c) / Σ_{i=0}^{2} P(X | T_i) P(T_i), with a flat prior P(T_0) = P(T_1) = P(T_2)
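A minimal sketch of the aBayes calculation for one branch, assuming lnL0, lnL1, lnL2 are the log-likelihoods of the current resolution and its two NNI alternatives (names invented here); with a flat prior the posterior reduces to a normalised likelihood, computed in log space for numerical stability.

```python
import numpy as np

def abayes_support(lnL0, lnL1, lnL2):
    """P(T_c | X) = P(X | T_c) / sum_i P(X | T_i) under a flat prior over the
    three NNI configurations around a branch."""
    lnL = np.array([lnL0, lnL1, lnL2])
    w = np.exp(lnL - lnL.max())  # subtract the max to avoid underflow
    return w[0] / w.sum()        # posterior support for the current topology

print(abayes_support(-10230.1, -10235.8, -10241.3))  # invented log-likelihoods
```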
Comparing Trees
28 How to compare competing hypotheses? (https://github.com/mtholder/TreeTopoTestingTalks)
29 How to compare competing hypotheses? (https://github.com/mtholder/TreeTopoTestingTalks)
30 Simplistic Comparison
31 Qualitative Comparison
• 4 sites favour the red tree, 2 favour the blue
• Binomial probability of k out of n: C(n, k) p^k (1 − p)^(n−k)
• 4 out of 6: p = 0.6875
• 40 out of 60: p = 0.0124
• 400 out of 600: p = 2.3 × 10^−16
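These p-values come from a binomial test against a 50:50 null. A standard-library sketch of the exact two-sided version; the 4-out-of-6 value matches the slide exactly, and the larger samples come out close to the slide's figures (conventions for the two-sided tail differ slightly between implementations).

```python
from math import comb

def two_sided_binom_p(k, n):
    """Exact two-sided binomial test with p = 0.5 (assumes k >= n - k)."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n      # P(X >= k)
    lower = sum(comb(n, i) for i in range(0, n - k + 1)) / 2 ** n  # P(X <= n - k)
    return min(1.0, upper + lower)

for k, n in [(4, 6), (40, 60), (400, 600)]:
    print(f"{k} out of {n}: p = {two_sided_binom_p(k, n):.4g}")
# 4/6 -> 0.6875; 40/60 and 400/600 give roughly 0.013 and 2e-16
```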
32 Quantitative Comparison
• µ = (−5.2 + 3.1 + 0.9 + 6.6 + 0.3 − 0.2) / 6 = 0.916
• σ² = 15.22
• t = µ / (σ²/√N) = 0.148
• therefore: p = 0.888 under 5 d.f.
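A small sketch reproducing the arithmetic above, assuming the six values are per-site (or per-gene) log-likelihood differences between two trees (the slide does not label them, so this is an assumption). The statistic is computed exactly as written on the slide; the conventional one-sample t, which divides by σ rather than σ², is included for comparison.

```python
import numpy as np
from scipy import stats

diffs = np.array([-5.2, 3.1, 0.9, 6.6, 0.3, -0.2])  # values from the slide
N = len(diffs)
mu = diffs.mean()          # 0.917
var = diffs.var(ddof=1)    # 15.22, the sample variance

t_slide = mu / (var / np.sqrt(N))           # 0.148, as on the slide
p = 2 * stats.t.sf(abs(t_slide), df=N - 1)  # two-tailed, 5 d.f. -> 0.888

t_conventional = mu / (np.sqrt(var) / np.sqrt(N))  # ~0.58, using sigma rather than sigma^2
print(t_slide, p, t_conventional)
```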
33 More robust approaches
• Null: if no sampling error (infinite data), T_1 and T_2 would explain the data equally well.
• δ(T_1, T_2 | X) = 2 [ln L(T_1 | X) − ln L(T_2 | X)]
• Expectation under the null: E[δ(T_1, T_2 | X)] = 0
• Why can't we just use χ² to get a critical value for δ?
• Tree space is difficult.
34 Estimating variance of the null
• Many avenues:
• Non-parametric bootstrapping
• Parametric bootstrapping
• Related approaches.
35 Kishino-Hasegawa Test
• First, winning sites test
• H_0: E[ln L(T_1 | X) − ln L(T_2 | X)] = E[δ(T_1, T_2)] = 0
• H_a: E[δ(T_1, T_2)] ≠ 0
• Non-parametric bootstrap to estimate the null variance
• Test E[δ(T_1, T_2)] with a two-tailed t-test
• Due to the centring assumption it can't be used for the optimal tree, i.e. selection bias
• Can't handle multiple comparisons.
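A rough sketch of the RELL flavour of the KH test, assuming site_lnL1 and site_lnL2 are per-site log-likelihood vectors for the two candidate trees (illustrative names); the bootstrap values are centred so that the null of equal expected likelihood holds before computing the two-tailed p-value.

```python
import numpy as np

def kh_test_rell(site_lnL1, site_lnL2, n_reps=10_000, rng=None):
    """Kishino-Hasegawa test via RELL: bootstrap the per-site ln(L) differences,
    centre them under H0, and count how often the null |delta| exceeds the observed."""
    rng = rng or np.random.default_rng()
    d = np.asarray(site_lnL1) - np.asarray(site_lnL2)  # per-site ln(L) differences
    delta_obs = 2 * d.sum()                            # observed delta(T1, T2 | X)
    n = d.size
    idx = rng.integers(0, n, size=(n_reps, n))
    boot = 2 * d[idx].sum(axis=1)
    boot -= boot.mean()                                # centring: impose E[delta] = 0
    p = np.mean(np.abs(boot) >= abs(delta_obs))        # two-tailed
    return delta_obs, p
```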
36 Alternative tests
• Shimodaira-Hasegawa Test
• Compares candidate tree sets
• H_0 = all topologies equally good
• Very conservative when the number of candidate trees is large
• Can be corrected with the weighted SH-test
• Swofford–Olsen–Waddell–Hillis (SOWH) test: same idea but uses parametric bootstraps instead.
• Sensitive to model misspecification.
• Approximately Unbiased Test
• Overcomes much of this conservativeness.
• Achieves weighting by varying the bootstrap size for each tree.
• Better for larger comparisons, can have issues with P-space curvature.
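For comparison, a rough sketch of the RELL-based SH test over a whole candidate set, assuming site_lnL is a trees-by-sites matrix of per-site log-likelihoods (e.g. exported from an ML program); each tree is compared against the best tree, with per-tree centring of the bootstrap values.

```python
import numpy as np

def sh_test_rell(site_lnL, n_reps=1000, rng=None):
    """Shimodaira-Hasegawa test via RELL over a candidate tree set.
    Returns one p-value per tree for H0: all topologies are equally good."""
    rng = rng or np.random.default_rng()
    site_lnL = np.asarray(site_lnL)
    n_trees, n_sites = site_lnL.shape
    lnL = site_lnL.sum(axis=1)
    delta_obs = lnL.max() - lnL                        # observed deficit per tree

    idx = rng.integers(0, n_sites, size=(n_reps, n_sites))
    boot = site_lnL[:, idx].sum(axis=2)                # (trees, reps) bootstrap ln(L)
    centred = boot - boot.mean(axis=1, keepdims=True)  # centring under the null
    delta_null = centred.max(axis=0) - centred         # null deficits per tree, per rep

    return np.mean(delta_null >= delta_obs[:, None], axis=1)
```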