
Data Requirement for Species Tree from Multiple Gene Trees


  1. Data Requirement for Species Tree from Multiple Gene Trees (Dasarathy, Nowak, Roch 2015) • Daewon Seo • Mar. 14, 2017

  2. Introduction • Incomplete Lineage Sorting (ILS), … • Gene tree topologies can differ from the species tree topology • Two statistically consistent algorithms: GLASS and STEAC • Key parameter $f$: smallest species-tree branch length • Assuming perfect gene trees are given, GLASS: # of genes ~ $O(f^{-1})$; STEAC: # of genes ~ $O(f^{-2})$

  3. Introduction • Focus on data length ($\triangleq k$), not only number of genes ($\triangleq m$) • In a single gene tree, to reconstruct the topology with high probability, $k \sim O(1/f^2)$ • Therefore, in GLASS, $mk \sim O(1/f^3)$ • In STEAC, $mk \sim O(1/f^4)$

  4. Introduction • METAL • Modified STEAC algorithm • For any $k \ge 1$, $m \sim O(1/f^2)$ • While STEAC needs the molecular clock assumption, METAL does not

  5. Gene Tree Generation Process • Given an unknown species tree, with fixed species-tree times • Each gene tree branch has a random time (symbols and distribution formula garbled in the source) • Samples ~ JC (Jukes-Cantor) model in each gene tree • Example: Gene 1: ((A,B),C); Gene 2: (A,(B,C)), which is discordant
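To make the discordance example concrete, here is a minimal sketch of gene tree topology sampling for three taxa. It assumes the standard multispecies coalescent on the species tree ((A,B),C) with internal branch length `f` in coalescent units; the function names and the choice of three taxa are illustrative and do not come from the slides.

```python
import math
import random

SPECIES_TOPOLOGY = "((A,B),C)"

def sample_gene_topology(f, rng):
    """Sample a rooted 3-taxon gene tree topology (assumed MSC model).

    f is the internal branch length of the species tree ((A,B),C),
    measured in coalescent units.
    """
    # The A and B lineages coalesce inside the internal branch if an
    # Exp(1) waiting time is shorter than the branch length f.
    if rng.expovariate(1.0) < f:
        return SPECIES_TOPOLOGY
    # Otherwise three lineages enter the root population, and the first
    # pair to coalesce is uniform over the three possible pairs.
    return rng.choice(["((A,B),C)", "((A,C),B)", "(A,(B,C))"])

def discordance_probability(f):
    """Closed-form probability that a gene tree disagrees with the species tree."""
    return 2.0 / 3.0 * math.exp(-f)

def estimate_discordance(f, n_genes, seed=0):
    """Monte Carlo estimate of the discordance probability over n_genes genes."""
    rng = random.Random(seed)
    discordant = sum(
        sample_gene_topology(f, rng) != SPECIES_TOPOLOGY for _ in range(n_genes)
    )
    return discordant / n_genes
```

As `f` shrinks, the discordance probability approaches 2/3, which is why a short species branch makes the species tree hard to recover from individual gene trees.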

  6. METAL • Concatenation is good!

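  7. METAL with molecular clock • Take the normalized Hamming distance of the concatenated sequences: $\hat{p}_{ab} = \frac{1}{mk} \sum_{i=1}^{m} \sum_{j=1}^{k} \mathbb{1}\{\xi_a^{ij} \neq \xi_b^{ij}\}$, where $\xi_a^{ij}$ denotes site $j$ of gene $i$ in species $a$ • [Thm 1] $\mathbb{E}[\hat{p}_{ab}]$ is ultrametric • [Thm 2] UPGMA works! (other methods as well) • To achieve error less than $\epsilon$, $m$ must exceed a threshold (expression garbled in the source) and $k \ge 1$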
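The concatenate-then-cluster idea on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the input layout (one dict per gene mapping species name to an aligned sequence), the function names `concat_hamming` and `upgma`, and the nested-tuple output are all hypothetical choices.

```python
from itertools import combinations

def concat_hamming(loci):
    """Normalized Hamming distances between concatenated sequences.

    loci: list of dicts, one per gene, mapping species -> aligned sequence.
    Returns (species, dist) with dist keyed by frozenset({a, b}).
    """
    species = sorted(loci[0])
    concat = {s: "".join(locus[s] for locus in loci) for s in species}
    total = len(concat[species[0]])
    dist = {}
    for a, b in combinations(species, 2):
        mismatches = sum(x != y for x, y in zip(concat[a], concat[b]))
        dist[frozenset((a, b))] = mismatches / total
    return species, dist

def upgma(species, dist):
    """UPGMA (average linkage); returns the topology as nested tuples."""
    size = {s: 1 for s in species}  # cluster -> number of leaves
    d = dict(dist)
    while len(size) > 1:
        # Merge the closest pair of clusters.
        a, b = sorted(min(d, key=d.get), key=repr)
        merged = (a, b)
        new_d = {}
        for c in size:
            if c == a or c == b:
                continue
            # Size-weighted average of the distances to the two merged clusters.
            new_d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))
```

For example, `upgma(*concat_hamming(loci))` groups the two species whose concatenated sequences differ at the fewest sites first.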

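  8. METAL with non-molecular clock • $\mathbb{E}[\hat{p}_{ab}]$ is no longer ultrametric, so a new metric satisfying the four-point condition is needed • [Figure: quartet tree on leaves A, C, D, B] A quartet has split $ab|cd$ if and only if $d_{ab} + d_{cd} < d_{ac} + d_{bd} = d_{ad} + d_{bc}$ • Set $d_{ab} = -\frac{3}{4} \log\left(1 - \frac{4}{3} \mathbb{E}[\hat{p}_{ab}]\right)$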

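  9. METAL with non-molecular clock • [Thm 3] $d_{ab}$ satisfies the four-point condition • Thus, $\hat{d}_{ab} = -\frac{3}{4} \log\left(1 - \frac{4}{3} \hat{p}_{ab}\right)$ is our corrected distance • [Thm 4] We can characterize the error probability of NJ (neighbor joining) over $\hat{d}_{ab}$ when a parameter (symbol garbled in the source) is small enough: $m$ must exceed a threshold (expression garbled in the source) and $k \ge 1$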
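The corrected distance and the four-point test can be sketched as follows. The sketch assumes the correction intended on these slides is the standard Jukes-Cantor transform $-\frac{3}{4}\log(1 - \frac{4}{3}p)$; the helper names `jc_correct` and `quartet_split` and the dict-of-frozensets layout are illustrative assumptions.

```python
import math

def jc_correct(p):
    """Jukes-Cantor transform of a normalized Hamming distance p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 3/4)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def quartet_split(d, leaves, tol=1e-9):
    """Infer a quartet split from pairwise distances.

    d maps frozenset({x, y}) -> distance; leaves is a 4-tuple.
    Returns (split, four_point_ok): split is the pairing with the smallest
    distance sum, and four_point_ok says whether the two larger sums agree
    within tol, as the four-point condition requires.
    """
    a, b, c, e = leaves
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    sums = sorted(
        (d[frozenset(p)] + d[frozenset(q)], (p, q)) for p, q in pairings
    )
    (s0, split), (s1, _), (s2, _) = sums
    four_point_ok = abs(s1 - s2) <= tol and s0 <= s1 + tol
    return split, four_point_ok
```

With estimated distances the two larger sums will only agree approximately, so `tol` has to be loosened in practice; neighbor joining sidesteps an explicit tolerance by always joining the best-scoring pair.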

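  10. Discussion • What is the exact tradeoff between $m$ and $k$? • A hypothesis test argument gives a lower bound $m \in \Omega(\dots)$ in terms of $f$ (exponent garbled in the source) • Steel and Székely (2002): $mk \in \Omega(\dots)$ (exponent and a further expression garbled in the source) • This paper: $m \in O(f^{-2})$, $k \ge 1$ • GLASS: $m \in O(f^{-1})$, $k \in O(f^{-2})$ $\Rightarrow$ $mk \in O(f^{-3})$ • What if the mutation rate varies over gene trees?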

  11. Thank you!
