Data Requirement for Species Tree from Multiple Gene Trees (Dasarathy, Nowak, Roch 2015)
Daewon Seo, Mar. 14, 2017
Introduction
• Incomplete lineage sorting (ILS), ….
• Gene tree topologies can differ from the species tree topology
• Two statistically consistent algorithms: GLASS and STEAC
• Key parameter $f$: smallest species tree branch length
• Assuming perfect gene trees are given:
  GLASS: # of genes $\sim O(f^{-1})$
  STEAC: # of genes $\sim O(f^{-2})$
Introduction
• Focus on data length ($\triangleq k$, sequence length per gene), not number of genes ($\triangleq m$)
• In a single gene tree, to reconstruct the topology with high probability: $k \sim O(f^{-2})$
• Therefore, in GLASS: $mk \sim O(f^{-3})$
• In STEAC: $mk \sim O(f^{-4})$
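Introduction
• METAL
• Modified STEAC algorithm
• For any $k \ge 1$: $mk \sim \Theta(m) \sim O(f^{-2})$, where $m$ is the number of genes, $k$ the sequence length per gene, and $f$ the smallest species tree branch length
• STEAC needs the molecular clock assumption; METAL does not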
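Gene Tree Generation Process
• Given an unknown species tree, $\tau_{ab}$: divergence time of species $a$ and $b$ in the species tree
• $T_{ab}$: random coalescence time of $a$ and $b$ in a gene tree, with $P(T_{ab} \le t) = 1 - e^{-(t - \tau_{ab})}$ for $t \ge \tau_{ab}$
• Samples ~ JC (Jukes-Cantor) model in each gene tree
Gene1: ((A,B),C)
Gene2: (A,(B,C)) - discordant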
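The generation process on this slide can be sketched for a single pair of species. This is a minimal illustration, not the paper's simulator: it assumes time is measured in coalescent units so that the waiting time after divergence is Exp(1), and it assumes a hypothetical per-site mutation rate `mu`. The function names are made up for this example.

```python
import math
import random


def sample_pair_coalescence(tau_ab, rng):
    """Species divergence time plus an Exp(1) wait (assumption: coalescent units)."""
    return tau_ab + rng.expovariate(1.0)


def jc_mismatch_probability(t_ab, mu):
    """JC69 probability that two sequences differ at a site.

    The two lineages are separated by a path of 2 * mu * t_ab expected
    substitutions per site (mu is a hypothetical mutation rate).
    """
    return 0.75 * (1.0 - math.exp(-(4.0 / 3.0) * 2.0 * mu * t_ab))


def simulate_pair_mismatch(tau_ab, mu, m, k, seed=0):
    """Fraction of differing sites over m genes with k sites each."""
    rng = random.Random(seed)
    differing = 0
    for _ in range(m):
        # Each gene draws its own coalescence time, which is why gene
        # trees can disagree with the species tree.
        t_ab = sample_pair_coalescence(tau_ab, rng)
        p = jc_mismatch_probability(t_ab, mu)
        differing += sum(rng.random() < p for _ in range(k))
    return differing / (m * k)


if __name__ == "__main__":
    print(simulate_pair_mismatch(tau_ab=1.0, mu=0.1, m=200, k=10))
```

A full simulation would draw a whole gene tree per gene rather than one pair; the pairwise version is enough to see how the coalescent and the mutation model combine.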
METAL Concatenation is good!
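METAL with molecular clock
• Take the normalized Hamming distance of the concatenated sequences: $\hat{p}_{ab} = \frac{1}{mk} \sum_{i=1}^{m} \sum_{j=1}^{k} \mathbb{1}\{\xi^{ij}_a \ne \xi^{ij}_b\}$, where $\xi^{ij}_a$ is site $j$ of gene $i$ in species $a$
• [Thm 1] $\mathbb{E}[\hat{p}_{ab}]$ is ultrametric
• [Thm 2] UPGMA works! (other methods as well)
• To achieve error less than $\epsilon$ (for fixed $\epsilon$): $m$ on the order of $f^{-2}$ and $k \ge 1$ suffice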
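The normalized Hamming distance over concatenated genes can be sketched as follows. The input layout (a list of dicts mapping species name to an aligned sequence) and the function name are assumptions made for this example.

```python
from itertools import combinations


def concatenated_hamming(genes):
    """Return {(a, b): fraction of differing sites over all genes}.

    genes: list of dicts, one per gene, mapping species name to an
    aligned sequence string (hypothetical input format).
    """
    species = sorted(genes[0])
    total_sites = sum(len(gene[species[0]]) for gene in genes)
    p_hat = {}
    for a, b in combinations(species, 2):
        # Summing mismatches gene by gene is the same as concatenating
        # the genes first and counting mismatches once.
        mismatches = sum(
            sum(x != y for x, y in zip(gene[a], gene[b])) for gene in genes
        )
        p_hat[(a, b)] = mismatches / total_sites
    return p_hat


genes = [
    {"A": "ACGT", "B": "ACGA", "C": "TCGA"},
    {"A": "GGCC", "B": "GGCC", "C": "GACC"},
]
p_hat = concatenated_hamming(genes)
print(p_hat)  # (A, B): 0.125, (A, C): 0.375, (B, C): 0.25
```

The resulting matrix can then be handed to any distance-based tree builder such as UPGMA.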
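METAL with non-molecular clock
• $\mathbb{E}[\hat{p}_{ab}]$ is no longer ultrametric, so a new metric satisfying the four-point condition is needed
• Four-point condition (quartet figure with leaves A, C, D, B): leaves $w, x$ are separated from $y, z$ if and only if $d_{wx} + d_{yz} \le d_{wy} + d_{xz} = d_{wz} + d_{xy}$
• Set $d_{ab} = -\frac{3}{4} \log\left(1 - \frac{4}{3} \mathbb{E}[\hat{p}_{ab}]\right)$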
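The four-point condition can be checked numerically on a quartet. The sketch below uses the standard formulation: of the three pairwise sums, the two largest must be equal, and the pairing with the smallest sum gives the split. The distance layout (a dict keyed by `frozenset` pairs) and the function names are assumptions made for this example.

```python
def pair_sums(dist, quartet):
    """Return the three pairing sums for a quartet of leaves."""
    w, x, y, z = quartet

    def d(u, v):
        return dist[frozenset((u, v))]

    return {
        ((w, x), (y, z)): d(w, x) + d(y, z),
        ((w, y), (x, z)): d(w, y) + d(x, z),
        ((w, z), (x, y)): d(w, z) + d(x, y),
    }


def satisfies_four_point(dist, quartet, tol=1e-9):
    """True if the two largest pairing sums agree up to tol."""
    sums = sorted(pair_sums(dist, quartet).values())
    return sums[2] - sums[1] <= tol


def inferred_split(dist, quartet):
    """The pairing with the smallest sum is the quartet split."""
    sums = pair_sums(dist, quartet)
    return min(sums, key=sums.get)


# Tree AB|CD with pendant edges of length 1 and an internal edge of length 2.
tree_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("C", "D")): 2.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("A", "D")): 4.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("B", "D")): 4.0,
}
print(satisfies_four_point(tree_dist, ("A", "B", "C", "D")))  # True
print(inferred_split(tree_dist, ("A", "B", "C", "D")))
```

With estimated distances the equality only holds approximately, so `tol` would have to be chosen with the sampling noise in mind.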
METAL with non-molecular clock
• [Thm 3] $d_{ab}$ satisfies the four-point condition
• Thus, $\hat{d}_{ab} = -\frac{3}{4} \log\left(1 - \frac{4}{3} \hat{p}_{ab}\right)$ is our corrected distance
• [Thm 4] We can characterize the error probability of NJ over $\hat{d}_{ab}$
• When $f$ is small enough (for fixed error probability): $m$ on the order of $f^{-2}$ and $k \ge 1$ suffice
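The corrected distance on this slide has the form of the standard Jukes-Cantor correction applied to the normalized Hamming distance. A minimal sketch follows; the function name is an assumption, and the guard for values at or above 3/4 is an addition not taken from the slides.

```python
import math


def jc_corrected_distance(p_hat):
    """Jukes-Cantor correction: -(3/4) * log(1 - (4/3) * p_hat)."""
    if not 0.0 <= p_hat < 0.75:
        # The logarithm is undefined once the mismatch fraction
        # reaches the JC saturation level of 3/4.
        raise ValueError("p_hat must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p_hat)


for p in (0.0, 0.125, 0.375):
    print(p, jc_corrected_distance(p))
```

The corrected values, computed for every pair of species, form the distance matrix that a method such as neighbor joining (NJ) takes as input.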
Discussion
• What is the exact tradeoff between $m$ and $k$?
• Hypothesis test argument gives $m \in \Omega(f^{-1})$
• Steel and Szekely (2002): $mk \in \Omega(f^{-2})$
• This paper: $m \in O(f^{-2})$, $k \ge 1$
• GLASS: $m \in O(f^{-1})$, $k \in O(f^{-2})$ ⇒ $mk \in O(f^{-3})$
• What if the mutation rate varies across gene trees?
Thank you!