Degrees
◮ $G_{\mathrm{NNI}}(n)$ is regular with degree $2(n-3)$ (Robinson 1971);
◮ $G_{\mathrm{SPR}}(n)$ is regular with degree $2(n-3)(2n-7)$ (Allen & Steel 2001);
◮ $G_{\mathrm{TBR}}(n)$ is not regular; the maximal degree is attained by caterpillar trees (Humphries 2008).
Figure: A caterpillar tree
T. Wu Evolutionary Analysis

Our result
Theorem (Humphries-W, TCBB 2013). For each vertex $T \in \mathcal{T}^*_n$ with $n \ge 3$, its degree in $G_{\mathrm{TBR}}(n)$ is
$$4\Gamma(T) - (8n^2 - 18n + 6),$$
with
$$\Gamma(T) := \sum_{\{u,v\} \subseteq L(T)} \mathrm{dist}_T(u,v)$$
denoting the sum of the distances between all pairs of leaves of $T$.
For the vertices in $G_{\mathrm{TBR}}(n)$:
◮ Maximal degree: caterpillar trees
◮ Minimal degree: semi-regular trees (see also [Szekely-Wang-W, DM 2011])

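A minimal sketch of how the degree formula can be evaluated, under two assumptions that are not stated on the slide: $\mathcal{T}^*_n$ is read as the set of unrooted binary trees, and $\mathrm{dist}_T$ counts edges. The adjacency-dict representation and the helper names (`leaf_distance_sum`, `tbr_degree`, `caterpillar`) are hypothetical.

```python
from collections import deque


def leaf_distance_sum(adj):
    """Gamma(T): sum of edge distances over all unordered pairs of leaves."""
    leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
    total = 0
    for src in leaves:
        # Breadth-first search gives edge distances from this leaf.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist[t] for t in leaves)
    return total // 2  # every pair was counted from both ends


def tbr_degree(adj):
    """Degree of T in the TBR graph via 4*Gamma(T) - (8n^2 - 18n + 6)."""
    n = sum(1 for nbrs in adj.values() if len(nbrs) == 1)
    return 4 * leaf_distance_sum(adj) - (8 * n * n - 18 * n + 6)


def caterpillar(n):
    """Unrooted binary caterpillar with n >= 4 leaves (illustrative helper)."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    spine = [f"i{k}" for k in range(n - 2)]
    for u, v in zip(spine, spine[1:]):
        link(u, v)
    leaf = 0
    for k, s in enumerate(spine):
        # The two end vertices of the spine carry two leaves, the others one.
        for _ in range(2 if k in (0, n - 3) else 1):
            link(s, f"x{leaf}")
            leaf += 1
    return adj


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, tbr_degree(caterpillar(n)))
```

For $n = 4$ there are only three trees, so every tree has two TBR neighbours, which gives a quick sanity check of the formula.
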
A key lemma
Lemma. For two “distinct” TBR operations $\theta$ and $\theta'$, $\theta(T) = \theta'(T)$ implies that both $\theta$ and $\theta'$ are NNI operations.
Note: Here two TBR operations are distinct if
◮ they delete different edges in the bisection step, or
◮ they use different edges in the reconnection step.

The PDA model
◮ The number of trees in $\mathcal{T}_n$ is $\varphi(n) := (2n-3)!! = 1 \cdot 3 \cdots (2n-3)$.
◮ Under the proportional to distinguishable arrangements (PDA) model, each tree has the same probability of being generated, that is, we have
$$P_u(T) = \frac{1}{\varphi(n)} \qquad (1)$$
for every $T$ in $\mathcal{T}_n$.

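A small sketch of the count $\varphi(n)$ and the PDA probability in (1), using exact fractions. The function names are hypothetical; $n \ge 2$ is assumed.

```python
from fractions import Fraction


def num_trees(n):
    """phi(n) = (2n-3)!! = 1 * 3 * ... * (2n-3), for n >= 2."""
    result = 1
    for odd in range(1, 2 * n - 2, 2):
        result *= odd
    return result


def pda_probability(n):
    """P_u(T) = 1 / phi(n): the same value for every tree T on n leaves."""
    return Fraction(1, num_trees(n))


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, num_trees(n), pda_probability(n))
```
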
The YHK model
Under the Yule–Harding model [Yule 1925, Harding 1971]:
◮ Beginning with a two-leaved tree, we “grow” it by repeatedly splitting a leaf into two new leaves.
◮ The leaf to be split is chosen uniformly at random among all leaves of the current tree.
◮ After obtaining an unlabeled tree with $n$ leaves, we label its leaves with labels sampled uniformly at random (without replacement) from $\{1, \dots, n\}$.
When branch lengths are ignored, the Yule–Harding model was shown [Aldous, 1996] to be equivalent to the trees generated by Kingman’s coalescent process, and so we call it the YHK model.

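The three steps above can be sketched as a sampler. This is an illustration only: the nested-tuple output format and the names `sample_yhk_tree` and `leaf_labels` are assumptions, not part of the talk.

```python
import random


def sample_yhk_tree(n, rng=None):
    """Sample a rooted binary tree with leaves 1..n (n >= 2) by leaf splitting."""
    rng = rng or random.Random()
    children = {0: [1, 2]}  # node 0 is the root of the two-leaved start tree
    leaves = [1, 2]
    next_id = 3
    while len(leaves) < n:
        idx = rng.randrange(len(leaves))  # uniform choice among current leaves
        v = leaves[idx]
        a, b = next_id, next_id + 1
        next_id += 2
        children[v] = [a, b]  # v becomes an internal node with two new leaves
        leaves[idx] = a
        leaves.append(b)
    labels = list(range(1, n + 1))
    rng.shuffle(labels)  # uniform labelling without replacement
    label_of = dict(zip(leaves, labels))

    def build(v):
        if v in children:
            return tuple(build(c) for c in children[v])
        return label_of[v]

    return build(0)


def leaf_labels(tree):
    """List the leaf labels of a nested-tuple tree, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree for x in leaf_labels(child)]


if __name__ == "__main__":
    print(sample_yhk_tree(6, random.Random(1)))
```
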
Subtree Pattern
◮ Cherry: a subtree with two leaves
◮ Pitchfork: a subtree with three leaves
Figure: A tree with three cherries and one pitchfork.

Subtree Pattern II
Given a phylogenetic tree $T$, let
◮ $A(T)$: the number of pitchforks;
◮ $C(T)$: the number of cherries.
For $n \ge 2$, consider the random variables
◮ $A_n$: the number of pitchforks in a random tree with $n$ leaves;
◮ $C_n$: the number of cherries in a random tree with $n$ leaves.
What is the joint distribution of $A_n$ and $C_n$?

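A sketch of how $A(T)$ and $C(T)$ might be counted, assuming a rooted binary tree stored as nested 2-tuples with non-tuple leaves. Both the representation and the name `count_patterns` are hypothetical.

```python
def count_patterns(tree):
    """Return (A(T), C(T)): numbers of pitchforks and cherries."""
    counts = {"pitchforks": 0, "cherries": 0}

    def leaves_below(node):
        if not isinstance(node, tuple):
            return 1
        size = sum(leaves_below(child) for child in node)
        if size == 2:
            counts["cherries"] += 1  # subtree with exactly two leaves
        elif size == 3:
            counts["pitchforks"] += 1  # subtree with exactly three leaves
        return size

    leaves_below(tree)
    return counts["pitchforks"], counts["cherries"]


if __name__ == "__main__":
    # A tree with three cherries and one pitchfork.
    print(count_patterns((((1, 2), 3), ((4, 5), (6, 7)))))
```
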
Joint distributions: formulae
Theorem (W-Choi, 2016). For $n > 3$ and $1 < b < n$, we have
$$
\begin{aligned}
P_y(A_{n+1} = a, C_{n+1} = b) ={}& \frac{2a}{n}\, P_y(A_n = a, C_n = b) \\
&+ \frac{a+1}{n}\, P_y(A_n = a+1, C_n = b-1) \\
&+ \frac{2(b-a+1)}{n}\, P_y(A_n = a-1, C_n = b) \\
&+ \frac{n-a-2b+2}{n}\, P_y(A_n = a, C_n = b-1).
\end{aligned}
$$
Note: A similar formula holds for the PDA model.

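A sketch of how the recursion can be iterated with exact arithmetic. Two points are assumptions of this sketch rather than statements from the slide: the start values at $n = 4$ (caterpillar shape with probability 2/3, balanced shape with probability 1/3) and the convention that probabilities of impossible states are zero, which is used to apply the recursion at the boundary values of $b$. The name `yhk_joint` is hypothetical.

```python
from fractions import Fraction


def yhk_joint(n_max):
    """Joint law of (A_n, C_n) under YHK for n = 4..n_max, as {n: {(a, b): p}}."""
    # Assumed base case: with 4 leaves the caterpillar has (A, C) = (1, 1)
    # and the balanced tree has (A, C) = (0, 2).
    dist = {4: {(1, 1): Fraction(2, 3), (0, 2): Fraction(1, 3)}}
    for n in range(4, n_max):
        prev = dist[n]

        def p(a, b):
            return prev.get((a, b), Fraction(0))  # impossible states count as 0

        nxt = {}
        for a in range(0, (n + 1) // 3 + 1):
            for b in range(1, (n + 1) // 2 + 1):
                value = (
                    Fraction(2 * a, n) * p(a, b)
                    + Fraction(a + 1, n) * p(a + 1, b - 1)
                    + Fraction(2 * (b - a + 1), n) * p(a - 1, b)
                    + Fraction(n - a - 2 * b + 2, n) * p(a, b - 1)
                )
                if value:
                    nxt[(a, b)] = value
        dist[n + 1] = nxt
    return dist


if __name__ == "__main__":
    for state, prob in sorted(yhk_joint(6)[6].items()):
        print(state, prob)
```

Exact fractions are used so that checks such as "the probabilities sum to one" hold without rounding error.
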
Statistical properties
◮ A dynamic approach to computing the joint distributions.
◮ A unified approach to calculating the moments of the joint (and the marginal) distributions.
◮ The cherry distributions are log-concave. That is, for $n > 2$ and $1 < k < n$, we have
$$P_y(C_n = k)^2 \ge P_y(C_n = k+1)\, P_y(C_n = k-1).$$
◮ There exists a unique change point for the cherry distributions between the YHK and the PDA models.
◮ Similar results for clade sizes and clan sizes [Zhu-Than-W, 2015].

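A numerical illustration of the log-concavity statement for small $n$. It relies on the marginal recursion $P_y(C_{n+1} = k) = \frac{2k}{n} P_y(C_n = k) + \frac{n-2k+2}{n} P_y(C_n = k-1)$, which is an assumption of this sketch: it is what one gets by observing that splitting a cherry leaf keeps the cherry count and splitting any other leaf adds one cherry. The function names are hypothetical.

```python
from fractions import Fraction


def yhk_cherry_distribution(n):
    """Return {k: P_y(C_n = k)} for n >= 3 under the YHK model."""
    dist = {1: Fraction(1)}  # n = 3: the only shape has exactly one cherry
    for m in range(3, n):
        nxt = {}
        for k in range(1, (m + 1) // 2 + 1):
            value = (
                Fraction(2 * k, m) * dist.get(k, 0)
                + Fraction(m - 2 * k + 2, m) * dist.get(k - 1, 0)
            )
            if value:
                nxt[k] = value
        dist = nxt
    return dist


def is_log_concave(dist):
    """Check p_k^2 >= p_{k+1} * p_{k-1} over the support of dist."""
    return all(
        dist.get(k, 0) ** 2 >= dist.get(k + 1, 0) * dist.get(k - 1, 0)
        for k in range(min(dist), max(dist) + 1)
    )


if __name__ == "__main__":
    for n in range(3, 12):
        print(n, is_log_concave(yhk_cherry_distribution(n)))
```

This only checks finitely many cases and is no substitute for the proof.
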
Part III: Phylogenetic Networks

The tangled tree of life

From trees to networks
Phylogenetic trees are useful, but networks provide a better tool for studying
◮ conflicting signals
◮ recombination
◮ gene flow
◮ hybridization
◮ horizontal gene transfer
◮ ...

Phylogenetic Networks: Unrooted
Figure: A phylogenetic tree and network relating 15 plant species from the genus Solanum; from [Bastkowski-Moulton-Spillner-Wu, 2015, Bull. Math. Biol.]

Network thinking: pedigree
Figure: A partial pedigree of Prince Charles; from [Gusfield, 2014].

Recombination
Figure: A history with recombination; from [Gusfield, 2014].

Phylogenetic Networks
A (rooted) phylogenetic network:
◮ a directed acyclic graph
◮ a unique root
◮ leaves are labelled by taxa
◮ no vertex with one parent and one child
◮ binary
A central problem: How to reconstruct phylogenetic networks?

Assembling trees: Supertree
Figure: Input trees (on the leaves a, b, c, d, e)
◮ A tree is encoded by its subtrees on three leaves.
◮ A polynomial-time algorithm to assemble trees [Aho et al. 1981].

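A textbook-style sketch in the spirit of the BUILD algorithm of Aho et al.; it is an illustration, not necessarily the formulation used in the talk. The triple encoding `(a, b, c)`, read as "a and b are closer to each other than either is to c", the nested-tuple output, and the name `build` are assumptions of this sketch.

```python
def build(leaves, triplets):
    """Assemble a rooted tree displaying all triplets, or return None.

    Each triplet (a, b, c) stands for the three-leaf subtree ab|c.
    The result is a nested tuple and may contain non-binary vertices.
    """
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    if len(leaves) == 2:
        return tuple(leaves)
    leaf_set = set(leaves)
    relevant = [t for t in triplets if leaf_set.issuperset(t)]

    # Union-find: a and b must lie below the same child of the root.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    groups = {}
    for x in leaves:
        groups.setdefault(find(x), []).append(x)
    if len(groups) == 1:
        return None  # no valid split at the root: triplets are incompatible

    subtrees = []
    for block in groups.values():
        sub = build(block, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


if __name__ == "__main__":
    print(build("abcd", [("a", "b", "c"), ("a", "c", "d")]))
```

The recursion splits the leaf set into connected components at each level, so the running time is polynomial in the number of leaves and triplets.
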
A Quiz!
Question: Are networks encoded by their trees?
Figure: Two networks N and N′ and two trees T1 and T2, each with root ρ and leaves a, b, c.
Answer: No.

Another quiz!
Question: Are networks encoded by their subnetworks?
Figure: An example of a subnetwork.

A nontrivial answer
Theorem (Huber-Iersel-Moulton-Wu, 2015, Syst. Biol.). For every $n \ge 3$, there exist two non-isomorphic phylogenetic networks $N_1$ and $N_2$ with $n$ leaves that display the same set of subnetworks (and the same set of trees).
Figure: Two drawings, each on the leaves a, b, c, d.

Level-1 networks
In [Huber-Moulton, 2013, Algorithmica], it is shown that level-1 networks are encoded by their subnetworks.
Figure: level-1 = all undirected cycles are disjoint (shown for a network N on the leaves a, ..., j)

Trinets
Figure: Eight types of level-1 networks on three leaves: $T_1(x,y;z)$, $N_1(x,y;z)$, $N_2(x,y;z)$, $S_1(x,y;z)$, $N_3(x;y;z)$, $N_4(x;y;z)$, $N_5(x;y;z)$, $S_2(x;y;z)$.

Assembling Trinets
Input: A collection of trinets.
Task:
(1) To decide whether there exists a binary level-1 phylogenetic network displaying the collection of trinets.
(2) To construct such a network if it exists.
Figure: Input trinets

Incomplete data
In [Huber-Iersel-Moulton-Scornavacca-Wu, in revision for Algorithmica], we show that when some trinets are missing,
◮ the trinet assembling problem is NP-hard;
◮ it can be solved by an $O(3^n\,\mathrm{poly}(n))$ algorithm.
Question: How about “real data” (often noisy and containing conflicting signals)?

Trilonet
Figure: A schematic view of the Trinet-based Level One Network reconstructor (Trilonet), from [Oldman∗-Wu∗-Iersel-Moulton, in revision for MBE]. Panels: an alignment on $X = \{a, \dots, j\}$; a dense set of trinets; identify a suitable subset of taxa; the network N.

Trilonet: a case study
Figure: The inferred phylogeny of 7 Giardia strains by Trilonet; data from [Cooper et al., Curr. Biol., 2007]. Labels in the figure: Giardia_lamblia_ATCC_50803_WB, Giardia_intestinalis_isolate_246, Giardia_intestinalis_isolate_303, Giardia_intestinalis_isolate_305, Giardia_intestinalis_isolate_55, Giardia_intestinalis_isolate_JH, Giardia_intestinalis_isolate_335, #H1.

Trilonet
Trilonet is an algorithm for inferring level-1 networks:
◮ Constructing a network directly from sequence data (without using breakpoints or gene trees).
◮ Efficient, and robust for noisy data.
◮ Implemented in Java, and will be available at https://www.uea.ac.uk/computing/trilonet
◮ Consistent.
Future improvements include
◮ level-k networks
◮ statistical consistency

Part IV: Future Directions

Network models and inference
More realistic models:
◮ Superimposing molecular evolutionary models on edges
◮ Quantifying the contribution made by reticulate processes
Reconstructing networks:
◮ Rigorous statistical frameworks (Maximum Likelihood or Bayesian)
◮ Accounting for non-tree-like patterns resulting from
  ◮ Sequencing errors (e.g. SNP calling)
  ◮ Incomplete Lineage Sorting (see, e.g., Yu et al. 2014 PNAS)
◮ Efficient algorithms for searching the network space

Space of phylogenetic networks
Figure: Space of level-1 networks with four taxa; from [Huber-Linz-Moulton-Wu, J. Math. Biol., 2016]

Network operation
Figure: A generalisation of the NNI operation on networks; panels (i) and (ii) show T, T′, N and N′.