Maximum Agreement Subtrees
Seth Sullivant, North Carolina State University (NCSU)
March 24, 2018
Phylogenetics

Problem: Given a collection of species, find the tree that explains their history.

[Figure credit: Yeates, Meier, Wiegman, 2015]
Rooted Binary X-Trees

Definition: A rooted tree $T$ has a distinguished vertex $\rho$, the root. A rooted binary phylogenetic $X$-tree $T$ is a binary tree with a distinguished root vertex whose leaves are labeled by $X$.

[Figure: a rooted binary phylogenetic tree with leaves labeled 1 through 8]

In phylogenetics, we only have access to data on extant (not extinct) species. We have no data about the species corresponding to internal nodes of the tree.
Induced Subtrees

Let $X$ be a label set with $n = |X|$, and let $T$ be a binary rooted phylogenetic $X$-tree. Given $S \subseteq X$, $T|_S$ is the binary restriction tree: the subtree of $T$ spanned by the leaves in $S$, with vertices of degree two suppressed.

[Figure: a tree $T$ and its restriction $T|_S$]

Definition: Given binary rooted phylogenetic $X$-trees $T_1, T_2$,

$$\mathrm{MAST}(T_1, T_2) = \max\{\, \#S : S \subseteq X \text{ and } T_1|_S = T_2|_S \,\}.$$

This is the size of a maximum agreement subtree.
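The definition can be checked directly on small trees. The following is a minimal sketch, not taken from the talk: it assumes trees are encoded as nested Python tuples (a leaf is a label, an internal node is a pair of subtrees), and the names `leaves`, `restrict`, `canonical`, and `mast_brute_force` are hypothetical. It tries every subset $S$, so it is only usable for a handful of leaves.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])

def restrict(tree, keep):
    """Return T|S: drop leaves outside `keep` and suppress degree-two vertices.

    Returns None when no leaf of `tree` lies in `keep`.
    """
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    left, right = restrict(tree[0], keep), restrict(tree[1], keep)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def canonical(tree):
    """Order-independent form, so that (a, b) and (b, a) compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return frozenset(canonical(child) for child in tree)

def mast_brute_force(t1, t2):
    """Size of the largest S with T1|S = T2|S, by exhaustive search."""
    labels = list(leaves(t1))
    for k in range(len(labels), 0, -1):
        for subset in combinations(labels, k):
            s = set(subset)
            if canonical(restrict(t1, s)) == canonical(restrict(t2, s)):
                return k
    return 0

# Two illustrative trees on X = {1, 2, 3, 4} (made up for this example).
t1 = (((1, 2), 3), 4)
t2 = (((1, 3), 2), 4)
print(restrict(t1, {1, 3, 4}))   # restriction of t1 to S = {1, 3, 4}
print(mast_brute_force(t1, t2))  # the trees agree on S = {1, 2, 4}
```

The search is exponential in $n$, which is why a polynomial-time algorithm matters in practice.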
Example

[Figure: two trees $T_1$ and $T_2$ on leaves 1 through 8, together with agreement subtrees on three leaves]

$\mathrm{MAST}(T_1, T_2) = 3$

Theorem (Steel-Warnow 1993): There is an $O(n^2)$ algorithm to compute $\mathrm{MAST}(T_1, T_2)$ for binary rooted phylogenetic $X$-trees.
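To illustrate why a quadratic number of subproblems suffices, here is a sketch of a standard dynamic program over pairs of nodes. It is not claimed to be the Steel-Warnow algorithm as published. It assumes the nested-tuple encoding (a leaf is a label, an internal node is a pair), and the function name `mast` is hypothetical. For internal nodes $u$ and $v$, an agreement subtree either matches the children of $u$ with the children of $v$ (in one of two ways) or lives entirely below one child.

```python
def mast(t1, t2):
    """MAST size for two rooted binary trees given as nested tuples."""

    def index(tree):
        """Post-order node list; each entry is (label, left_id, right_id)."""
        nodes = []

        def visit(t):
            if not isinstance(t, tuple):
                nodes.append((t, None, None))
            else:
                left, right = visit(t[0]), visit(t[1])
                nodes.append((None, left, right))
            return len(nodes) - 1

        visit(tree)
        return nodes

    def leaf_sets(nodes):
        """Leaf-label set below each node (children precede parents)."""
        sets = []
        for label, left, right in nodes:
            sets.append({label} if left is None else sets[left] | sets[right])
        return sets

    a, b = index(t1), index(t2)
    below_a, below_b = leaf_sets(a), leaf_sets(b)

    # table[u][v] = MAST of the subtree below u in t1 and below v in t2.
    table = [[0] * len(b) for _ in a]
    for u, (u_label, ul, ur) in enumerate(a):
        for v, (v_label, vl, vr) in enumerate(b):
            if ul is None:        # u is a leaf
                table[u][v] = 1 if u_label in below_b[v] else 0
            elif vl is None:      # v is a leaf
                table[u][v] = 1 if v_label in below_a[u] else 0
            else:
                table[u][v] = max(
                    table[ul][vl] + table[ur][vr],  # match children straight
                    table[ul][vr] + table[ur][vl],  # match children crossed
                    table[ul][v], table[ur][v],     # skip one side of u
                    table[u][vl], table[u][vr],     # skip one side of v
                )
    return table[-1][-1]          # roots come last in post-order

# Illustrative trees (made up for this example).
t1 = (((1, 2), 3), 4)
t2 = (((1, 3), 2), 4)
print(mast(t1, t2))
```

Each table entry is computed in constant time from earlier entries, so the main loop does $O(n^2)$ work. The leaf sets in this sketch are stored explicitly for readability, which costs extra memory on very unbalanced trees.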
What is the distribution of $\mathrm{MAST}(T_1, T_2)$?

Problem: Determine the distribution of $\mathrm{MAST}(T_1, T_2)$ for reasonably "nice" probability distributions on rooted binary trees, for example the uniform distribution and the Yule-Harding distribution.

Remark: Simulations [Bryant-Mackenzie-Steel 2003] suggest that under both the uniform distribution and the Yule-Harding distribution,

$$E[\mathrm{MAST}(T_1, T_2)] \sim c\sqrt{n},$$

where $n = |X|$, for some constant $c$ depending on the distribution.
Motivation: Comparing New Phylogenetic Methods

Suppose we come up with a new phylogenetic method. This method takes a data set $D$ and constructs the tree $M(D)$. If we know the correct tree $T$, we can evaluate the method by computing $\mathrm{MAST}(T, M(D))$. If $\mathrm{MAST}(T, M(D))$ is consistently small (for many different $D$), then we conclude that the new method does not work well.

How small is small? Is it smaller than what we would expect to see by random chance? To answer this, we need to know the distribution of $\mathrm{MAST}(T, T')$.
Motivation: Cospeciation

Let $T_H$ be a phylogenetic tree of host species and $T_P$ a phylogenetic tree of parasite species. Hosts and parasites are paired, so $T_H$ and $T_P$ have the same label set.

If $\mathrm{MAST}(T_H, T_P)$ is "large", reject the hypothesis that $T_H$ and $T_P$ evolved independently; that is, large $\mathrm{MAST}(T_H, T_P) \implies$ cospeciation.

To perform the hypothesis test, we need the distribution of $\mathrm{MAST}(T_1, T_2)$ for random trees under the null hypothesis of independence.

Hafner, M.S., Nadler, S.A. (1988) Nature 332: 258-259
Motivation: Cool Math

Suppose that both $T_1$ and $T_2$ are comb trees.

[Figure: two comb trees on nine leaves, one with leaves labeled $w_1, \dots, w_9$ and one with leaves labeled $1, \dots, 9$]

A maximum agreement subtree corresponds to a longest increasing subsequence of the permutation $w = w_1 w_2 w_3 w_4 w_5 w_6 w_7 w_8 w_9$, whose length is denoted $L(w)$.

$\mathrm{MAST}(T_1, T_2)$ for uniformly random comb trees is equivalent to $L(w)$ for uniformly random permutations $w \in S_n$.

Theorem (Baik-Deift-Johansson 1999):

$$E[L(w)] = 2\sqrt{n} - c\,n^{1/6} + o(n^{1/6}), \qquad c \approx 1.77108,$$

$$\frac{L(w) - 2\sqrt{n}}{n^{1/6}} \to \text{Tracy-Widom random variable}.$$
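The quantity $L(w)$ is easy to compute, which makes the comb-tree case a convenient one to experiment with. Below is a small sketch using the standard patience-sorting technique with binary search. The function names `lis_length` and `mean_lis` are hypothetical, and the Monte Carlo helper is only an illustration of how one might compare an empirical mean with the leading term $2\sqrt{n}$.

```python
import random
from bisect import bisect_left

def lis_length(seq):
    """Length of a longest strictly increasing subsequence of `seq`."""
    tails = []  # tails[i] = smallest last element of an increasing run of length i+1
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest run found so far
        else:
            tails[i] = x      # x gives a smaller tail for runs of length i+1
    return len(tails)

def mean_lis(n, trials, seed=0):
    """Average L(w) over `trials` uniformly random permutations of 1..n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w = list(range(1, n + 1))
        rng.shuffle(w)
        total += lis_length(w)
    return total / trials

print(lis_length([3, 1, 2, 5, 4]))          # e.g. 1, 2, 5 is increasing
print(mean_lis(400, 200), 2 * 400 ** 0.5)   # empirical mean vs. 2*sqrt(n)
```

For moderate $n$ the empirical mean sits below $2\sqrt{n}$, consistent with the negative $n^{1/6}$ correction term in the theorem.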
Random Trees

Biologists are interested in models for random trees as models for speciation processes.

Uniform distribution: Select a tree uniformly at random from all $(2n-3)!!$ rooted binary phylogenetic trees.

Yule-Harding distribution: Grow a random tree by successively splitting leaves selected uniformly at random, then apply leaf labels at random.

[Figure: a tree with leaves labeled 2, 1, 5, 3, 4]

Other models: $\beta$-splitting model, $\alpha$-splitting model, etc.

Question: How well do the different random tree models match the shape and structure of phylogenetic trees occurring in nature?
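The Yule-Harding description translates almost line by line into a sampler. The sketch below assumes a nested-tuple encoding of trees (a leaf is an integer label, an internal node is a pair) and uses hypothetical function names. It also computes the count $(2n-3)!!$ used in the uniform model.

```python
import random

def num_rooted_binary_trees(n):
    """(2n-3)!! = 1 * 3 * 5 * ... * (2n-3), the number of trees on n labels."""
    count = 1
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count

def yule_harding_tree(n, rng=random):
    """Sample a rooted binary tree on labels 1..n from the Yule-Harding model."""
    children = {}        # internal node id -> (left id, right id)
    leaf_nodes = [0]     # start from a single leaf, the root
    next_id = 1
    for _ in range(n - 1):
        i = rng.randrange(len(leaf_nodes))        # pick a leaf uniformly
        children[leaf_nodes[i]] = (next_id, next_id + 1)
        leaf_nodes[i] = next_id                   # the split leaf becomes two leaves
        leaf_nodes.append(next_id + 1)
        next_id += 2
    labels = list(range(1, n + 1))
    rng.shuffle(labels)                           # apply leaf labels at random
    label_of = dict(zip(leaf_nodes, labels))

    def build(node):
        if node in children:
            left, right = children[node]
            return (build(left), build(right))
        return label_of[node]

    return build(0)

def leaf_labels(tree):
    """Sorted list of leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return sorted(leaf_labels(tree[0]) + leaf_labels(tree[1]))

print(num_rooted_binary_trees(4))
print(yule_harding_tree(5, random.Random(1)))
```

Passing an explicit `random.Random` instance makes a sample reproducible, which is useful when comparing simulations across tree models.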
Properties of Random Trees

Proposition: Both Yule-Harding and uniform random trees satisfy exchangeability and sampling consistency.

Exchangeability: The probability of a tree does not change when its leaf labels are permuted.

[Figure: P(tree with leaves labeled 1, 2, 3, 4, 5) = P(same tree shape with leaves labeled 2, 1, 5, 3, 4)]

Sampling consistency: If $T$ is a random tree and $S \subseteq X$, then $T|_S$ is a random tree from the same distribution on leaf label set $S$.

Theorem (Aldous): The expected depth of a uniformly random tree is $\Theta(\sqrt{n})$. The expected depth of a Yule-Harding random tree is $\Theta(\log n)$.
Conjecture About the Maximum Agreement Subtree

Conjecture: For any exchangeable, sampling consistent distribution on rooted binary phylogenetic $X$-trees,

$$E[\mathrm{MAST}(T_1, T_2)] = \Theta(\sqrt{n}),$$

where $n = |X|$.

Recall that $f(n) = \Theta(\sqrt{n})$ means that there are positive constants $c$ and $C$ such that

$$c\sqrt{n} \le f(n) \le C\sqrt{n}$$

for all sufficiently large $n$. Note that the constants $c$ and $C$ might depend on the probability distribution.

We hope further to show that, asymptotically,

$$E[\mathrm{MAST}(T_1, T_2)] \sim d\sqrt{n}$$

for some $d$ (depending on the distribution) as $n \to \infty$.
Upper Bounds

Theorem (BHLSSS): For any exchangeable, sampling consistent distribution on rooted binary phylogenetic trees,

$$E[\mathrm{MAST}(T_1, T_2)] = O(\sqrt{n}).$$

Proof sketch for the uniform distribution. For $S \subseteq X$, let $X_S = 1$ if $T_1|_S = T_2|_S$ and $X_S = 0$ otherwise. Let

$$Y_{n,k} = \sum_{S \subseteq X,\ \#S = k} X_S = \text{number of agreement sets of size } k.$$

Then

$$E[Y_{n,k}] = \binom{n}{k} P(X_S = 1) = \binom{n}{k} \frac{1}{(2k-3)!!} \to 0 \quad \text{if } k > c\sqrt{n}.$$
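The first-moment formula can be evaluated numerically to see where it drops below 1. The sketch below makes two steps of the argument explicit as assumptions. For the uniform model, $T_1|_S$ and $T_2|_S$ are independent uniform trees on $k$ leaves, so $P(X_S = 1) = 1/(2k-3)!!$. By Markov's inequality, $P(\mathrm{MAST} \ge k) \le E[Y_{n,k}]$. The function names are hypothetical.

```python
from math import comb, sqrt

def odd_double_factorial(m):
    """m!! for odd m >= -1, with (-1)!! = 1!! = 1."""
    result = 1
    for odd in range(3, m + 1, 2):
        result *= odd
    return result

def expected_agreement_sets(n, k):
    """E[Y_{n,k}] = C(n, k) / (2k-3)!! for two independent uniform trees."""
    return comb(n, k) / odd_double_factorial(2 * k - 3)

def first_k_below(n, threshold=1.0):
    """Smallest k >= 2 with E[Y_{n,k}] < threshold, or None if there is none."""
    for k in range(2, n + 1):
        if expected_agreement_sets(n, k) < threshold:
            return k
    return None

# Compare the crossover point with sqrt(n) for a few sizes.
for n in (16, 64, 256, 1024):
    k = first_k_below(n)
    print(n, k, k / sqrt(n))
```

If the ratio in the last column stays bounded as $n$ grows, that is consistent with the $O(\sqrt{n})$ upper bound, although a numerical table is of course not a proof.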