  1. Swinging from Tree to Tree: Rearrangement Operations and their Metrics Stefan Grünewald CAS-MPG Partner Institute for Computational Biology Shanghai, China

  2. Phylogenetic Trees

  3. Phylogenetic Trees A phylogenetic tree T is a (graph-theoretic) tree without vertices of degree 2. Its leaf set L(T) is also called the taxa set. T is called binary if all interior vertices have degree 3.
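
The definitions above can be checked mechanically. The following Python sketch uses a representation chosen purely for illustration (it is not from the slides): a tree is a dict mapping each vertex to the set of its neighbours, with interior vertices as ints and taxa as strings. All helper names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical representation: vertex -> set of neighbours.
# Interior vertices are ints, taxa (leaves) are strings.
Tree = Dict[Hashable, Set[Hashable]]


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tree:
    """Build an undirected adjacency map from an edge list."""
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def is_tree(t: Tree) -> bool:
    """Connected with exactly |V| - 1 edges."""
    if not t:
        return False
    n_edges = sum(len(nbrs) for nbrs in t.values()) // 2
    start = next(iter(t))
    seen, stack = {start}, [start]
    while stack:
        for w in t[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(t) and n_edges == len(t) - 1


def leaf_set(t: Tree) -> Set[Hashable]:
    """L(T): the vertices of degree 1, i.e. the taxa."""
    return {v for v, nbrs in t.items() if len(nbrs) == 1}


def is_phylogenetic(t: Tree) -> bool:
    """A tree without vertices of degree 2."""
    return is_tree(t) and all(len(nbrs) != 2 for nbrs in t.values())


def is_binary(t: Tree) -> bool:
    """Phylogenetic, and every interior vertex has degree 3."""
    return is_phylogenetic(t) and all(len(nbrs) in (1, 3) for nbrs in t.values())


quartet = from_edges([("a", 0), ("b", 0), (0, 1), (1, "c"), (1, "d")])
star = from_edges([(0, leaf) for leaf in "abcd"])
print(sorted(leaf_set(quartet)), is_binary(quartet), is_binary(star))
# ['a', 'b', 'c', 'd'] True False
```

The star tree is phylogenetic (it has no vertex of degree 2) but not binary, because its centre has degree 4.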

  4. Different trees with the same taxa
     • In phylogenetics, one often observes several different trees on the same taxa set, e.g. by using different methods or by analyzing different genes.
     • Therefore, it is important to quantify how different two trees are.
     • One common way to do so is to use tree rearrangement operations.

  5. Nearest Neighbour Interchange (NNI)
     ● An NNI operation on an unrooted binary phylogenetic tree consists of identifying (contracting) the two vertices u and v incident with an internal edge and then resolving the resulting degree-4 vertex in one of the two other possible ways.
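
A minimal sketch of the NNI operation, assuming a hypothetical adjacency-set representation (vertex to set of neighbours, ints for interior vertices, strings for taxa). For each internal edge uv it fixes one subtree hanging off u and swaps it with each of the two subtrees hanging off v, which yields the two other resolutions. Trees are compared through their sets of non-trivial splits (leaf bipartitions), which determine an unrooted phylogenetic tree; `caterpillar`, `splits` and `nni_neighbours` are illustrative names, not an existing library.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def caterpillar(n: int) -> Tree:
    """Hypothetical helper: caterpillar on taxa '1'..'n' (n >= 4), interior path 0..n-3."""
    edges = [(i, i + 1) for i in range(n - 3)]
    edges += [(0, "1"), (0, "2"), (n - 3, str(n - 1)), (n - 3, str(n))]
    edges += [(i, str(i + 2)) for i in range(1, n - 3)]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def splits(t: Tree) -> FrozenSet:
    """Canonical form: the set of non-trivial leaf bipartitions induced by the edges."""
    leaves = frozenset(v for v in t if len(t[v]) == 1)
    out = set()
    for u in t:
        for v in t[u]:
            side, seen, stack = set(), {u, v}, [v]  # taxa on v's side of edge uv
            while stack:
                x = stack.pop()
                if len(t[x]) == 1:
                    side.add(x)
                for y in t[x] - seen:
                    seen.add(y)
                    stack.append(y)
            if 2 <= len(side) <= len(leaves) - 2:
                out.add(frozenset({frozenset(side), leaves - side}))
    return frozenset(out)


def nni_neighbours(t: Tree) -> List[Tree]:
    """Contract each internal edge uv and re-resolve it in the two other ways."""
    result = []
    internal = {frozenset((u, v)) for u in t for v in t[u]
                if len(t[u]) == 3 and len(t[v]) == 3}
    for edge in internal:
        u, v = tuple(edge)
        a = next(iter(t[u] - {v}))      # one subtree hanging off u
        for c in t[v] - {u}:            # swap it with either subtree hanging off v
            s = {x: set(nbrs) for x, nbrs in t.items()}
            s[u].remove(a)
            s[a].remove(u)
            s[v].remove(c)
            s[c].remove(v)
            s[u].add(c)
            s[c].add(u)
            s[v].add(a)
            s[a].add(v)
            result.append(s)
    return result


for n in range(4, 8):
    distinct = {splits(s) for s in nni_neighbours(caterpillar(n))}
    print(n, len(distinct), 2 * (n - 3))  # the last two columns agree
```

Counting distinct neighbours of caterpillars reproduces the value 2(n − 3) given on the unit-neighbourhood slide.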

  6. Bigger steps: Subtree Prune and Regraft (SPR) and Tree Bisection and Reconnection (TBR) operations consist of removing an edge and reconnecting the two resulting components differently.

  7. SPR Operations An SPR operation on an unrooted phylogenetic X-tree T is defined as follows:
     ● Remove an edge uv from T such that the component that contains v contains at least three taxa.
     ● Choose an edge that is not incident with v from the component of T − uv that contains v, and subdivide it by a new vertex w.
     ● Insert an edge uw.
     ● Suppress the vertex v, which now has degree 2.
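
The four steps translate almost line by line into code. The sketch below assumes a hypothetical adjacency-set representation (vertex to set of neighbours, ints for interior vertices, strings for taxa) and illustrative function names. It enumerates every SPR operation on a binary tree and collects the resulting trees by their split sets, so that the counts can be compared with the formula 2(n − 3)(2n − 7) of Allen and Steel (2001) quoted on the unit-neighbourhood slide. It is a brute-force illustration, not an efficient implementation.

```python
from typing import Dict, FrozenSet, Hashable, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def caterpillar(n: int) -> Tree:
    """Hypothetical helper: caterpillar on taxa '1'..'n' (n >= 4), interior path 0..n-3."""
    edges = [(i, i + 1) for i in range(n - 3)]
    edges += [(0, "1"), (0, "2"), (n - 3, str(n - 1)), (n - 3, str(n))]
    edges += [(i, str(i + 2)) for i in range(1, n - 3)]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def splits(t: Tree) -> FrozenSet:
    """Canonical form: the set of non-trivial leaf bipartitions induced by the edges."""
    leaves = frozenset(v for v in t if len(t[v]) == 1)
    out = set()
    for u in t:
        for v in t[u]:
            side, seen, stack = set(), {u, v}, [v]  # taxa on v's side of edge uv
            while stack:
                x = stack.pop()
                if len(t[x]) == 1:
                    side.add(x)
                for y in t[x] - seen:
                    seen.add(y)
                    stack.append(y)
            if 2 <= len(side) <= len(leaves) - 2:
                out.add(frozenset({frozenset(side), leaves - side}))
    return frozenset(out)


def component(t: Tree, start: Hashable) -> Set[Hashable]:
    """All vertices reachable from start."""
    seen, stack = {start}, [start]
    while stack:
        for y in t[stack.pop()] - seen:
            seen.add(y)
            stack.append(y)
    return seen


def spr_neighbours(t: Tree) -> Set[FrozenSet]:
    """Split sets of all trees that are exactly one SPR operation away from t."""
    taxa = {x for x in t if len(t[x]) == 1}
    w = max(x for x in t if isinstance(x, int)) + 1  # fresh id for the new vertex
    here, out = splits(t), set()
    for u in t:
        for v in t[u]:
            cut = {x: set(nbrs) for x, nbrs in t.items()}
            cut[u].remove(v)
            cut[v].remove(u)                         # 1. remove the edge uv
            comp = component(cut, v)
            if len(comp & taxa) < 3:
                continue
            targets = {frozenset((a, b)) for a in comp for b in cut[a] if v not in (a, b)}
            for edge in targets:
                a, b = tuple(edge)
                s = {x: set(nbrs) for x, nbrs in cut.items()}
                s[a].remove(b)
                s[b].remove(a)
                s[w] = {a, b, u}                     # 2. subdivide ab by w ...
                s[a].add(w)
                s[b].add(w)
                s[u].add(w)                          # 3. ... and insert the edge uw
                p, q = s.pop(v)                      # 4. suppress v (now of degree 2)
                s[p].remove(v)
                s[q].remove(v)
                s[p].add(q)
                s[q].add(p)
                key = splits(s)
                if key != here:
                    out.add(key)
    return out


for n in range(4, 8):
    print(n, len(spr_neighbours(caterpillar(n))), 2 * (n - 3) * (2 * n - 7))
```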

  8. Distances
     ● Let $\mathcal{T}_n$ be the set of all phylogenetic trees with taxa set $\{1, \dots, n\}$.
     ● For Θ ∈ {NNI, SPR, TBR}, let G(n, Θ) be the graph with vertex set $\mathcal{T}_n$ where two vertices are adjacent if one can be obtained from the other by performing a Θ-operation.
     ● The graph distance of G(n, Θ) defines a distance $d_\Theta$ on $\mathcal{T}_n$.
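
Since $d_\Theta$ is a graph distance, for very small n it can be computed by breadth-first search in G(n, Θ). The sketch below does this for Θ = NNI on binary trees, assuming a hypothetical adjacency-set representation (ints for interior vertices, strings for taxa) and identifying each tree with its set of splits. All names are illustrative. It is only feasible for tiny n, because the number of binary trees grows like (2n − 5)!!.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def caterpillar(n: int) -> Tree:
    """Hypothetical helper: caterpillar on taxa '1'..'n' (n >= 4), interior path 0..n-3."""
    edges = [(i, i + 1) for i in range(n - 3)]
    edges += [(0, "1"), (0, "2"), (n - 3, str(n - 1)), (n - 3, str(n))]
    edges += [(i, str(i + 2)) for i in range(1, n - 3)]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def splits(t: Tree) -> FrozenSet:
    """Canonical form: the set of non-trivial leaf bipartitions induced by the edges."""
    leaves = frozenset(v for v in t if len(t[v]) == 1)
    out = set()
    for u in t:
        for v in t[u]:
            side, seen, stack = set(), {u, v}, [v]
            while stack:
                x = stack.pop()
                if len(t[x]) == 1:
                    side.add(x)
                for y in t[x] - seen:
                    seen.add(y)
                    stack.append(y)
            if 2 <= len(side) <= len(leaves) - 2:
                out.add(frozenset({frozenset(side), leaves - side}))
    return frozenset(out)


def nni_neighbours(t: Tree) -> List[Tree]:
    """The two alternative resolutions of every internal edge."""
    result = []
    internal = {frozenset((u, v)) for u in t for v in t[u]
                if len(t[u]) == 3 and len(t[v]) == 3}
    for edge in internal:
        u, v = tuple(edge)
        a = next(iter(t[u] - {v}))
        for c in t[v] - {u}:
            s = {x: set(nbrs) for x, nbrs in t.items()}
            s[u] = (s[u] - {a}) | {c}
            s[v] = (s[v] - {c}) | {a}
            s[a] = (s[a] - {u}) | {v}
            s[c] = (s[c] - {v}) | {u}
            result.append(s)
    return result


def nni_distances(start: Tree) -> Dict[FrozenSet, int]:
    """Breadth-first search in G(n, NNI): distance from start to every reachable tree."""
    dist = {splits(start): 0}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        d = dist[splits(t)]
        for s in nni_neighbours(t):
            key = splits(s)
            if key not in dist:
                dist[key] = d + 1
                queue.append(s)
    return dist


dist = nni_distances(caterpillar(5))
print(len(dist))  # 15: G(5, NNI) is connected and has (2*5 - 5)!! = 15 vertices
```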

  9. Applications of SPR
     ● The SPR distance has been used to estimate the amount of lateral gene transfer.
     ● SPR moves are used to escape from local optima in (meta-)heuristics for constructing phylogenetic trees.

  10. Unit neighborhood
     ● The size of the neighborhood of a tree with n taxa (the degree of a vertex in G(n, Θ))
         ● equals 2(n − 3) for NNI,
         ● equals 2(n − 3)(2n − 7) for SPR (Allen, Steel, 2001),
         ● depends on the tree shape for TBR; there is a lower bound of order $n^2 \log n$ and an upper bound of order $n^3$ (Humphries, Wu, preprint).
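
The neighborhood formulas can be set against the number of binary trees on n taxa, $(2n-5)!! = 1 \cdot 3 \cdot 5 \cdots (2n-5)$ (a standard count that is not stated on the slides), to see how small a single step is. TBR is left out because its neighborhood size depends on the tree shape. Function names are hypothetical.

```python
def nni_neighbourhood(n: int) -> int:
    """Number of NNI neighbours of any binary tree with n taxa."""
    return 2 * (n - 3)


def spr_neighbourhood(n: int) -> int:
    """Number of SPR neighbours of any binary tree with n taxa (Allen, Steel 2001)."""
    return 2 * (n - 3) * (2 * n - 7)


def num_binary_trees(n: int) -> int:
    """(2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5) unrooted binary trees on n >= 3 taxa."""
    count = 1
    for odd in range(3, 2 * n - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, nni_neighbourhood(n), spr_neighbourhood(n), num_binary_trees(n))
```

For n = 4 the SPR neighborhood (2 trees) already contains every other tree, while for larger n one step reaches only a polynomial fraction of a super-exponentially large space.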

  11. The diameter
     ● The diameter of G(n, NNI) is known to be $O(n \log n)$ (Li et al. 1996).
     ● The diameter of G(n, SPR) is between $\frac{1}{2}n - o(n)$ and $n - 3$ (Allen and Steel 2001).
     ● The diameter of G(n, TBR) is between $\frac{1}{4}n - o(n)$ and $n - 3$ (Allen and Steel 2001).

  12. The diameter (continued)
     ● Theorem (Ding, SG, Humphries, submitted): $\operatorname{diam} G(n,\mathrm{TBR}) \le \operatorname{diam} G(n,\mathrm{SPR})$, and both diameters are $n - \Theta(\sqrt{n})$. For $n = k^2$, the caterpillars of Lemma 3 with $k = l$ give $\operatorname{diam} G(n,\mathrm{TBR}) \ge (k-1)^2 = n - 2\sqrt{n} + 1$.

  13. Restrictions The restriction $T|S$ of a phylogenetic tree T to a subset S of L(T) is the tree obtained from the smallest subtree of T containing S by suppressing all vertices of degree 2. [Figure: a tree with taxa a, b, c, d, e, f, g, h.]

  14. Restrictions [Figure: the example tree restricted to the taxa c, d, e, g, h.]

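
A sketch of computing $T|S$, assuming a hypothetical adjacency-set representation (vertex to set of neighbours, ints for interior vertices, strings for taxa): first repeatedly delete leaves outside S, which leaves the smallest subtree containing S, then suppress every vertex of degree 2. The example caterpillar and all function names are made up for illustration; the tree is not the one shown in the figure.

```python
from typing import Dict, Hashable, Sequence, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def caterpillar(order: Sequence[str]) -> Tree:
    """Hypothetical helper: caterpillar with the given label ordering (>= 4 taxa)."""
    n = len(order)
    edges = [(i, i + 1) for i in range(n - 3)] + [(i - 1, order[i]) for i in range(2, n - 2)]
    edges += [(0, order[0]), (0, order[1]), (n - 3, order[-2]), (n - 3, order[-1])]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def span(t: Tree, keep: Set[Hashable]) -> Tree:
    """Smallest subtree containing keep: repeatedly delete leaves outside keep."""
    s = {v: set(nbrs) for v, nbrs in t.items()}
    stack = [v for v in s if len(s[v]) <= 1 and v not in keep]
    while stack:
        v = stack.pop()
        if v in s:
            for w in s.pop(v):
                s[w].discard(v)
                if len(s[w]) <= 1 and w not in keep:
                    stack.append(w)
    return s


def restrict(t: Tree, keep: Set[Hashable]) -> Tree:
    """T|S: the spanning subtree with every degree-2 vertex suppressed."""
    s = span(t, keep)
    for v in list(s):
        if len(s[v]) == 2:
            p, q = s.pop(v)
            s[p] = (s[p] - {v}) | {q}
            s[q] = (s[q] - {v}) | {p}
    return s


t = caterpillar("abcdefgh")
r = restrict(t, {"c", "d", "e", "g", "h"})
# r is the caterpillar with label ordering c, d, e, g, h (interior vertices 2, 3, 5)
print({v: sorted(map(str, nbrs)) for v, nbrs in r.items() if isinstance(v, int)})
```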

  16. Agreement forests An agreement forest for two trees T, T’ in $\mathcal{T}_n$ is a collection $\{T_0, \dots, T_k\}$ of binary phylogenetic trees such that
     (i) the taxa sets of $T_0, \dots, T_k$ form a partition of $\{1, \dots, n\}$,
     (ii) $T_i$ is a restriction of both T and T’ for all i, and
     (iii) the smallest subtrees of T (resp. T’) containing $L(T_0), \dots, L(T_k)$ are vertex-disjoint.

  17-19. [Figure: two trees on the taxa a, b, c, d, e, f, g, h and an agreement forest for them.]
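
Conditions (i), (ii) and (iii) can be tested directly. The following sketch assumes a hypothetical adjacency-set representation (ints for interior vertices, strings for taxa); it compares the restrictions via their split sets and checks that the spanning subtrees of the blocks are vertex-disjoint in both trees. Function names and the caterpillar test trees are illustrative and not taken from the figure.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Sequence, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def caterpillar(order: Sequence[str]) -> Tree:
    """Hypothetical helper: caterpillar with the given label ordering (>= 4 taxa)."""
    n = len(order)
    edges = [(i, i + 1) for i in range(n - 3)] + [(i - 1, order[i]) for i in range(2, n - 2)]
    edges += [(0, order[0]), (0, order[1]), (n - 3, order[-2]), (n - 3, order[-1])]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def span(t: Tree, keep: Set[Hashable]) -> Tree:
    """Smallest subtree containing keep: repeatedly delete leaves outside keep."""
    s = {v: set(nbrs) for v, nbrs in t.items()}
    stack = [v for v in s if len(s[v]) <= 1 and v not in keep]
    while stack:
        v = stack.pop()
        if v in s:
            for w in s.pop(v):
                s[w].discard(v)
                if len(s[w]) <= 1 and w not in keep:
                    stack.append(w)
    return s


def restrict(t: Tree, keep: Set[Hashable]) -> Tree:
    """T|S: the spanning subtree with every degree-2 vertex suppressed."""
    s = span(t, keep)
    for v in list(s):
        if len(s[v]) == 2:
            p, q = s.pop(v)
            s[p] = (s[p] - {v}) | {q}
            s[q] = (s[q] - {v}) | {p}
    return s


def splits(t: Tree) -> FrozenSet:
    """Canonical form: the set of non-trivial leaf bipartitions induced by the edges."""
    leaves = frozenset(v for v in t if len(t[v]) == 1)
    out = set()
    for u in t:
        for v in t[u]:
            side, seen, stack = set(), {u, v}, [v]
            while stack:
                x = stack.pop()
                if len(t[x]) == 1:
                    side.add(x)
                for y in t[x] - seen:
                    seen.add(y)
                    stack.append(y)
            if 2 <= len(side) <= len(leaves) - 2:
                out.add(frozenset({frozenset(side), leaves - side}))
    return frozenset(out)


def is_agreement_forest(t1: Tree, t2: Tree, blocks: Iterable[Iterable[str]]) -> bool:
    parts = [set(b) for b in blocks]
    taxa = {v for v in t1 if len(t1[v]) == 1}
    if not all(parts) or sum(map(len, parts)) != len(taxa) or set().union(*parts) != taxa:
        return False                                   # (i) not a partition of the taxa
    if any(splits(restrict(t1, b)) != splits(restrict(t2, b)) for b in parts):
        return False                                   # (ii) restrictions differ
    for t in (t1, t2):                                 # (iii) vertex-disjoint subtrees
        used: Set[Hashable] = set()
        for b in parts:
            vertices = set(span(t, b))
            if vertices & used:
                return False
            used |= vertices
    return True


a, b = caterpillar("1234"), caterpillar("1324")
print(is_agreement_forest(a, b, [{"1"}, {"2", "3", "4"}]))  # True
print(is_agreement_forest(a, b, [{"1", "2", "3", "4"}]))    # False
```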

  20. Maximum agreement forests
     ● An agreement forest for T, T’ is a maximum agreement forest if the number of its trees is minimal.
     Lemma 1 (Allen, Steel, 2001): If $\{T_0, \dots, T_k\}$ is a maximum agreement forest for T, T’, then $d_{\mathrm{TBR}}(T, T') = k$.
     Lemma 2: If $\{T_0, \dots, T_k\}$ is an agreement forest for T, T’ such that every tree contains at most 2 taxa, then $d_{\mathrm{SPR}}(T, T') \le k$.
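
For tiny n, Lemma 1 turns the TBR distance into a finite search: enumerate all partitions of the taxa set, keep those that are agreement forests, and take one with the fewest trees. The brute-force sketch below (exponential in n, hypothetical names, adjacency-set representation with ints for interior vertices and strings for taxa) only illustrates the lemma and is not a practical algorithm. The second example uses the caterpillars of Lemma 3 with k = 2 and l = 3, for which (k − 1)(l − 1) = 2.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Sequence, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa

def caterpillar(order: Sequence[str]) -> Tree:
    """Hypothetical helper: caterpillar with the given label ordering (>= 4 taxa)."""
    n = len(order)
    edges = [(i, i + 1) for i in range(n - 3)] + [(i - 1, order[i]) for i in range(2, n - 2)]
    edges += [(0, order[0]), (0, order[1]), (n - 3, order[-2]), (n - 3, order[-1])]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t

def span(t: Tree, keep: Set[Hashable]) -> Tree:
    """Smallest subtree containing keep: repeatedly delete leaves outside keep."""
    s = {v: set(nbrs) for v, nbrs in t.items()}
    stack = [v for v in s if len(s[v]) <= 1 and v not in keep]
    while stack:
        v = stack.pop()
        if v in s:
            for w in s.pop(v):
                s[w].discard(v)
                if len(s[w]) <= 1 and w not in keep:
                    stack.append(w)
    return s

def restrict(t: Tree, keep: Set[Hashable]) -> Tree:
    """T|S: the spanning subtree with every degree-2 vertex suppressed."""
    s = span(t, keep)
    for v in list(s):
        if len(s[v]) == 2:
            p, q = s.pop(v)
            s[p] = (s[p] - {v}) | {q}
            s[q] = (s[q] - {v}) | {p}
    return s

def splits(t: Tree) -> frozenset:
    """Canonical form: the set of non-trivial leaf bipartitions induced by the edges."""
    leaves = frozenset(v for v in t if len(t[v]) == 1)
    out = set()
    for u in t:
        for v in t[u]:
            side, seen, stack = set(), {u, v}, [v]
            while stack:
                x = stack.pop()
                if len(t[x]) == 1:
                    side.add(x)
                for y in t[x] - seen:
                    seen.add(y)
                    stack.append(y)
            if 2 <= len(side) <= len(leaves) - 2:
                out.add(frozenset({frozenset(side), leaves - side}))
    return frozenset(out)

def is_agreement_forest(t1: Tree, t2: Tree, blocks: Iterable[Iterable[str]]) -> bool:
    parts = [set(b) for b in blocks]
    taxa = {v for v in t1 if len(t1[v]) == 1}
    if not all(parts) or sum(map(len, parts)) != len(taxa) or set().union(*parts) != taxa:
        return False                                   # (i) not a partition
    if any(splits(restrict(t1, b)) != splits(restrict(t2, b)) for b in parts):
        return False                                   # (ii) restrictions differ
    for t in (t1, t2):                                 # (iii) vertex-disjointness
        used: Set[Hashable] = set()
        for b in parts:
            vertices = set(span(t, b))
            if vertices & used:
                return False
            used |= vertices
    return True

def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """All set partitions of items (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield [[first]] + p
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

def tbr_distance(t1: Tree, t2: Tree) -> int:
    """Lemma 1: a maximum agreement forest {T_0, ..., T_k} gives d_TBR = k."""
    taxa = sorted(v for v in t1 if len(t1[v]) == 1)
    smallest = min(len(p) for p in partitions(taxa) if is_agreement_forest(t1, t2, p))
    return smallest - 1

print(tbr_distance(caterpillar("1234"), caterpillar("1324")))      # 1
print(tbr_distance(caterpillar("123456"), caterpillar("135246")))  # 2
```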

  21. Caterpillars A caterpillar is a binary phylogenetic tree whose interior vertices form a path. A label ordering is a permutation of the taxa set such that any two consecutive elements are adjacent to the same interior vertex or to two adjacent interior vertices. [Figure: a caterpillar with label ordering h, g, b, a, d, c, e, f.]
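
A sketch of the caterpillar definition, assuming a hypothetical adjacency-set representation (ints for interior vertices, strings for taxa): `caterpillar` builds a caterpillar from a label ordering and `is_label_ordering` checks the defining property. The example reuses the ordering h, g, b, a, d, c, e, f from the slide; the vertex numbering and function names are assumptions.

```python
from typing import Dict, Hashable, Sequence, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def caterpillar(order: Sequence[str]) -> Tree:
    """Caterpillar whose interior vertices 0..n-3 form a path; `order` is a label ordering."""
    n = len(order)
    edges = [(i, i + 1) for i in range(n - 3)]
    edges += [(0, order[0]), (0, order[1]), (n - 3, order[-2]), (n - 3, order[-1])]
    edges += [(i - 1, order[i]) for i in range(2, n - 2)]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def is_label_ordering(t: Tree, order: Sequence[str]) -> bool:
    """Consecutive taxa must hang off the same or two adjacent interior vertices."""
    taxa = {v for v in t if len(t[v]) == 1}
    if len(order) != len(taxa) or set(order) != taxa:
        return False
    for x, y in zip(order, order[1:]):
        (px,), (py,) = t[x], t[y]  # the unique interior neighbour of each taxon
        if px != py and py not in t[px]:
            return False
    return True


cat = caterpillar("hgbadcef")
print(is_label_ordering(cat, "hgbadcef"), is_label_ordering(cat, "hbgadcef"))  # True False
```

A caterpillar has several label orderings: reversing the ordering or swapping the two taxa of an end cherry (e.g. g, h, b, ...) gives another valid one.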

  22. The lower bound Lemma 3: Let k, l be positive integers such that $2 \le k \le l$, and let $T, T' \in \mathcal{T}_{kl}$ be caterpillars such that T has the label ordering $[1, \dots, kl]$ and T’ has the label ordering $[1, k+1, \dots, k(l-1)+1, 2, k+2, \dots, k(l-1)+2, \dots, k, 2k, \dots, k(l-1)+k]$. Then $d_{\mathrm{TBR}}(T, T') = (k-1)(l-1)$. To obtain the lower bound, we choose $k \approx l$.
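
The ordering of T’ in Lemma 3 lists the labels $1, \dots, kl$ column by column when they are written in rows of length k. For fixed $n = kl$ we have $(k-1)(l-1) = n - (k+l) + 1$, which is largest when $k + l$ is smallest, i.e. when k and l are as close as possible; for $n = k^2$ this gives $(k-1)^2 = n - 2\sqrt{n} + 1$. A small sketch with hypothetical function names:

```python
import math
from typing import List, Tuple


def lemma3_ordering(k: int, l: int) -> List[int]:
    """Label ordering of T' in Lemma 3: 1, k+1, ..., k(l-1)+1, 2, k+2, ..., k(l-1)+k."""
    return [i + k * j for i in range(1, k + 1) for j in range(l)]


def best_factorisation(n: int) -> Tuple[int, int, int]:
    """Among n = k*l with 2 <= k <= l, pick the pair maximising (k-1)(l-1)."""
    pairs = [(k, n // k) for k in range(2, math.isqrt(n) + 1) if n % k == 0]
    if not pairs:
        raise ValueError("n has no factorisation with 2 <= k <= l")
    k, l = max(pairs, key=lambda kl: (kl[0] - 1) * (kl[1] - 1))
    return k, l, (k - 1) * (l - 1)


print(lemma3_ordering(2, 3))     # [1, 3, 5, 2, 4, 6]
print(best_factorisation(36))    # (6, 6, 25)
```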

  23. Chopping trees Lemma 4: Let $k \ge 0$ and $l, m, n > 1$ be integers such that $n \ge 2k(m-1) + l$, and let $T \in \mathcal{T}_n$. Then there is a collection $T_0, \dots, T_k$ of vertex-disjoint subtrees of T such that $|L(T_0)| \ge l$ and $|L(T_i)| \ge m$ for all $i \in \{1, \dots, k\}$.

  24. Chopping trees [Figure: a subtree with at least m but at most 2(m − 1) taxa, whose two child subtrees each have fewer than m taxa.]
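
One way to see Lemma 4, in the spirit of the figure: root T (for example by subdividing an edge), walk down from the root while some child subtree still has at least m taxa, and cut off the subtree where this stops. Both of its child subtrees have fewer than m taxa, so it has between m and 2(m − 1) taxa; repeating this k times removes at most 2k(m − 1) taxa, so at least l remain for $T_0$. The sketch assumes a rooted binary tree given as nested 2-tuples of string taxa; the input format and names are hypothetical, and this is not necessarily the proof given in the talk.

```python
from typing import List, Set, Tuple, Union

# Hypothetical input format: a rooted binary tree as nested 2-tuples of string taxa.
Node = Union[str, Tuple["Node", "Node"]]


def chop(tree: Node, k: int, m: int) -> Tuple[Set[str], List[Set[str]]]:
    """Cut k vertex-disjoint subtrees with >= m taxa each off a rooted binary tree.

    Returns the taxa left in the remaining tree T_0 and the taxa sets of T_1..T_k.
    """
    removed: set = set()

    def taxa(node: Node) -> Set[str]:
        if node in removed:
            return set()
        if isinstance(node, str):
            return {node}
        return set().union(*(taxa(child) for child in node))

    pieces = []
    for _ in range(k):
        v = tree
        # Walk down while some child subtree still carries at least m taxa.
        while not isinstance(v, str):
            heavy = [child for child in v if len(taxa(child)) >= m]
            if not heavy:
                break
            v = heavy[0]
        if v is tree or len(taxa(v)) < m:
            raise ValueError("tree too small for these k and m")
        # Both children of v have < m taxa, so m <= |taxa(v)| <= 2(m - 1).
        pieces.append(taxa(v))
        removed.add(v)
    return taxa(tree), pieces


def balanced(labels: List[str]) -> Node:
    """Hypothetical test tree: split the label list in halves recursively."""
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2
    return (balanced(labels[:mid]), balanced(labels[mid:]))


labels = [f"t{i}" for i in range(16)]
rest, pieces = chop(balanced(labels), k=2, m=3)
print(len(rest), [len(p) for p in pieces])  # 8 [4, 4]
```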

  25. The upper bound
     ● Given T, T’ in $\mathcal{T}_n$,
     ● we chop T into about $\sqrt{n}$ trees with about $\sqrt{n}$ taxa each.
     ● Then we chop smallest possible trees from T’ such that each chopped tree has at least two taxa in common with one of the subtrees of T that has not been used yet.
     ● We get an agreement forest with about $\sqrt{n}$ trees with 2 taxa.
     ● Applying Lemma 2 yields the upper bound.

  26. Chains A chain of length l in a phylogenetic tree is a path $v_1, \dots, v_l$ of l interior vertices such that every vertex $v_i$ is adjacent to a leaf $x_i$ ($i = 1, \dots, l$). [Figure: a chain $v_1, v_2, v_3$ with leaves $x_1, x_2, x_3$.]
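
A sketch of the chain condition, assuming a hypothetical adjacency-set representation (ints for interior vertices, strings for taxa). The example trees are made up: a caterpillar, whose whole interior path is a chain, and a "snowflake" whose central vertex has no pendant leaf. `is_chain` is an illustrative name.

```python
from typing import Dict, Hashable, Iterable, Sequence, Set, Tuple

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tree:
    """Build an undirected adjacency map from an edge list."""
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def is_chain(t: Tree, path: Sequence[Hashable]) -> bool:
    """Is path = v_1, ..., v_l a path of interior vertices, each with a pendant leaf?"""
    if not path or len(set(path)) != len(path):
        return False
    for i, v in enumerate(path):
        if len(t[v]) == 1:                               # v_i must be interior
            return False
        if i > 0 and path[i - 1] not in t[v]:            # consecutive vertices adjacent
            return False
        if not any(len(t[x]) == 1 for x in t[v]):        # v_i needs a leaf x_i
            return False
    return True


cat = from_edges([(0, 1), (1, 2), (2, 3), (0, "a"), (0, "b"),
                  (1, "c"), (2, "d"), (3, "e"), (3, "f")])
snow = from_edges([(0, 1), (0, 2), (0, 3), (1, "a"), (1, "b"),
                   (2, "c"), (2, "d"), (3, "e"), (3, "f")])
print(is_chain(cat, [0, 1, 2, 3]), is_chain(snow, [1, 0, 2]))  # True False
```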

  27. The Chain Reduction Conjecture Conjecture (Allen, Steel 2001): If two binary phylogenetic X-trees T and T' both contain the same chain of length $l \ge 4$, then the SPR distance does not change if the chain is replaced by identical chains of length 3 in both trees (correctly oriented).

  28. Consequences
     ● The corresponding result holds for TBR (easy to prove using maximum agreement forests).
     ● The conjecture implies fixed-parameter tractability of computing the SPR distance between two given trees.
     ● Fixed-parameter tractability has been shown using a different approach.

  29. More reasons to solve it The chain reduction is already implemented in a program that computes (or estimates) the SPR distance (Hickey et al. 2008). The authors also gave statistical evidence for the conjecture by testing 20000 pairs of trees.

  30. More reasons to solve it The chain reduction conjecture is one of Mike Steel’s NZ$100 problems. It has even become a Penny ante problem, and solving it earns a bottle of single malt.

  31. Induced SPR sequences Every sequence S of SPR operations between two X-trees T and T’ induces a sequence between the restrictions of T and T’ to a subset X’ of X, since each SPR operation restricts either to an SPR operation or to no change at all. Whenever an operation leaves the restricted tree unchanged, it is removed from the sequence. Hence $d_{\mathrm{SPR}}(T|X', T'|X') \le d_{\mathrm{SPR}}(T, T')$.

  32. A reformulation
     ● We fix two X-trees T and T’ and edges uv and u’v’, respectively.
     ● We denote by $T_i$ (resp. $T'_i$) the tree obtained by subdividing the edge uv (resp. u’v’) by a chain of length i whose taxa $x_1, \dots, x_i$ have increasing indices from u to v (resp. from u’ to v’).
     ● We define $d_i = d_{\mathrm{SPR}}(T_i, T'_i)$.
     ● Conjecture: $d_i = d_3$ for every integer $i \ge 3$.
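
The trees $T_i$ can be built mechanically. This sketch assumes a hypothetical adjacency-set representation (ints for interior vertices, strings for taxa) and hypothetical taxon names x1, x2, … that must not clash with existing taxa. It subdivides uv by i new vertices $w_1, \dots, w_i$ from u to v and attaches $x_j$ to $w_j$; $T'_i$ is obtained the same way from T’ and u’v’.

```python
from typing import Dict, Hashable, Sequence, Set

Tree = Dict[Hashable, Set[Hashable]]  # vertex -> neighbours; ints interior, strs taxa


def caterpillar(order: Sequence[str]) -> Tree:
    """Hypothetical helper: caterpillar with the given label ordering (>= 4 taxa)."""
    n = len(order)
    edges = [(i, i + 1) for i in range(n - 3)] + [(i - 1, order[i]) for i in range(2, n - 2)]
    edges += [(0, order[0]), (0, order[1]), (n - 3, order[-2]), (n - 3, order[-1])]
    t: Tree = {}
    for a, b in edges:
        t.setdefault(a, set()).add(b)
        t.setdefault(b, set()).add(a)
    return t


def add_chain(t: Tree, u: Hashable, v: Hashable, i: int) -> Tree:
    """Return T_i: subdivide uv by w_1..w_i (from u to v) and hang taxon x_j off w_j."""
    if v not in t[u]:
        raise ValueError("uv is not an edge")
    s = {a: set(nbrs) for a, nbrs in t.items()}
    fresh = max(a for a in s if isinstance(a, int)) + 1   # ids for the new vertices
    s[u].remove(v)
    s[v].remove(u)
    prev = u
    for j in range(1, i + 1):
        w, x = fresh, f"x{j}"          # hypothetical taxon names x1, x2, ...
        fresh += 1
        s[w] = {prev, x}
        s[prev].add(w)
        s[x] = {w}
        prev = w
    s[prev].add(v)
    s[v].add(prev)
    return s


t3 = add_chain(caterpillar("1234"), 0, 1, 3)   # the edge uv = {0, 1} is an assumption
print(sorted(x for x in t3 if isinstance(x, str)))
# ['1', '2', '3', '4', 'x1', 'x2', 'x3']
```

With i = 0 the function returns a copy of the original tree, matching $T_0 = T$.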

  33. An Example [Figure: the tree T = T’ with the edges uv and u’v’ marked.] Let T = T’ and let u, v, u’, v’ be as in the figure. We have $d_0 = 0$, $d_1 = 1$, $d_2 = 2$, $d_3 = 3$, and $d_i = 3$ for $i > 3$.

  34-38. An Example (continued) [Figure: the example trees with chain taxa $x_1, x_2, x_3, x_4$ and vertices u, v, u’, v’, shown over several animation steps.]

  39. An easy lemma Lemma 5: $d_i \le d_0 + 3$ for all i. Statement: If $d_i = d_{i+1}$ for some $i \ge 1$, then $d_j = d_i$ for every $j \ge i$.


  41. Very long chains
     ● Theorem (Bonet, St. John 08): There is a linearly bounded function $f : \mathbb{N} \to \mathbb{N}$ such that $d_0 \le d$ implies $d_i = d_{f(d)}$ for every integer $i \ge f(d)$.
     ● Using their ideas, we can show that for two trees T, T’ in $\mathcal{T}_n$ there is a shortest SPR sequence such that all edges in a chain of length f(d) are never altered (removed or subdivided).
