Phylogenetic tree
Michael Schroeder
Biotechnology Center, TU Dresden
Phylogenetic trees
• Motivation
• Rooted and unrooted trees
• Rooted trees: Hierarchical clustering
• Drawing trees
• Unrooted trees: Neighbour joining
Distance in matrix = distance in tree?
    A   B   C   D   E
A       2   6  10   9
B           5   9   8
C               4   5
D                   3
E
[Figure: tree with leaves A, B, C, D, E]
Distance in matrix = distance in 2D?
    A   B   C   D   E
A       2   6  10   9
B           5   9   8
C               4   5
D                   3
E
[Figure: points A, B, C, D, E placed in the plane]
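Distance in matrix = distance in 2D? No, not always
    A   B   C
A       1  10
B           1
C
Here d(A,C) = 10 exceeds d(A,B) + d(B,C) = 2, so no placement of three points in the plane reproduces these distances.
Distance in matrix = distance in tree?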
Distance in matrix = distance in tree
• If the tree is additive
  • the distance from v to w is
  • the sum of the lengths of the edges connecting v to w
Additive Tree
Is there an additive tree?
    A   B   C   D   E   F
A      27  24  22  31  30
B          11  21  12  11
C              18  15  14
D                  25  24
E                       5
F
Additive Tree
Yes, there is an additive tree
    A   B   C   D   E   F
A      27  24  22  31  30
B          11  21  12  11
C              18  15  14
D                  25  24
E                       5
F
[Figure: unrooted tree with leaves A, B, C, D, E, F and edge lengths 14, 4, 4, 6, 3, 2, 5, 8, 3]
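To check that a tree really reproduces the matrix, one can sum the edge lengths along each leaf-to-leaf path and compare. The following is a minimal sketch, not the lecture's own code. The internal node names `p`, `q`, `r`, `s` are hypothetical, and the assignment of the nine edge lengths to edges is an assumed reading of the figure.

```python
# Distance matrix from the slide (upper triangle).
D = {
    ("A", "B"): 27, ("A", "C"): 24, ("A", "D"): 22, ("A", "E"): 31, ("A", "F"): 30,
    ("B", "C"): 11, ("B", "D"): 21, ("B", "E"): 12, ("B", "F"): 11,
    ("C", "D"): 18, ("C", "E"): 15, ("C", "F"): 14,
    ("D", "E"): 25, ("D", "F"): 24,
    ("E", "F"): 5,
}

# Assumed reading of the figure: p, q, r, s are hypothetical names for the
# internal nodes; the lengths are the nine numbers shown on the slide.
EDGES = [
    ("A", "p", 14), ("D", "p", 8), ("p", "q", 6), ("C", "q", 4),
    ("q", "r", 3), ("B", "r", 4), ("r", "s", 5), ("E", "s", 3), ("F", "s", 2),
]

def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def tree_distance(adj, start, goal):
    """Sum of edge lengths on the unique path from start to goal."""
    stack = [(start, None, 0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, total + w))
    raise ValueError("nodes are not connected")

adj = build_adjacency(EDGES)
# Collect every pair whose tree distance differs from the matrix entry.
mismatches = [(x, y) for (x, y), d in D.items() if tree_distance(adj, x, y) != d]
print("additive" if not mismatches else mismatches)
```

An empty `mismatches` list means every matrix entry equals the corresponding path length in the tree.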
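Additive Tree
There is an additive tree iff* every four nodes can be labelled $i, j, k, l$ such that
$D_{i,j} + D_{k,l} = D_{i,k} + D_{j,l} \ge D_{i,l} + D_{j,k}$
[Figure: four nodes i, j, k, l joined in a tree]
* iff is used in maths and computer science for "if and only if"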
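The condition can be tested mechanically: for every four nodes, the two largest of the three pairwise sums must be equal. The sketch below applies this to the six-node matrix from the previous slide. The function names and the `tol` parameter are illustrative choices, not part of the lecture.

```python
from itertools import combinations

LABELS = "ABCDEF"
ROWS = {  # upper triangle of the matrix from the additive-tree slide
    "A": [27, 24, 22, 31, 30],
    "B": [11, 21, 12, 11],
    "C": [18, 15, 14],
    "D": [25, 24],
    "E": [5],
}

# Expand the upper triangle into a symmetric lookup table.
D = {}
for i, a in enumerate(LABELS):
    for offset, value in enumerate(ROWS.get(a, [])):
        b = LABELS[i + 1 + offset]
        D[a, b] = D[b, a] = value

def four_point_ok(D, i, j, k, l, tol=1e-9):
    """True if the two largest of the three pairwise sums are equal."""
    sums = sorted([D[i, j] + D[k, l], D[i, k] + D[j, l], D[i, l] + D[j, k]])
    return abs(sums[2] - sums[1]) <= tol

def is_additive(D, labels):
    """Check the four-point condition for every set of four labels."""
    return all(four_point_ok(D, *quartet) for quartet in combinations(labels, 4))

print(is_additive(D, LABELS))
```

Sorting the three sums avoids having to guess which labelling of the four nodes matches the tree topology.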
Constructing the Edges of the Tree
Average linkage (WPGMA): join A and B (smallest distance, 3)
Before:
    A   B   C   D   E
A       3   7   8  10
B           6   8   7
C               4   5
D                   6
E
After:
         (A,B)      C      D      E
(A,B)             6.5      8    8.5
C                          4      5
D                                 6
E
[Figure: A and B joined at (A,B); both edges have length 1.5]
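Constructing the Edges of the Tree
Average linkage (WPGMA): join C and D (smallest distance, 4)
Before:
         (A,B)      C      D      E
(A,B)             6.5      8    8.5
C                          4      5
D                                 6
E
After:
         (A,B)  (C,D)      E
(A,B)            7.25    8.5
(C,D)                    5.5
E
[Figure: (A,B) with edges of length 1.5 to A and B; (C,D) with edges of length 2 to C and D]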
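Constructing the Edges of the Tree
Average linkage (WPGMA): join (C,D) and E (smallest distance, 5.5)
Before:
         (A,B)  (C,D)      E
(A,B)            7.25    8.5
(C,D)                    5.5
E
After:
               (A,B)  ((C,D),E)
(A,B)                     7.875
((C,D),E)
[Figure: ((C,D),E) joined to (C,D) by an edge of length 0.75 and to E by an edge of length 2.75; (A,B) with edges of length 1.5; (C,D) with edges of length 2]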
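Constructing the Edges of the Tree
Join (A,B) and ((C,D),E) (distance 7.875)
               (A,B)  ((C,D),E)
(A,B)                     7.875
((C,D),E)
[Figure: root ((A,B),((C,D),E)) joined to (A,B) by an edge of length 2.4375 and to ((C,D),E) by an edge of length 1.1875; below it, ((C,D),E) with edges 0.75 to (C,D) and 2.75 to E, (A,B) with edges of length 1.5, (C,D) with edges of length 2]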
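Constructing the Edges of the Tree
• If node $w = (v,u)$ joins nodes $v$ and $u$, then $L_{v,w} = 0.5\, D_{u,v} - L_{v,v'}$
• $D$ refers to the distances (from the matrix) and $L$ to the lengths of the edges
• $L_{v,v'}$ is the length of the path from $v$ down to a leaf $v'$ below it; it is zero if $v$ is a leaf node
• Example: for $v$ = (C,D) and $w$ = ((C,D),E), $L_{v,w} = 0.5 \cdot 5.5 - 2 = 0.75$
[Figure: node w above nodes u and v, edge from v to w labelled $L_{v,w}$; node v' below v, labelled $L_{v,v'}$]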
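The merging steps and the edge-length rule can be combined into a short routine. This is a minimal sketch of WPGMA under the assumption that ties are broken by order of appearance. The function name `wpgma` and the string naming of clusters are illustrative, not taken from the lecture.

```python
def wpgma(labels, dist):
    """WPGMA clustering; returns a list of (new cluster, child, edge length)."""
    # Work on a copy keyed by frozenset so lookups are symmetric.
    d = {frozenset(pair): value for pair, value in dist.items()}
    height = {name: 0.0 for name in labels}   # leaves sit at height 0
    active = list(labels)
    edges = []
    while len(active) > 1:
        # Pick the closest pair of active clusters.
        pairs = [(d[frozenset((u, v))], u, v)
                 for idx, u in enumerate(active) for v in active[idx + 1:]]
        duv, u, v = min(pairs, key=lambda item: item[0])
        w = f"({u},{v})"
        height[w] = duv / 2
        # Edge rule: L(v,w) = 0.5 * D(u,v) - (path length from v down to a leaf).
        edges.append((w, u, height[w] - height[u]))
        edges.append((w, v, height[w] - height[v]))
        # WPGMA update: plain average of the two merged rows.
        for k in active:
            if k not in (u, v):
                d[frozenset((w, k))] = (d[frozenset((u, k))] + d[frozenset((v, k))]) / 2
        active = [k for k in active if k not in (u, v)] + [w]
    return edges

LABELS = ["A", "B", "C", "D", "E"]
DIST = {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8, ("A", "E"): 10,
    ("B", "C"): 6, ("B", "D"): 8, ("B", "E"): 7,
    ("C", "D"): 4, ("C", "E"): 5,
    ("D", "E"): 6,
}

edges = wpgma(LABELS, DIST)
for parent, child, length in edges:
    print(f"{parent} -> {child}: {length}")
```

The cluster names produced by the sketch may list members in a different order than the slides, for example `(E,(C,D))` instead of `((C,D),E)`; the edge lengths are what matter.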
Original and tree distances may differ
The linkage method changes distances. The tree reflects the changed distances.
Original distances:
        A      B      C      D      E
A              3      7      8     10
B                     6      8      7
C                            4      5
D                                   6
E
Distances in tree:
        A      B      C      D      E
A              3  7.875  7.875  7.875
B                 7.875  7.875  7.875
C                            4    5.5
D                                 5.5
E
[Figure: the WPGMA tree with edge lengths 1.5, 1.5 for A, B; 2, 2 for C, D; 0.75 from (C,D) to ((C,D),E); 2.75 for E]
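The tree distances can be read off the merge history: two leaves are as far apart in the tree as the distance at which their clusters were first joined. The sketch below hard-codes the four merges from the WPGMA example; the names `MERGES` and `changed` are illustrative.

```python
# Merges from the WPGMA example, in order:
# (members of the new cluster, distance at which it was formed).
MERGES = [
    ({"A", "B"}, 3.0),
    ({"C", "D"}, 4.0),
    ({"C", "D", "E"}, 5.5),
    ({"A", "B", "C", "D", "E"}, 7.875),
]

ORIGINAL = {
    ("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8, ("A", "E"): 10,
    ("B", "C"): 6, ("B", "D"): 8, ("B", "E"): 7,
    ("C", "D"): 4, ("C", "E"): 5,
    ("D", "E"): 6,
}

def tree_distance(x, y):
    """Merge distance of the first (smallest) cluster containing both leaves."""
    for members, level in MERGES:
        if x in members and y in members:
            return level
    raise ValueError("leaves were never merged")

# Pairs whose distance in the tree differs from the original matrix.
changed = {pair: (d, tree_distance(*pair))
           for pair, d in ORIGINAL.items() if d != tree_distance(*pair)}
for pair, (before, after) in changed.items():
    print(pair, before, "->", after)
```

Only the pairs that were merged directly, (A,B) and (C,D), keep their original distance.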
Is hierarchical clustering always right?
Original data
    A   B   C   D   E   F
A       5   4   7   6   8
B           7  10   9  11
C               7   6   8
D                   5   9
E                       8
F
• A and C are closest
• A and C have the same distance to D, E, and F
• B is closer to A than to C
[Figure: three panels titled "Original data", "Topology by UPGMA (unweighted pair group method using arithmetic mean)" and "Better topology"; the three observations above are repeated under each panel]
How do we compute the better topology? Hierarchical clustering takes a local perspective; we need a global one.
http://www.icp.ucl.ac.be/~opperd/private/upgma.html
Neighbour Joining
Based on Wikipedia
Neighbour Joining
• Join a pair of nodes that are close to each other and far from all other nodes
• Let $u_i$ be the sum of the distances from node $i$ to all other nodes: $u_i = \sum_{k=1}^{n} d_{ik}$
• Find the pair of nodes $i, j$ with minimal $Q(i,j) = (n-2)\, d_{i,j} - (u_i + u_j)$
Based on Wikipedia and Felsenstein. Phylogenies
Neighbour Joining
1. Calculate Q
2. Choose the pair i, j with the lowest value in Q
3. Create a new node u
4. Calculate the distances from i and j to u
5. Calculate the distances from all remaining nodes k to u
6. Start the algorithm again, replacing i and j by u
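The six steps can be sketched as a loop. The slides do not spell out the formula for step 5; the sketch assumes the standard update $d(u,k) = \frac{1}{2}(d(i,k) + d(j,k) - d(i,j))$ and takes $d(j,u) = d(i,j) - d(i,u)$ in step 4. The function name and the generated node names `u1`, `u2`, ... are illustrative, and ties in Q are broken by order of appearance.

```python
def neighbour_joining(labels, dist):
    """Return a list of (node, node, edge length) for the unrooted tree."""
    d = {frozenset(pair): float(value) for pair, value in dist.items()}

    def get(x, y):
        return d[frozenset((x, y))]

    nodes = list(labels)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        u_sum = {i: sum(get(i, k) for k in nodes if k != i) for i in nodes}
        # Steps 1-2: choose the pair with the lowest Q value.
        i, j = min(((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
                   key=lambda p: (n - 2) * get(*p) - (u_sum[p[0]] + u_sum[p[1]]))
        # Step 3: create the new node (assumes no input label is named like u1).
        counter += 1
        new = f"u{counter}"
        # Step 4: distances from i and j to the new node.
        d_iu = 0.5 * get(i, j) + (u_sum[i] - u_sum[j]) / (2 * (n - 2))
        d_ju = get(i, j) - d_iu
        edges += [(i, new, d_iu), (j, new, d_ju)]
        # Step 5: distances from every remaining node k to the new node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (get(i, k) + get(j, k) - get(i, j))
        # Step 6: replace i and j by the new node and repeat.
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # The last two nodes are connected directly.
    edges.append((nodes[0], nodes[1], get(nodes[0], nodes[1])))
    return edges

LABELS = ["a", "b", "c", "d", "e"]
DIST = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}

edges = neighbour_joining(LABELS, DIST)
for x, y, length in edges:
    print(x, y, length)
```

The input here is the five-node distance matrix used in the worked example of these slides.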
Example
Distances
    a   b   c   d   e
a       5   9   9   8
b          10  10   9
c               8   7
d                   3
e
Q
     a    b    c    d    e
a       -50  -38  -34  -34
b            -38  -34  -34
c                 -40  -40
d                      -48
e
$u_i = \sum_{k=1}^{n} d_{ik}$, $Q(i,j) = (n-2)\, d_{i,j} - (u_i + u_j)$
Based on Wikipedia
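The Q values can be recomputed from the distances with a few lines of code. The sketch below uses illustrative helper names (`d`, `q_matrix`) and reports the pair with the lowest Q.

```python
LABELS = ["a", "b", "c", "d", "e"]
DIST = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}

def d(x, y):
    """Symmetric lookup in the upper-triangle distance table."""
    return DIST.get((x, y), DIST.get((y, x)))

def q_matrix(labels):
    """Q(i,j) = (n-2) * d(i,j) - (u_i + u_j) for every pair i < j."""
    n = len(labels)
    u = {i: sum(d(i, k) for k in labels if k != i) for i in labels}
    return {(i, j): (n - 2) * d(i, j) - (u[i] + u[j])
            for x, i in enumerate(labels) for j in labels[x + 1:]}

Q = q_matrix(LABELS)
best = min(Q, key=Q.get)   # pair with the lowest Q value
print(best, Q[best])
```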
Example
Distances
    a   b   c   d   e
a       5   9   9   8
b          10  10   9
c               8   7
d                   3
e
If node u joins i and j, then the distance from i to u is:
$d(i,u) = \frac{1}{2}\, d(i,j) + \frac{1}{2(n-2)}\, (u_i - u_j)$, where $u_i = \sum_{k=1}^{n} d_{ik}$
i.e. give weight to the (differing) distances of i and j to the other nodes k
Based on Wikipedia
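As a worked instance, take the pair a, b, which has the lowest Q value (-50) in the example. The second line applies the formula above; the last line assumes that the two new edges together span the original distance $d(a,b)$.

```latex
\begin{align*}
u_a &= 5 + 9 + 9 + 8 = 31, \qquad u_b = 5 + 10 + 10 + 9 = 34, \qquad n = 5 \\
d(a,u) &= \tfrac{1}{2}\, d(a,b) + \tfrac{1}{2(n-2)}\,(u_a - u_b)
        = \tfrac{1}{2}\cdot 5 + \tfrac{1}{6}\,(31 - 34) = 2 \\
d(b,u) &= d(a,b) - d(a,u) = 5 - 2 = 3
\end{align*}
```

Node b ends up farther from u than a because b is, in total, farther from the other nodes ($u_b > u_a$).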
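Rooting unrooted trees
• "Lift up" the tree at the midpoint of the longest path in the tree
• Longest path is from A to E: 14 + 6 + 3 + 5 + 3 = 31
• Root at the midpoint of the longest path: 31/2 = 15.5
• What is an outgroup? How does it relate?
[Figure: the additive tree with leaves A, B, C, D, E, F and edge lengths 14, 4, 4, 6, 3, 2, 5, 8, 3; the root splits the edge of length 6 into parts of length 1.5 and 4.5]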
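Midpoint rooting can be sketched as: find the longest leaf-to-leaf path, then walk along it until half its length is reached. The internal node names `p`, `q`, `r`, `s` are hypothetical, and the assignment of edge lengths to edges is an assumed reading of the figure.

```python
from itertools import combinations

# Assumed reading of the figure; p, q, r, s are hypothetical internal nodes.
EDGES = [
    ("A", "p", 14), ("D", "p", 8), ("p", "q", 6), ("C", "q", 4),
    ("q", "r", 3), ("B", "r", 4), ("r", "s", 5), ("E", "s", 3), ("F", "s", 2),
]
LEAVES = "ABCDEF"

def adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def path(adj, start, goal):
    """List of (node, next node, length) steps on the path from start to goal."""
    stack = [(start, None, [])]
    while stack:
        node, parent, steps = stack.pop()
        if node == goal:
            return steps
        for nxt, w in adj[node]:
            if nxt != parent:
                stack.append((nxt, node, steps + [(node, nxt, w)]))
    raise ValueError("nodes are not connected")

def midpoint_root(edges, leaves):
    """Return (u, v, distance from u, distance from v) locating the root on edge u-v."""
    adj = adjacency(edges)
    longest = max((path(adj, x, y) for x, y in combinations(leaves, 2)),
                  key=lambda steps: sum(w for _, _, w in steps))
    half = sum(w for _, _, w in longest) / 2
    walked = 0
    for u, v, w in longest:
        if walked + w >= half:
            return u, v, half - walked, w - (half - walked)
        walked += w

root = midpoint_root(EDGES, LEAVES)
print(root)
```

For this tree the root falls on the edge of length 6, splitting it into 1.5 and 4.5 as drawn on the slide.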
Assessing Quality: Bootstrapping
• Given a tree obtained from one of the methods above
• Generate a multiple alignment
• For a number of iterations
  • Generate new sequences by selecting columns from the multiple alignment (possibly the same column more than once)
  • Generate a tree for the new sequences
  • Compare this new tree with the given tree
  • For each cluster in the given tree that also appears in the new tree, increase its bootstrap value
• Bootstrap value = percentage of trees containing the same cluster
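The resampling loop can be sketched as follows. The tree-building method is passed in as a function returning a set of clusters. The function `closest_pair_cluster` is only a toy stand-in (it reports the single closest pair by Hamming distance) so that the sketch runs on its own; in practice it would be replaced by a real tree method such as neighbour joining. All names and the four short sequences are illustrative.

```python
import random

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement; alignment maps name -> sequence."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def bootstrap_support(alignment, build_clusters, iterations=100, seed=0):
    """Percentage of resampled trees that contain each cluster of the given tree."""
    rng = random.Random(seed)
    reference = build_clusters(alignment)
    counts = {cluster: 0 for cluster in reference}
    for _ in range(iterations):
        replicate = build_clusters(resample_columns(alignment, rng))
        for cluster in reference:
            if cluster in replicate:
                counts[cluster] += 1
    return {cluster: 100.0 * count / iterations for cluster, count in counts.items()}

def closest_pair_cluster(alignment):
    """Toy stand-in for a tree method: the closest pair by Hamming distance."""
    names = sorted(alignment)

    def hamming(x, y):
        return sum(a != b for a, b in zip(alignment[x], alignment[y]))

    pairs = [(hamming(x, y), x, y) for i, x in enumerate(names) for y in names[i + 1:]]
    _, x, y = min(pairs)
    return {frozenset((x, y))}

ALIGNMENT = {"a": "GGGGGG", "b": "GGGAGT", "c": "GGATAG", "d": "GATCAT"}
support = bootstrap_support(ALIGNMENT, closest_pair_cluster, iterations=200)
for cluster, value in support.items():
    print(sorted(cluster), value)
```

A fixed `seed` makes the run repeatable; the support value itself depends on the random draws and on the tree method plugged in.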
Parsimony method
• Approach: Generate the "smallest" tree containing all the sequences as leaves
Seq  1  2  3  4  5  6
a    G  G  G  G  G  G
b    G  G  G  A  G  T
c    G  G  A  T  A  G
d    G  A  T  C  A  T
[Figure: tree with leaves a GGGGGG, b GGGAGT, c GGATAG, d GATCAT; edges labelled with substitutions (site, change): 3 G->A, 4 G->T, 5 G->A, 2 G->A, 3 T->A, 4 G->A, 4 T->C, 6 G->T, 6 G->T]
Parsimony
• Generate the smallest tree
• Informative vs. non-informative sites
• Build pairs with the fewest possible substitutions
• Example:
  • 3 possible trees: ((a,b),(c,d)) or ((a,c),(b,d)) or ((a,d),(b,c))
  • Sites 1, 2, 3, 4 are not informative
  • Sites 5, 6 are informative
  • Site 5: ((a,b),(c,d))
  • Site 6: ((a,c),(b,d))
Seq  1  2  3  4  5  6
a    G  G  G  G  G  G
b    G  G  G  A  G  T
c    G  G  A  T  A  G
d    G  A  T  C  A  T
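Informative sites can be found by counting states per column. The sketch assumes the usual criterion that a site is informative when at least two different states each occur in at least two sequences; the function names are illustrative.

```python
from collections import Counter

SEQS = {"a": "GGGGGG", "b": "GGGAGT", "c": "GGATAG", "d": "GATCAT"}

def is_informative(column):
    """True if at least two states each occur at least twice in the column."""
    counts = Counter(column)
    return sum(1 for count in counts.values() if count >= 2) >= 2

def supported_topology(names, column):
    """For four sequences, group the names by state at an informative site."""
    groups = {}
    for name, state in zip(names, column):
        groups.setdefault(state, []).append(name)
    return tuple(sorted(tuple(group) for group in groups.values()))

names = sorted(SEQS)
informative = []
for site in range(len(SEQS["a"])):
    column = [SEQS[name][site] for name in names]
    if is_informative(column):
        informative.append(site + 1)          # sites are numbered from 1
        print(site + 1, supported_topology(names, column))
```

Each informative site favours the pairing that groups sequences sharing the same state, which is how sites 5 and 6 vote for different trees.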
Maximum likelihood
• Assigns quantitative probabilities to mutation events
• Reconstructs ancestors for all nodes in the tree
• Assigns branch lengths based on the probabilities of the mutational events
• For each possible tree topology, the assumed substitution rates are varied to find the parameters that give the highest likelihood of producing the observed data
Summary
• Drawing trees from hierarchical clustering
• Neighbour joining
• Assessing quality with bootstrapping
• (Parsimony and maximum likelihood)