

  1. Phylogenetic trees I Foundations, Distance-based inference Gerhard Jäger Words, Bones, Genes, Tools February 28, 2018 Gerhard Jäger Phylogenetic trees I WBGT 1 / 27

  2. Background: Background readings for this lecture
     - Ewens and Grant (2005), sections 15.1–15.4
     - Nunn (2011), chapter 2

  3. Background: Why trees?
     - tree diagrams have a long history in linguistics and the life sciences:
       - taxonomies (from Aristotle to Linné)
       - tree of life (Darwin)
       - language family trees (Schleicher)
     - commonalities between biological and language family trees:
       - a tree diagram represents a historical hypothesis
       - internal nodes represent a historical reality, not just a taxonomic category
     - technical term for this kind of tree: phylogenetic tree (aka phylogeny)

  4. Definitions: Some definitions
     Definition (Tree): An unrooted tree is a connected undirected acyclic weighted graph with positive weights. In other words, an unrooted tree $T$ is a triple $(V, E, l)$ where
     - $V$ is a finite set, the nodes or vertices,
     - $E \subset V \times V$, the set of edges, is symmetric and irreflexive,
     - $E$ contains no cycle, i.e. there is no closed path through three or more distinct nodes,
     - $E^* = V \times V$, where $E^*$ is the reflexive-transitive closure of $E$ (the graph is connected), and
     - $l : E \to \mathbb{R}^+$ is a function assigning each edge a positive length, with $l(a, b) = l(b, a)$.
     Remark: Unrooted trees might seem to be unintuitive data structures. Later on we will see, though, that estimating the unrooted version of a phylogeny is often quite a different task from estimating the location of the root. So it makes sense to separate the two problems.
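The tree conditions above can be checked mechanically. The following is a minimal Python sketch (the slides themselves use R and Paup), assuming nodes are given as a set and each undirected edge as a `frozenset` of two nodes mapped to its length. The function name, the example node `x`, and the example lengths are hypothetical and not from the lecture.

```python
from collections import deque


def is_unrooted_tree(nodes, edges):
    """Check that (nodes, edges) is a connected, acyclic graph with positive lengths.

    nodes: a set of node names.
    edges: dict mapping frozenset({a, b}) -> edge length.
    """
    if not nodes:
        # Convention chosen for this sketch: the empty graph is not a tree.
        return False
    neighbours = {v: set() for v in nodes}
    for edge, length in edges.items():
        # Each edge must join two distinct known nodes and have positive length.
        if len(edge) != 2 or not edge <= nodes or length <= 0:
            return False
        a, b = edge
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search from an arbitrary node to test connectedness.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbours[v] - seen:
            seen.add(w)
            queue.append(w)
    connected = len(seen) == len(nodes)
    # A connected graph is acyclic exactly when it has |V| - 1 edges.
    return connected and len(edges) == len(nodes) - 1


# Hypothetical star-shaped example with one internal node "x".
nodes = {"Latin", "Ancient Greek", "Dutch", "x"}
edges = {
    frozenset(("Latin", "x")): 1.0,
    frozenset(("Ancient Greek", "x")): 1.0,
    frozenset(("Dutch", "x")): 2.0,
}
print(is_unrooted_tree(nodes, edges))

# Adding one more edge closes a cycle, so the graph is no longer a tree.
with_cycle = dict(edges)
with_cycle[frozenset(("Latin", "Dutch"))] = 1.0
print(is_unrooted_tree(nodes, with_cycle))
```

Counting edges is used here instead of an explicit cycle search because, for a connected graph, having exactly |V| - 1 edges is equivalent to being acyclic.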

  5. Definitions: Some more definitions
     Definition:
     - The degree of node $v$ is the number of edges containing $v$ as a component.
     - Nodes with degree 1 are called tips or leaves.
     - An unrooted binary tree is an unrooted tree with all nodes having degree 3 or 1.
     [Figures: an unrooted tree and an unrooted binary tree, each with the tips Russian, Old Church Slavonic, Latin, Ancient Greek, Dutch, Old Norse]
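Degrees, tips, and the binary condition are simple to compute from an edge list. This is a small Python sketch under the assumption that edges are given as pairs of node names. The internal node names `u`, `v`, `hub` and the two topologies are invented for illustration and are not the trees shown on the slide.

```python
def degrees(nodes, edges):
    """Count, for every node, the number of edges it takes part in."""
    deg = {v: 0 for v in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def tips(nodes, edges):
    """Nodes of degree 1."""
    return {v for v, k in degrees(nodes, edges).items() if k == 1}


def is_unrooted_binary(nodes, edges):
    """True if every node has degree 1 or 3."""
    return all(k in (1, 3) for k in degrees(nodes, edges).values())


# Hypothetical four-taxon tree with two internal nodes of degree 3.
quartet_nodes = {"Dutch", "Old Norse", "Latin", "Ancient Greek", "u", "v"}
quartet_edges = [
    ("Dutch", "u"), ("Old Norse", "u"), ("u", "v"),
    ("Latin", "v"), ("Ancient Greek", "v"),
]

# Hypothetical star tree: one internal node of degree 4, so not binary.
star_nodes = {"Dutch", "Old Norse", "Latin", "Ancient Greek", "hub"}
star_edges = [(t, "hub") for t in ("Dutch", "Old Norse", "Latin", "Ancient Greek")]

print(sorted(tips(quartet_nodes, quartet_edges)))
print(is_unrooted_binary(quartet_nodes, quartet_edges))
print(is_unrooted_binary(star_nodes, star_edges))
```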

  6. Definitions: Even more definitions
     Definition (Rooted trees):
     - A rooted tree is a pair $(T, v)$, where $T$ is an unrooted tree and $v$ is a designated vertex in $T$ (its root).
     - A rooted binary tree is a rooted tree in which exactly one node (the root) has degree 2 and all other nodes have degree 1 or 3.
     [Figures: a rooted non-binary tree and a rooted binary tree, each with the tips Ancient Greek, Latin, Dutch, Old Norse, Old Church Slavonic, Russian]

  7. Definitions: Distances
     Definition (Distances): Let $T = (V, E, l)$ be a tree. Let $d : V \times V \to \mathbb{R}$ be the unique function such that for all $a, b \in V$:
     - if $(a, b) \in E$, then $d(a, b) = l(a, b)$,
     - $d(a, a) = 0$,
     - $d(a, b) = d(b, a)$,
     - if $a \neq b$ and $(a, b) \notin E$, then $d(a, b) = \min_{c \notin \{a, b\}} (d(a, c) + d(c, b))$.
     Vulgo: $d(a, b)$ is the length of the unique path between $a$ and $b$.
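Because the path between two nodes of a tree is unique, the distance function can be computed by walking the tree from each node and summing edge lengths. The sketch below is in Python and assumes one dictionary entry per undirected edge. The function name is hypothetical, and the small tree reuses the branch lengths 1, 1, 0.5 and 1.5 that appear for Dutch, German, English and their ancestors in the lecture's R plot.

```python
def tree_distances(nodes, lengths):
    """Path-length distance d(a, b) for all pairs of nodes in a tree.

    lengths: dict mapping (a, b) -> length, one entry per undirected edge.
    """
    neighbours = {v: {} for v in nodes}
    for (a, b), length in lengths.items():
        neighbours[a][b] = length
        neighbours[b][a] = length
    d = {}
    for source in nodes:
        d[(source, source)] = 0.0
        visited = {source}
        stack = [source]
        # Depth-first walk; each node is reached along the unique path from source.
        while stack:
            v = stack.pop()
            for w, length in neighbours[v].items():
                if w not in visited:
                    visited.add(w)
                    d[(source, w)] = d[(source, v)] + length
                    stack.append(w)
    return d


# Subtree with internal nodes a (above Dutch, German) and b (above a, English).
nodes = ["Dutch", "German", "English", "a", "b"]
lengths = {
    ("a", "Dutch"): 1.0,
    ("a", "German"): 1.0,
    ("b", "a"): 0.5,
    ("b", "English"): 1.5,
}
d = tree_distances(nodes, lengths)
print(d[("Dutch", "German")], d[("Dutch", "English")])
```

The two printed values agree with the entries 2.0 and 3.0 of the example distance matrix used later in the lecture.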

  8. Ultrametric trees: Ultrametric trees
     Definition (Ultrametric tree): A rooted tree is ultrametric iff all tips have the same distance from the root.
     Definition (Ultrametric distance): $d$ is an ultrametric distance if it is a metric ($d(a, a) = 0$, $d(a, b) = d(b, a) \geq 0$, $d(a, b) + d(b, c) \geq d(a, c)$) with $d(a, b) \leq \max\{d(a, c), d(b, c)\}$ for all $a, b, c$.
     [Figure: ultrametric tree with the tips Irish, Welsh, Breton, Bengali, Nepali, Hindi, Romanian, Italian, French, Spanish, Portuguese, Catalan, German, Dutch, English, Danish, Swedish, Icelandic, Lithuanian, Czech, Polish, Ukrainian, Russian, Bulgarian, Greek]
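The tree-based definition can be tested directly by collecting root-to-tip distances. A minimal Python sketch follows, assuming the rooted tree is stored as a mapping from each parent to its children with branch lengths. The function names, the internal node `celtic`, and all branch lengths are hypothetical, since the slide's figure gives none.

```python
import math


def root_to_tip_distances(children, root):
    """Distance from the root to every tip.

    children: dict mapping parent -> list of (child, branch length).
    Nodes without an entry (or with an empty list) are tips.
    """
    result = {}
    stack = [(root, 0.0)]
    while stack:
        node, dist = stack.pop()
        kids = children.get(node, [])
        if not kids:
            result[node] = dist
        for child, length in kids:
            stack.append((child, dist + length))
    return result


def is_ultrametric_tree(children, root):
    """True if all tips are (numerically) equally far from the root."""
    dists = list(root_to_tip_distances(children, root).values())
    return all(math.isclose(x, dists[0]) for x in dists)


# Hypothetical branch lengths: every tip ends up at distance 3.0 from the root.
balanced = {
    "root": [("celtic", 1.0), ("Greek", 3.0)],
    "celtic": [("Irish", 2.0), ("Welsh", 2.0)],
}
# Lengthening one branch breaks the ultrametric property.
unbalanced = {
    "root": [("celtic", 1.0), ("Greek", 3.0)],
    "celtic": [("Irish", 2.0), ("Welsh", 2.5)],
}
print(is_ultrametric_tree(balanced, "root"))
print(is_ultrametric_tree(unbalanced, "root"))
```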

  9. Ultrametric trees: Ultrametric trees
     Theorem: The pairwise distances between a set of taxa are ultrametric if and only if there is an ultrametric tree with the taxa as tips representing those distances.
     Proof: By induction over the number of taxa.
     The Unweighted Pair Group Method Using Arithmetic Averages (UPGMA) algorithm constructs an ultrametric tree from pairwise distances.

  10. Ultrametric trees: UPGMA
      Cluster distances: Let $A$ and $B$ be two non-empty sets of taxa. Then
      $$d(A, B) \doteq \frac{1}{|A| \times |B|} \sum_{x \in A,\, y \in B} d(x, y)$$
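The cluster distance is just the mean of all between-cluster pairwise distances. The Python sketch below assumes that the two clusters are disjoint and that pairwise distances are stored under unordered pairs. The function name and the storage layout are choices made for this illustration, and the numbers are the five-language distances from the lecture's example.

```python
from itertools import product


def cluster_distance(A, B, d):
    """Average pairwise distance between disjoint clusters A and B.

    d: dict mapping frozenset({x, y}) -> distance between taxa x and y.
    """
    total = sum(d[frozenset((x, y))] for x, y in product(A, B))
    return total / (len(A) * len(B))


# Pairwise distances from the lecture's five-language example.
pairs = {
    ("English", "Dutch"): 3.0, ("English", "German"): 3.0,
    ("Dutch", "German"): 2.0,
    ("English", "Italian"): 8.0, ("Dutch", "Italian"): 8.0,
    ("German", "Italian"): 8.0,
    ("English", "Spanish"): 8.0, ("Dutch", "Spanish"): 8.0,
    ("German", "Spanish"): 8.0,
    ("Italian", "Spanish"): 3.4,
}
d = {frozenset(pair): value for pair, value in pairs.items()}

print(cluster_distance({"Dutch", "German"}, {"English"}, d))
print(cluster_distance({"English", "Dutch", "German"}, {"Italian", "Spanish"}, d))
```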

  11. Ultrametric trees: UPGMA
      UPGMA algorithm
      Initialization:
          X ← the set of taxa
          V ← X
          E ← ∅
          h(x) = 0 ∀ x ∈ X
      Iteration:
          while |X| > 1
              {i, j} ← arg min d(x, y) over x ∈ X, y ∈ X, x ≠ y
              X ← X \ {i, j} ∪ {{i, j}}
              V ← V ∪ {{i, j}}
              E ← E ∪ {({i, j}, i), ({i, j}, j)}
              h({i, j}) = d(i, j) / 2
              l({i, j}, i) = h({i, j}) − h(i)
              l({i, j}, j) = h({i, j}) − h(j)
              d({i, j}, k) = (n(i)·d(i, k) + n(j)·d(j, k)) / (n(i) + n(j)) for all k ∈ X \ {{i, j}}
      Here n(x) is the number of taxa below node x. Weighting by n keeps d({i, j}, k) equal to the average pairwise distance between the two clusters, as in the definition of cluster distances.
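The pseudocode above translates fairly directly into code. What follows is a minimal Python sketch rather than a reference implementation: clusters are represented as `frozenset`s of taxon names (an assumption of this sketch), distances between merged clusters use the size-weighted average, and ties are broken by list order. The function name and return format are hypothetical, and the input is the lecture's five-language example.

```python
from itertools import combinations


def upgma(taxa, dist):
    """Sketch of UPGMA.

    taxa: list of taxon names.
    dist: dict mapping frozenset({x, y}) -> distance between taxa x and y.
    Returns (root cluster, node heights, branch lengths keyed by (parent, child)).
    """
    active = [frozenset([t]) for t in taxa]          # X: current clusters
    height = {c: 0.0 for c in active}                # h
    d = {frozenset((a, b)): dist[a | b]              # a | b = {taxon_a, taxon_b}
         for a, b in combinations(active, 2)}
    edges = {}                                       # l
    while len(active) > 1:
        # Pick the closest pair of clusters.
        i, j = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        merged = i | j
        height[merged] = d[frozenset((i, j))] / 2
        edges[(merged, i)] = height[merged] - height[i]
        edges[(merged, j)] = height[merged] - height[j]
        active = [c for c in active if c not in (i, j)]
        # Size-weighted average keeps d equal to the mean pairwise distance.
        for k in active:
            d[frozenset((merged, k))] = (
                len(i) * d[frozenset((i, k))] + len(j) * d[frozenset((j, k))]
            ) / len(merged)
        active.append(merged)
    return active[0], height, edges


taxa = ["English", "Dutch", "German", "Italian", "Spanish"]
pairs = {
    ("English", "Dutch"): 3.0, ("English", "German"): 3.0,
    ("Dutch", "German"): 2.0,
    ("English", "Italian"): 8.0, ("Dutch", "Italian"): 8.0,
    ("German", "Italian"): 8.0,
    ("English", "Spanish"): 8.0, ("Dutch", "Spanish"): 8.0,
    ("German", "Spanish"): 8.0,
    ("Italian", "Spanish"): 3.4,
}
dist = {frozenset(pair): value for pair, value in pairs.items()}

root, height, edges = upgma(taxa, dist)
for cluster, h in sorted(height.items(), key=lambda item: item[1]):
    if len(cluster) > 1:
        print(sorted(cluster), h)
```

On this input the internal nodes should come out at the heights used in the lecture's worked example (1, 1.5, 1.7 and 4), up to floating-point rounding.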

  12. Ultrametric trees: Example
                    English  Dutch  German  Italian
          Dutch         3.0
          German        3.0    2.0
          Italian       8.0    8.0     8.0
          Spanish       8.0    8.0     8.0      3.4
      [Dendrogram under construction: height axis h starting at 0; tips Spanish, Italian, German, Dutch, English]

  13. Ultrametric trees: Example
      Same distance matrix as on the previous slide. The smallest entry is d(Dutch, German) = 2.0, so Dutch and German are merged first.

  14. Ultrametric trees: Example
      New node a = {Dutch, German} with height h(a) = 1.
                    English      a  Italian
          a             3.0
          Italian       8.0    8.0
          Spanish       8.0    8.0      3.4
      [Dendrogram under construction: a at height 1; tips Spanish, Italian, German, Dutch, English]

  15. Ultrametric trees: Example
      Same distance matrix as on the previous slide. The smallest entry is d(English, a) = 3.0, so English and a are merged next.

  16. Ultrametric trees: Example
      New node b = {English, a} with height h(b) = 1.5.
                          b  Italian
          Italian       8.0
          Spanish       8.0      3.4
      [Dendrogram under construction: a at height 1, b at height 1.5; tips Spanish, Italian, German, Dutch, English]

  17. Ultrametric trees: Example
      Same distance matrix as on the previous slide. The smallest entry is d(Italian, Spanish) = 3.4, so Italian and Spanish are merged next.

  18. Ultrametric trees: Example
      New node c = {Italian, Spanish} with height h(c) = 1.7.
                          b
          c             8.0
      [Dendrogram under construction: a at height 1, b at height 1.5, c at height 1.7; tips Spanish, Italian, German, Dutch, English]

  19. Ultrametric trees: Example
      Same distance matrix as on the previous slide. The only remaining entry is d(b, c) = 8.0, so b and c are merged last.

  20. Ultrametric trees: Example
      New node d = {b, c} with height h(d) = 4; d is the root.
      [Finished dendrogram: a at height 1, b at height 1.5, c at height 1.7, d at height 4; tips Spanish, Italian, German, Dutch, English]

  21. Ultrametric trees: Doing it in Paup
      - start the command-line version of Paup (prompt: paup>)
      - load the distance matrix example1.nex into Paup:
            paup> execute example1.nex
      - compute UPGMA-tree:
            paup> upgma
      - save tree in Nexus format (no other choice available):
            paup> upgma treefile=example1.paup.upgma.tre \
                  replace=yes brlens=yes
      - load tree again:
            paup> gettrees file=example1.paup.upgma.tre
      - save tree in Newick format:
            paup> savetrees format=newick brlen=user file=\
                  example1.paup.upgma.tre replace=yes
      - quit Paup:
            paup> q
      - view tree with Dendroscope and Figtree

  22. Ultrametric trees: Doing it in R
      load library
          library(phangorn)

  23. Ultrametric trees: Doing it in R
      define distance matrix
          taxa <- c('English', 'Dutch', 'German', 'Italian', 'Spanish')
          # symmetric 5 x 5 matrix, filled row by row
          d <- as.dist(matrix(c(0.0, 3.0, 3.0, 8.0, 8.0,
                                3.0, 0.0, 2.0, 8.0, 8.0,
                                3.0, 2.0, 0.0, 8.0, 8.0,
                                8.0, 8.0, 8.0, 0.0, 3.4,
                                8.0, 8.0, 8.0, 3.4, 0.0),
                              byrow = TRUE, nrow = 5,
                              dimnames = list(taxa, taxa)))

  24. Ultrametric trees: Doing it in R
          print(d)
          ##         English Dutch German Italian
          ## Dutch       3.0
          ## German      3.0   2.0
          ## Italian     8.0   8.0    8.0
          ## Spanish     8.0   8.0    8.0     3.4

  25. Ultrametric trees: Doing it in R
      perform UPGMA
          upgma.tree <- upgma(d)
          # tree distances minus input distances; indexing by taxa aligns rows and columns
          cophenetic(upgma.tree)[taxa, taxa] - as.matrix(d)
          ##         English Dutch German Italian Spanish
          ## English       0     0      0       0       0
          ## Dutch         0     0      0       0       0
          ## German        0     0      0       0       0
          ## Italian       0     0      0       0       0
          ## Spanish       0     0      0       0       0

  26. Ultrametric trees: Doing it in R
      visualize result
          plot(upgma.tree, type = 'cladogram')
          edgelabels(upgma.tree$edge.length)
          write.tree(upgma.tree, 'upgmaExample.tre')
      [Figure: cladogram of the UPGMA tree with edge lengths: Spanish 1.7, Italian 1.7, their ancestor 2.3; German 1, Dutch 1, their ancestor 0.5; English 1.5, the ancestor of English, Dutch and German 2.5]

  27. Ultrametric trees: If distances are not ultra-metric
      - the UPGMA algorithm also works with distances which are not ultra-metric
      - in this case it will not recover the correct distances
      - the tree topology may or may not be recovered
      [Figure: non-ultrametric tree with edge lengths (0.5 to 2.3) and the tips Gothic, English, Dutch, German, Italian, Spanish; the Gothic tip sits on a branch of length 0.5]

  28. Ultrametric trees: If distances are not ultra-metric
          taxa <- c('German', 'Dutch', 'English',
                    'Spanish', 'Italian', 'Gothic')
          # symmetric 6 x 6 matrix, filled row by row
          d <- as.dist(matrix(c(0, 2, 3, 8,   8,   3,
                                2, 0, 3, 8,   8,   3,
                                3, 3, 0, 8,   8,   3,
                                8, 8, 8, 0,   3.4, 6,
                                8, 8, 8, 3.4, 0,   6,
                                3, 3, 3, 6,   6,   0),
                              byrow = TRUE, nrow = 6,
                              dimnames = list(taxa, taxa)))
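To see why this six-taxon matrix is not ultrametric, one can search for triples that violate the condition d(a, b) ≤ max{d(a, c), d(b, c)}. The sketch below does this in Python (rather than the R used on the slides) with the same numbers. The function name is hypothetical.

```python
from itertools import permutations

taxa = ["German", "Dutch", "English", "Spanish", "Italian", "Gothic"]
rows = [
    [0, 2, 3, 8, 8, 3],
    [2, 0, 3, 8, 8, 3],
    [3, 3, 0, 8, 8, 3],
    [8, 8, 8, 0, 3.4, 6],
    [8, 8, 8, 3.4, 0, 6],
    [3, 3, 3, 6, 6, 0],
]
d = {(a, b): rows[i][j]
     for i, a in enumerate(taxa)
     for j, b in enumerate(taxa)}


def ultrametric_violations(taxa, d):
    """Ordered triples (a, b, c) with d(a, b) > max(d(a, c), d(b, c))."""
    return [(a, b, c)
            for a, b, c in permutations(taxa, 3)
            if d[(a, b)] > max(d[(a, c)], d[(b, c)])]


violations = ultrametric_violations(taxa, d)
for a, b, c in violations:
    print(f"d({a}, {b}) = {d[(a, b)]} > max({d[(a, c)]}, {d[(b, c)]}) via {c}")

# Without Gothic, the remaining five taxa satisfy the condition.
without_gothic = [t for t in taxa if t != "Gothic"]
print(ultrametric_violations(without_gothic, d))
```

Every violation found this way involves Gothic as the third taxon: it is at distance 3 from the Germanic languages and 6 from the Romance ones, while those two groups are 8 apart.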
