
CSCE 471/871 Lecture 5: Building Phylogenetic Trees - PDF document



  1. CSCE 471/871 Lecture 5: Building Phylogenetic Trees
     Stephen Scott, sscott@cse.unl.edu

     Outline
     - Phylogenetic trees
     - Building trees from pairwise distances
     - Parsimony
     - Simultaneous sequence alignment and phylogeny

     Phylogenetic Trees
     - Assumption: all organisms on Earth have a common ancestor ⇒ all species are related in some way
     - Relationships are represented by phylogenetic trees
     - Trees can represent relationships between orthologs or paralogs
       - Orthologs: genes in different species that evolved from a common ancestral gene by speciation (evolution of one species out of another). Normally, orthologs retain the same function in the course of evolution
       - Paralogs: genes related by duplication within a genome. In contrast to orthologs, paralogs evolve new functions

     Phylogenetic Trees (2)
     - We'll use binary trees, both rooted and unrooted
     - Rooted for when we know the direction of evolution (i.e., the common ancestor)
     - Can sometimes find the root by adding a distantly related organism/sequence to an existing tree (Fig 7.1)

     Phylogenetic Trees (3)
     - A weighted tree, where each weight (edge length) is an estimate of evolutionary time between events
     - Based on a distance measure (e.g., substitution scoring matrices) between sequences
     - Gives a reasonably accurate approximation of relative evolutionary times, despite the fact that sequences can evolve at different rates
     - Number of possible binary trees on $n$ nodes grows exponentially in $n$
       - E.g., $n = 20$ has about $2.2 \times 10^{20}$ trees
     - We'll use heuristics, of course

     Building Trees from Pairwise Distances: UPGMA
     - Start with some distance measure between sequences, e.g., Jukes-Cantor:
       $d_{ij} = -0.75 \log(1 - 4 f_{ij} / 3)$,
       where $f_{ij}$ is the fraction of residues that differ between sequences $x_i$ and $x_j$ when pairwise aligned
     - UPGMA (unweighted pair group method average) algorithm
       - One of a family of hierarchical clustering algorithms
       - Basic idea of the algorithmic family: find the minimum inter-cluster distance $d_{ij}$ in the current distance matrix, merge clusters $i$ and $j$, then update the distance matrix
       - Differences among the algorithms lie in the matrix update
       - For phylogenetic trees, also add edge lengths
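The Jukes-Cantor distance above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names `fraction_differing` and `jukes_cantor_distance` are hypothetical, skipping gapped columns is an assumption, and returning infinity once $f_{ij}$ reaches 0.75 (where the logarithm is undefined) is likewise a choice made here for convenience.

```python
import math


def fraction_differing(x, y):
    """Fraction f_ij of aligned positions at which x and y differ.

    Assumption (not from the slides): columns where either sequence has a
    gap character '-' are skipped.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in columns) / len(columns)


def jukes_cantor_distance(f):
    """d_ij = -0.75 * log(1 - 4*f_ij/3).

    Assumption: return infinity once f_ij reaches 0.75, where the
    logarithm's argument is no longer positive.
    """
    if f >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * f / 3.0)


# Hypothetical pair of aligned sequences; the last 2 of 10 positions differ.
x_i = "ACGTACGTAC"
x_j = "ACGTACGTTT"
f_ij = fraction_differing(x_i, x_j)
d_ij = jukes_cantor_distance(f_ij)
print(f"f_ij = {f_ij:.2f}, d_ij = {d_ij:.4f}")
```

Applying `jukes_cantor_distance` to every pair of sequences would give the distance matrix that the clustering algorithms start from.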

  2. Building Trees from Pairwise Distances: UPGMA (2)
     1. $\forall i$, assign sequence $x_i$ to cluster $C_i$ and give it its own leaf, with height 0
     2. While there are more than two clusters:
        1. Find the minimum $d_{ij}$ in the distance matrix
        2. Add to the clustering cluster $C_k = C_i \cup C_j$ and delete $C_i$ and $C_j$
        3. For each cluster $C_\ell \notin \{C_k, C_i, C_j\}$, set
           $d_{k\ell} = \frac{1}{|C_k|\,|C_\ell|} \sum_{p \in C_k,\, q \in C_\ell} d_{pq}$
           [Shortcut: Eq. (7.2)]
        4. Add to the tree node $k$ with children $i$ and $j$, with height $d_{ij}/2$
     3. When only $C_i$ and $C_j$ remain, place the root at height $d_{ij}/2$
     - Example: Fig 7.4

     Building Trees from Pairwise Distances: UPGMA (3)
     - If the rate of evolution is the same at all points in the original (target) phylogenetic tree, then UPGMA will recover the correct tree
       - This occurs iff the lengths of all paths from root to leaves are equal in terms of evolutionary time
     - If this is not the case, then UPGMA may find an incorrect topology (Fig. 7.5, p. 170)
     - Can avoid this if the distances satisfy the ultrametric condition: for any three sequences $x_i$, $x_j$, $x_k$, the distances $d_{ij}$, $d_{jk}$, $d_{ik}$ are either all equal, or two are equal and one is smaller

     Building Trees from Pairwise Distances: Neighbor Joining
     - If the ultrametric property doesn't hold, can still recover the original tree if additivity holds
       - Additivity: in the original tree, the distance between any pair of leaves = sum of the lengths of the edges on the path connecting them
     - If additivity holds, neighbor joining finds the original tree
     - First, find a pair of neighboring leaves $i$ and $j$, assign them parent $k$, then replace $i$ and $j$ with $k$, where for all other leaves $m$,
       $d_{km} = (d_{im} + d_{jm} - d_{ij}) / 2$
     - But it does NOT work to simply choose the pair $(i, j)$ with minimum $d_{ij}$ (Fig. 7.7)
     - Instead, choose $(i, j)$ minimizing $D_{ij} = d_{ij} - (r_i + r_j)$, where $L$ is the current set of "leaves" and
       $r_i = \frac{1}{|L| - 2} \sum_{k \in L} d_{ik}$

     Building Trees from Pairwise Distances: Neighbor Joining (2)
     1. Initialize $L = T =$ set of leaves
     2. While $|L| > 2$:
        1. Choose $i$ and $j$ minimizing $D_{ij}$
        2. Define new node $k$ and set $d_{km} = (d_{im} + d_{jm} - d_{ij}) / 2$ for all $m \in L$
        3. Add $k$ to $T$ with edges of lengths $d_{ik} = (d_{ij} + r_i - r_j) / 2$ and $d_{jk} = d_{ij} - d_{ik}$
        4. Update $L = \{k\} \cup L \setminus \{i, j\}$
     3. Add the final, length-$d_{ij}$ edge between the final nodes $i$ and $j$

     Parsimony
     - Widely used approach for tree building
     - Scores a tree based on the cost of substitutions going from a node to its child
       ⇒ Will assign hypothetical ancestral sequences to internal nodes, e.g., Figure 7.9
     - Generally consists of two components:
       1. Computing the cost of tree $T$ over $n$ aligned sequences
       2. Searching through the space of possible trees for a min-cost one
     - Treat each site independently of the others, so for a length-$m$ alignment, run the scoring algorithm on each of the $m$ sites separately
     - Let $S(a, b)$ be the cost of substituting $b$ for $a$
     - When scoring site (tree) $u \in \{1, \ldots, m\}$, let $S_k(a)$ be the minimal cost for the assignment of symbol (residue) $a$ to node $k$

     Parsimony (2)
     1. Initialize $k = 2n - 1$ (index of the root node)
     2. Recursively compute $S_k(a)$ for all $a$ in the alphabet:
        1. If $k$ is a leaf, set $S_k(a) = 0$ for $a = x^k_u$ and $S_k(a) = \infty$ otherwise
           ⇒ $a$ must match the $u$th symbol in the sequence
        2. Else $S_k(a) = \min_b (S_i(b) + S(a, b)) + \min_b (S_j(b) + S(a, b))$, where $i$ and $j$ are $k$'s children
     3. Return $\min_a \{S_{2n-1}(a)\}$ as the minimum cost of the tree
     - Can recover the ancestral residues by tracking where the min comes from in the recursive step
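The parsimony scoring recursion above can be sketched as follows. This is an illustrative reading of the steps, not the lecturer's code: the names `site_cost`, `parsimony_cost`, `children`, and `unit_cost` are hypothetical, the tree is assumed to be given as a dictionary from each internal node to its two children, and the 0/1 substitution cost is just one possible choice of $S(a, b)$.

```python
import math


def site_cost(children, root, leaf_symbol, alphabet, sub_cost):
    """Minimum substitution cost of one site on a fixed rooted binary tree.

    children:    dict mapping each internal node to its pair of children
                 (leaves do not appear as keys) -- an assumed representation
    leaf_symbol: dict mapping each leaf to its residue at this site
    sub_cost:    function S(a, b), the cost of substituting b for a
    """
    def table(k):
        # Returns {a: S_k(a)} for every symbol a in the alphabet.
        if k not in children:
            # Leaf: cost 0 if a matches the observed residue, infinity otherwise.
            return {a: 0.0 if a == leaf_symbol[k] else math.inf for a in alphabet}
        i, j = children[k]
        s_i, s_j = table(i), table(j)
        return {
            a: min(s_i[b] + sub_cost(a, b) for b in alphabet)
            + min(s_j[b] + sub_cost(a, b) for b in alphabet)
            for a in alphabet
        }

    return min(table(root).values())


def parsimony_cost(children, root, sequences, alphabet, sub_cost):
    """Sum the per-site costs, treating each alignment column independently."""
    m = len(next(iter(sequences.values())))
    return sum(
        site_cost(children, root, {k: s[u] for k, s in sequences.items()},
                  alphabet, sub_cost)
        for u in range(m)
    )


def unit_cost(a, b):
    """Assumed substitution cost: 0 for a match, 1 for any mismatch."""
    return 0.0 if a == b else 1.0


# Hypothetical example with n = 4 leaves (1..4); node 7 = 2n - 1 is the root.
children = {5: (1, 2), 6: (3, 4), 7: (5, 6)}
sequences = {1: "AA", 2: "AC", 3: "CA", 4: "CC"}
total = parsimony_cost(children, 7, sequences, "ACGT", unit_cost)
print(total)
```

In this toy alignment the first column (A, A, C, C) needs one substitution on the given tree and the second column (A, C, A, C) needs two. Recovering the ancestral residues would additionally require recording which $b$ attains each minimum, which this sketch omits.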
