2/25/09
CSCI1950-Z Computational Methods for Biology
Lecture 9
Ben Raphael
February 23, 2009
http://cs.brown.edu/courses/csci1950-z/

Outline
Searching Through Trees
1. Branch-swapping: NNI, SPR, TBR.
2. MCMC
Consensus Trees and Supertrees
Heuristic Search
1. Start with an arbitrary tree T.
2. Check the “neighbors” of T.
3. Move to a neighbor if it provides the best improvement in parsimony/likelihood score.
Caveat: The search can get stuck in a local optimum and never reach the global optimum.

Trees and Splits
Given a set X, a split is a partition of X into two non-empty subsets A and B, written X = A | B.
For a phylogenetic tree T with leaves L, each edge e defines a split L_e = A | B, where A and B are the leaf sets of the two subtrees obtained by removing e.
[Figure: a tree in which edge e separates leaf sets A and B]
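As a minimal sketch of how each edge defines a split, the Python below collects the splits of a small tree. The nested-tuple encoding of the tree, the function names `leaf_sets` and `splits`, and the five-leaf example are assumptions made for illustration; they are not part of the lecture.

```python
def leaf_sets(tree, out):
    """Return the leaf set of `tree` and append every subtree's leaf set to `out`."""
    if isinstance(tree, str):          # a leaf is a plain string label
        leaves = frozenset([tree])
    else:                              # an internal node is a tuple of subtrees
        leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the set of splits A | B, one per edge of the unrooted tree."""
    below = []
    all_leaves = leaf_sets(tree, below)
    # Removing the edge above a subtree separates its leaves from all the others.
    # Storing A | B as an unordered pair merges the two root edges into one split.
    return {frozenset([a, all_leaves - a]) for a in below if a != all_leaves}


# Hypothetical example tree on leaves a..e, written as nested tuples.
tree = (("a", "b"), ("c", ("d", "e")))
for s in splits(tree):
    a, b = sorted(s, key=sorted)
    print("".join(sorted(a)), "|", "".join(sorted(b)))
```

For this tree the set has 7 splits, one per edge: five trivial splits that cut off a single leaf, plus ab | cde and abc | de.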
Computing the Splits Metric
A phylogenetic tree T defines a collection of splits Σ(T) = { L_e | e is an edge in T }.
Theorem: ρ(T_1, T_2) = |Σ(T_1) \ Σ(T_2)| + |Σ(T_2) \ Σ(T_1)| = |Σ(T_1)| + |Σ(T_2)| - 2 |Σ(T_1) ∩ Σ(T_2)|
Proof: (whiteboard)
Notation: A \ B = {x : x ∈ A, x ∉ B}

Nearest Neighbor Interchange (NNI)
Rearrange the four subtrees defined by one internal edge.
Claim: The number of NNI neighbors of a binary tree with n leaves is 2(n-3).
Proof: (whiteboard)
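The two forms of the splits metric in the theorem above can be checked on a small case. This is only a sketch: the helper `split`, the function `splits_metric`, and the two five-leaf trees are hypothetical choices for illustration.

```python
def split(a, b):
    """Build the split A | B from two strings of single-letter leaf names."""
    return frozenset([frozenset(a), frozenset(b)])


def splits_metric(sigma1, sigma2):
    """rho(T1, T2): number of splits found in exactly one of the two trees."""
    return len(sigma1 - sigma2) + len(sigma2 - sigma1)


# Hypothetical trees on leaves a..e. Trivial (single-leaf) splits are shared by
# both trees, so they cancel in the metric and are omitted here.
sigma1 = {split("ab", "cde"), split("abc", "de")}
sigma2 = {split("ac", "bde"), split("abc", "de")}

rho = splits_metric(sigma1, sigma2)
# Second form of the theorem: |S1| + |S2| - 2 |S1 ∩ S2|
rho_alt = len(sigma1) + len(sigma2) - 2 * len(sigma1 & sigma2)
print(rho, rho_alt)   # prints: 2 2
```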
Subtree Pruning and Regrafting (SPR)
1. Remove a branch.
2. Reconnect the incident vertex by subdividing a branch.
Claim: The number of SPR neighbors of a binary tree with n leaves is 2(n-3)(2n-7).
Proof: (whiteboard)
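To get a feel for how the two neighborhood sizes grow, the claimed formulas for NNI, 2(n-3), and SPR, 2(n-3)(2n-7), can be tabulated directly. The function names and the chosen values of n are illustrative assumptions.

```python
def nni_count(n):
    """Number of NNI neighbors of a binary tree with n leaves: 2(n-3)."""
    return 2 * (n - 3)


def spr_count(n):
    """Number of SPR neighbors of a binary tree with n leaves: 2(n-3)(2n-7)."""
    return 2 * (n - 3) * (2 * n - 7)


for n in (5, 10, 20, 50):
    print(n, nni_count(n), spr_count(n))
```

The NNI neighborhood grows linearly in n and the SPR neighborhood quadratically, so an SPR search examines many more trees per step.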
Tree Bisection and Reconnection (TBR)
1. Remove a branch.
2. Reconnect the two subtrees by adding a new branch that subdivides a branch in each.

Relationship between Operations
• Every NNI is an SPR, and every SPR is a TBR.
• Every TBR is a single SPR or a composition of two SPRs.
• All three types of operations are invertible: if α takes T to T', then α^-1 takes T' to T.
Theorem: For all T and T' in B(n), there is a sequence of NNI (or SPR or TBR) operations that transforms T into T'.
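The connectivity theorem above can be checked by brute force for small n. The sketch below assumes one particular encoding that is not from the lecture: a tree is stored as the set of its non-trivial clades, that is, the leaf set on the side of each internal edge that does not contain a fixed reference leaf `ref`. An NNI across an internal edge then replaces one clade by the union of one of its children and its sibling. The names `nni_neighbors` and `reachable` are hypothetical.

```python
from collections import deque


def nni_neighbors(tree, leaves, ref):
    """Return all trees one NNI away from `tree`.

    A tree is a frozenset of its non-trivial clades: the leaf sets (frozensets)
    of the internal edges, each taken on the side away from leaf `ref`.
    """
    others = frozenset(leaves) - {ref}
    # Add the trivial clades so every clade has children and a parent.
    all_clades = set(tree) | {frozenset([x]) for x in others} | {others}
    result = set()
    for s in tree:                                   # one internal edge
        inside = [c for c in all_clades if c < s]
        children = [c for c in inside if not any(c < d for d in inside)]
        parent = min((c for c in all_clades if s < c), key=len)
        sibling = parent - s
        for child in children:                       # swap the other child with the sibling
            result.add((tree - {s}) | {child | sibling})
    return result


def reachable(start, leaves, ref):
    """Breadth-first search over tree space using NNI moves."""
    seen = {start}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        for u in nni_neighbors(t, leaves, ref):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


leaves = "abcde"
start = frozenset([frozenset("de"), frozenset("cde")])   # the tree ((a,b),(c,(d,e)))
print(len(nni_neighbors(start, leaves, "a")))   # prints 4, which is 2(n-3) for n = 5
print(len(reachable(start, leaves, "a")))       # prints 15, every binary tree on 5 leaves
```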
Relationship between Operations
[Figure: diagram relating NNI, SPR, and TBR]

Heuristic Search
PAUP* (a widely used phylogenetic package) includes the command:
hsearch nreps = num swap = type
where type = NNI, SPR, or TBR.
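The heuristic search procedure, and its local-optimum caveat, can be sketched as a generic hill climb. This is not how PAUP* is implemented; the function `hill_climb` and the toy one-dimensional "tree space" are assumptions for illustration. In a real search the states would be trees, the neighbors would come from NNI, SPR, or TBR, and the score would be a likelihood (or a negated parsimony score, since lower parsimony is better).

```python
def hill_climb(start, neighbors, score):
    """Greedy local search: move to the best-scoring neighbor until none improves."""
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current            # no neighbor improves: a local optimum
        current = best


# Toy stand-in for tree space (an assumption for illustration): states are
# positions 0..7 on a line, neighbors are adjacent positions, and the score
# plays the role of a likelihood (higher is better).
scores = [0, 1, 2, 1, 0, 3, 5, 3]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]


def score(i):
    return scores[i]


print(hill_climb(0, neighbors, score))   # prints 2: stuck at a local optimum
print(hill_climb(4, neighbors, score))   # prints 6: reaches the global optimum
```

Restarting the climb from several different starting points is one common way to reduce the risk of ending in a poor local optimum.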
From Likelihood to Bayesian
Given data X = (x_1, …, x_n), we found the tree T and branch lengths t* that maximize the likelihood Pr[X | T, t*].
What about other trees? Could we compute Pr[T, t* | X]?

Back to Coin Flipping
Flip a coin with p = Pr[heads] unknown.
Earlier we computed the maximum likelihood estimate of p, using L(p) = Pr[D | p].
Pr[p | D] = Pr[p, D] / Pr[D] = Pr[D | p] Pr[p] / Pr[D]
[Figure: prior and posterior for p; panels: 11 tosses with 5 heads, 44 tosses with 20 heads]
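The coin-flipping posterior can be approximated numerically on a grid of values of p. The sketch below assumes a uniform prior, which the slide does not specify, and uses the two data sets named on the slide (5 heads in 11 tosses, 20 heads in 44 tosses). The function names are hypothetical.

```python
def posterior_on_grid(heads, tosses, grid_size=1001):
    """Return (grid, posterior) for p under a uniform prior, normalized on the grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size                       # assumed uniform prior Pr[p]
    # The binomial coefficient is constant in p, so it cancels when normalizing.
    likelihood = [p**heads * (1 - p)**(tosses - heads) for p in grid]
    unnormalized = [l * q for l, q in zip(likelihood, prior)]
    total = sum(unnormalized)                       # plays the role of Pr[D]
    return grid, [u / total for u in unnormalized]


def mean_and_sd(grid, weights):
    """Mean and standard deviation of a discrete distribution on the grid."""
    mean = sum(p * w for p, w in zip(grid, weights))
    var = sum((p - mean) ** 2 * w for p, w in zip(grid, weights))
    return mean, var ** 0.5


for heads, tosses in [(5, 11), (20, 44)]:
    mean, sd = mean_and_sd(*posterior_on_grid(heads, tosses))
    print(f"{heads}/{tosses}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

Both data sets have the same maximum likelihood estimate, 5/11 = 20/44, but the posterior for 44 tosses is narrower. Here the denominator Pr[D] is easy to compute because p is a single number.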
Bayesian Methods
Bayes' Theorem:
Pr[T, t* | X] = Pr[X, T, t*] / Pr[X]
= Pr[X | T, t*] Pr[T, t*] / Pr[X]
= Pr[X | T, t*] Pr[T, t*] / (Σ_{T', t'} Pr[X | T', t'] Pr[T', t'])
Here Pr[T, t* | X] is the posterior and Pr[T, t*] is the prior.
Problem: Cannot compute the denominator.
Solution: Use the power of Markov chains to draw (“sample”) trees according to the distribution Pr[T, t* | X].
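One reason the denominator is out of reach is that it sums over every tree topology, in addition to all branch lengths. The slides do not give the count; the sketch below uses the standard formula (2n-5)!! for the number of unrooted binary trees on n labeled leaves, and the function name is a hypothetical one.

```python
def num_unrooted_binary_trees(n):
    """(2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5), for n >= 3 labeled leaves."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (5, 10, 20, 50):
    print(n, num_unrooted_binary_trees(n))
```

Already at n = 10 there are more than two million topologies, so enumerating every term of the sum is impractical for realistic data sets.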
Markov Chain Monte Carlo
To sample from a distribution π:
Define a Markov chain with equilibrium distribution π.
Simulate the chain through many transitions.
After many transitions (e.g. ~10000), the chain will be approximately at equilibrium π (“burn-in”).
Output every n-th state (n ~ 50).

Jukes-Cantor model of DNA
[Figure: Markov chain on the four states A, C, G, T]
Equilibrium distribution: q_A = q_C = q_G = q_T = 1/4

MCMC on Trees
1. Define a Markov chain:
• States are trees T.
• Equilibrium distribution is the posterior Pr[T, t* | X].
2. Simulate the Markov chain for many steps (burn-in).
3. Output T from every n-th (e.g. n = 50) step.
[Figure: NNI neighborhood for trees with 5 leaves]
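The burn-in and thinning recipe can be tried on the four-state nucleotide chain. This sketch assumes a discrete-time simplification of the Jukes-Cantor model in which each step changes the state with a fixed probability and picks the new nucleotide uniformly; the function name, `change_prob`, and the burn-in and thinning values are illustrative assumptions.

```python
import random
from collections import Counter


def simulate_chain(steps, change_prob=0.3, seed=0):
    """Simulate a discrete-time Jukes-Cantor-style chain on A, C, G, T.

    At each step the state changes with probability `change_prob`, moving to
    one of the three other nucleotides uniformly at random.
    """
    rng = random.Random(seed)
    state = "A"
    path = []
    for _ in range(steps):
        if rng.random() < change_prob:
            state = rng.choice([x for x in "ACGT" if x != state])
        path.append(state)
    return path


burn_in, thin = 1000, 5            # illustrative values, smaller than the slide's
path = simulate_chain(101_000)
samples = path[burn_in::thin]      # drop the burn-in, then keep every 5th state
freqs = {x: c / len(samples) for x, c in Counter(samples).items()}
print(freqs)                       # each frequency should be close to 1/4
```

Thinning reduces the correlation between successive samples; it does not change the equilibrium distribution.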
MCMC on Trees (continued)
For transitions, can use NNI, SPR, TBR, or other operations.
Can define* the transition probabilities of this Markov chain without computing Z = Σ_{T', t'} Pr[X | T', t'] Pr[T', t'] (Metropolis algorithm).
* “involves burning of incense, casting of chicken bones, use of magical incantations, and invoking the opinions of more prestigious colleagues.” -- Felsenstein

How Many Times Did Wings Evolve?
• Previous studies had shown loss of wings: winged → wingless transitions.
• Gain of wings (wingless → winged transition) appears to be much more complicated.
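The way the Metropolis algorithm avoids the normalizing constant Z can be seen on a toy state space. In the sketch below, five states on a cycle stand in for trees, the proposal (one step left or right) stands in for a random branch swap, and the weights stand in for the unnormalized posterior Pr[X | T, t] Pr[T, t]. All of these, and the function name `metropolis`, are assumptions for illustration.

```python
import random


def metropolis(weights, steps, burn_in=1000, seed=0):
    """Sample states 0..k-1 with probability proportional to `weights`.

    States sit on a cycle; the proposal moves one step left or right with
    equal probability, so it is symmetric. Only ratios of weights are used,
    so the normalizing constant Z = sum(weights) is never needed.
    """
    rng = random.Random(seed)
    k = len(weights)
    state = 0
    counts = [0] * k
    for step in range(burn_in + steps):
        proposal = (state + rng.choice([-1, 1])) % k
        # Accept with probability min(1, w(proposal) / w(state)).
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        if step >= burn_in:
            counts[state] += 1
    return [c / steps for c in counts]


# Hypothetical unnormalized posterior weights for five states (stand-ins for trees).
weights = [1, 2, 3, 4, 10]
freqs = metropolis(weights, steps=200_000)
target = [w / sum(weights) for w in weights]   # computed here only to check the result
print([round(f, 3) for f in freqs])
print(target)
```

The sampled frequencies should approach the normalized weights even though the sampler itself never divides by Z.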
Phylogeny of Insects (Nature 2003)
Build a phylogeny of winged and wingless stick insects.
Used data from:
18S ribosomal DNA (~1,900 base pairs (bp))
28S rDNA (2,250 bp)
Portion of histone 3 (H3, 372 bp)
Used multiple tree reconstruction techniques.

Most Parsimonious Evolutionary Tree of Winged and Wingless Insects
• All most parsimonious reconstructions gave a wingless ancestor.
• All required multiple transitions between winged and wingless states.
[Figure: most parsimonious evolutionary tree of winged and wingless insects]

Will Wingless Insects Fly Again?
• All most parsimonious reconstructions required the re-invention of wings.
• It is likely that wing developmental pathways are conserved in wingless stick insects.
Next Questions
• How to combine/merge trees?
• How to determine “confidence” in a particular tree/branch?

Multiple Trees?
Consensus Trees

Strict Consensus Tree
Strict Consensus
No non-trivial splits in common! The strict consensus tree is unresolved.

Splits Equivalence Theorem
A phylogenetic tree T defines a collection of splits Σ(T) = { L_e | e is an edge in T }.
Splits A_1 | B_1 and A_2 | B_2 are pairwise compatible if at least one of A_1 ∩ A_2, A_1 ∩ B_2, B_1 ∩ A_2, and B_1 ∩ B_2 is the empty set.
Splits Equivalence Theorem: Let Σ be a collection of splits. There is a phylogenetic tree T such that Σ(T) = Σ if and only if the splits in Σ are pairwise compatible.
The Pairwise Compatibility Theorem (for binary characters) follows from this theorem.
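The pairwise compatibility condition translates almost directly into code. In this sketch a split is a pair of frozensets; the helper names and the example splits on leaves a..e are hypothetical.

```python
from itertools import combinations


def compatible(s1, s2):
    """Splits (A1, B1) and (A2, B2) are compatible if some pairwise intersection is empty."""
    return any(not (x & y) for x in s1 for y in s2)


def pairwise_compatible(splits):
    """True if every pair of splits in the collection is compatible."""
    return all(compatible(s, t) for s, t in combinations(splits, 2))


def split(a, b):
    """Build the split A | B from two strings of single-letter leaf names."""
    return (frozenset(a), frozenset(b))


ab_cde = split("ab", "cde")
abc_de = split("abc", "de")
ac_bde = split("ac", "bde")

print(pairwise_compatible([ab_cde, abc_de]))          # True: ab and de do not intersect
print(pairwise_compatible([ab_cde, abc_de, ac_bde]))  # False: ab|cde conflicts with ac|bde
```

By the Splits Equivalence Theorem, the first collection corresponds to some tree and the second does not.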
Majority Consensus Tree
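The slides show consensus trees only as figures. As a sketch under the usual definitions, which are assumed here rather than taken from the slides, the strict consensus keeps the splits present in every input tree and the majority consensus keeps the splits present in more than half of them. Each tree is given as a set of non-trivial splits, and the three example trees are hypothetical.

```python
from collections import Counter


def split(a, b):
    """Build the split A | B from two strings of single-letter leaf names."""
    return frozenset([frozenset(a), frozenset(b)])


def strict_consensus(trees):
    """Splits that appear in every tree."""
    counts = Counter(s for tree in trees for s in tree)
    return {s for s, c in counts.items() if c == len(trees)}


def majority_consensus(trees):
    """Splits that appear in more than half of the trees."""
    counts = Counter(s for tree in trees for s in tree)
    return {s for s, c in counts.items() if c > len(trees) / 2}


# Three hypothetical trees on leaves a..e, each given by its non-trivial splits.
trees = [
    {split("ab", "cde"), split("abc", "de")},
    {split("ab", "cde"), split("abe", "cd")},
    {split("ac", "bde"), split("abc", "de")},
]

print(len(strict_consensus(trees)))    # prints 0: no non-trivial split is shared by all
print(len(majority_consensus(trees)))  # prints 2: ab|cde and abc|de each occur twice
```

Any two splits that each occur in more than half of the trees must occur together in at least one tree, so the majority splits are pairwise compatible and define a tree.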