Outline Probabilis3c Models of Phylogeny 1. Models of nucleo3de - PDF document

2/15/09 CSCI1950‐Z Computa3onal Methods for Biology Lecture 7 Ben Raphael February 9, 2009 hHp://cs.brown.edu/courses/csci1950‐z/ Outline Probabilis3c Models of Phylogeny 1. Models of nucleo3de change 2. Compu3ng likelihood of trees 1

2/15/09 Distances from Sequences convergent mul3ple Chimpanzee: CCTGCCAGTTAGCAAACGG G T Ancestor: CCCGCGACTTAACAAACGC G CCTGCGAGTTAACAAACGA Human: parallel back coincident Hamming distance = 3. D S = differences per site = 3/20 12 total muta3ons. Jukes‐Cantor Model A C G T A ‐3α α α α C α ‐3α α α Q = G α α ‐3α α T α α α ‐3α P(t) = e Qt p xy (t) = Pr[X t = x | X 0 = y] = ¼ + ¾ e ‐4αt if x = y ¼ ‐ ¼ e ‐4αt if x ≠ y 2

2/15/09 Observed Differences vs. Jukes‐Cantor = α t Other Models In biology, not all subs3tu3ons are equally likely. {A, G} purines Kimura 2 parameter model transversion A C G T A ‐2β‐ α β α β {C, T} pyrimidines C β ‐2β‐ α β α transi3on G α β ‐2β‐ α β T β α β ‐2β‐ α Other models • HKY model for DNA • Other models for protein sequences (20 x 20 matrices) 3

2/15/09 Probabilis3c Model y Pr[ x | y , t ] = probability that y mutates to x in 3me t t x Given a character matrix M, what is the “most likely” tree T that generated M? Rat: ACAGTGACGCCCCAAACGT Mouse: ACAGTGACGCTACAAACGT Gorilla: CCTGTGACGTAACAAACGA Chimpanzee: CCTGTGACGTAGCAAACGA CCTGTGACGTAGCAAACGA Human: Likelihood Given data D and a model (hypothesis) H for genera3on of D, the likelihood of H is the quan3ty L [ H ; D ] = Pr[ D | H ]. Note: likelihood of H is not the probability of H. 4

2/15/09 Probabilis3c Model y Pr[ x | y , t ] = probability that y mutates to x in 3me t t x Given a character matrix M, what is the “most likely” tree T that generated M? Assume: 1. Characters evolve independently. 2. Constant rate of muta3on on each branch. 3. State of a node depends only on parent and branch length: i.e. Pr[ x | y , t ] depends only on y and t . (Markov process) Maximum Likelihood Given a character matrix M , a tree T with branch lengths t * = t 1 , …, t 2 n ‐2 : L( T , t * ) = Pr[ M | T , t * ] is called the likelihood . Maximum likelihood : Find argmax T , t L( T , t * ) First: How to compute L(T, t * )? 5

2/15/09 Probabilis3c Model y Pr[ x | y , t ] = probability that y mutates to x in 3me t t x Given a tree (T, t * ) with leaves labeled by characters in M , Pr[ M | T , t * ] is the probability of a labeling of ancestral nodes. Assume: 1. Characters evolve independently: Pr[ M | T , t * ] = Π i Pr[ M i | T , t * ] so consider each character separately 2. Constant rate of muta3on on each branch. 3. State of a vertex depends only on parent and branch length: i.e. Pr[ x | y , t ] depends only on y and t . (Markov process) Example x t 5 t 6 y z t 2 t 3 G t 1 t 4 C A T 6

2/15/09 Probabilis3c Model y Pr[ x | y , t ] = probability that y t mutates to x in 3me t x Two species a T = tree topology t 1 t 2 x 1 , x 2 : characters for each species a : character for ancestor x 1 x 2 Pr[ x 1 , x 2 , a | T , t 1 , t 2 ] = q a Pr[ x 1 | a , t 1 ] Pr[ x 2 | a , t 2 ] q a = Pr[ ancestor has character a] Probabilis3c Model Two species a T = tree topology t 1 t 2 x 1 , x 2 : characters for each species a : character for ancestor x 1 x 2 Pr[ x 1 , x 2 , a | T , t 1 , t 2 ] = q a Pr[ x 1 | a , t 1 ] Pr[ x 2 | a , t 2 ] q a = Pr[ ancestor has character a] Pr[ x 1 , x 2 | T , t 1 , t 2 ] = Σ a q a Pr[ x 1 | a , t 1 ] Pr[ x 2 | a , t 2 ] Follows from Law of Total Probability: P( X ) = Σ P( X | Y i ) P( Y i ). 7

2/15/09 Probabilis3c Model n species: x 1 , x 2 , …, x n Let α( i ) = ancestor of node i . Let a n +1 , a n +2 , …, a 2 n ‐1 = characters on internal nodes, where nodes are number from internal ver3ces up to root. Pr [ x 1 , ..., x n | T, t 1 , ..., t 2 n − 2 ] = 2 n − 2 n � � Pr [ a i | a α ( i ) , t i ] � Pr [ x i | a α ( i ) , t i ] q a 2 n − 1 i = n +1 i =1 a n +1 ,a n +2 ,..,a 2 n − 1 Follows from Law of Total Probability: P( X ) = Σ P( X | Y i ) P( Y i ). Felsenstein’s Algorithm Let Pr[ T k | a ] = probability of leaf nodes “below” node k , given a k = a. a Compute via dynamic programming b c � � Pr [ T k | a ] = Pr [ b | a, t i ] Pr [ T i | b ] Pr [ c | a, t j ] Pr [ T j | c ] c b Ini3al condi3ons. For k = 1, …, n (leaf nodes) Pr[ T k | a ] = 1, if a = x k 0, otherwise. 8

2/15/09 Compu3ng the Likelihood Let Pr[ T k | a ] = probability of leaf nodes “below” node k , given a k = a. � Pr [ x 1 , . . . , x n | T, t ∗ ] = Pr [ T 2 n − 1 | a ] q a a Note: Root is node 2 n ‐1 Maximum Likelihood when T unknown Find T, t* that maximize: � Pr [ x 1 , . . . , x n | T, t ∗ ] = Pr [ T 2 n − 1 | a ] q a a Must search over all trees T. Complexity unknown un3l recently: – Felsenstein book (2004): “There has also been no proof that the problem is NP‐hard (as there has been for many other methods” – Shamir notes (2000): “[Maximum likelihood] not proven to be NP‐complete.” • ML is NP‐hard (B. Chor and T. Tuller, RECOMB 2005). 9

2/15/09 Unknown branch lengths • T fixed, branch lengths t * are unknown. • Use local op3miza3on rou3ne: e.g. Newton’s method or Expecta3on Maximiza3on Finding Ancestral States Let Pr[ T k | a ] = probability of best assignment of ancestral states to nodes “below” node k , given a k = a. � � � � Pr [ T k | a ] = max Pr [ b | a, t i ] Pr [ T i | b ] max Pr [ c | a, t j ] Pr [ T j | c ] c b Traceback as before with Sankoff’s algorithm. 10

2/15/09 Max. Parsimony vs. Max. Likelihood • Set δ ij = ‐log P( j | i ) in weighted parsimony (Sankoff algorithm) • Weighted parsimony produces “maximum probability” assignments, ignoring branch lengths Searching over Tree Space • How to find T with maximum likelihood? • How to find T with maximum parsimony? 11

Outline Probabilis3c Models of Phylogeny 1. Models of nucleo3de - PDF document

2/15/09 CSCI1950Z Computa3onal Methods for Biology Lecture 7 Ben Raphael February 9, 2009 hHp://cs.brown.edu/courses/csci1950z/ Outline Probabilis3c Models of Phylogeny 1. Models of nucleo3de change 2. Compu3ng likelihood of trees 1

Ins Domingues Breast Cancer Workshop April 7th 2015 Outline Outline Outline Outline

Presentation Preparation Outline Speech Outline Template ***Use this outline to guide you in

Outline for St Outline for St Outline for

Beob Kyun Kim, S oonwook Hwang {kyun, hwang}@ kisti.re.kr KIS TI, Korea Outline Outline

Catherine Revels, World Bank November 2009 Presentation outline Presentation outline

Battlestar Galactica Battlestar Galactica Galactica Battlestar Outline Outline Outline

Outline 2 Outline 2 ZSim core simulation techniques Outline 2 ZSim core simulation

Appendix J: Capstone Presentation Outline Revised Spring 2016 CAPSTONE PRESENTATION OUTLINE This

PT1 TMP Presentation Outline 1 Group Members: ___________________________________ Use this outline

Broverview Outline 2 Outline Philosophy and Architecture A framework for network traffic

Xingqian Peng, Huaqiao University, China Presented by Zhen Wu Presented by Zhen Wu October 30,2011

1 Web Application Development 2 3 Web Application Development CSS Outline An outline is a

Lecture Outline Strengthening Induction Hypothesis. Lecture Outline Strengthening Induction

STAT 213 Simple Linear Regression I Colin Reimer Dawson Oberlin College 5 October 2016 Outline

High Dimensional Approximation - Outline Background and Sources Wolfgang Dahmen Seminar: USC,

Outline Outline Deaf and Hearing Impaired Deaf and Hearing Impaired Physical Structures of

Community Development Services An Economic Development Consulting Firm 3895 Main Street,

Week 5: Manipulate, Facet, Reduce Demo: Text Tamara Munzner Department of Computer Science

MORE C Samira Khan Agenda Pointer vs array Using man page Structure and dynamic

WIDESPREAD AND SYSTEMIC VIOLATIONS: EUROPEAN SYSTEM P R O F E S S O R P H I L I P L E A C H E

Substitution = 1:A G 2:C A Mutation followed GAGATC by Fixation 3:G A 6:C T 5:T C

Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabs Pczos & Aarti

The Habitable Zone (HZ) of our Solar System today Impact Frustration of the Origin of Life Earth

IEA Bioenergy Task 34 overview Direct Thermochemical Liquefaction Current status and next

Outline Probabilis3c Models of Phylogeny 1. Models of nucleo3de - PDF document

2/15/09 CSCI1950Z Computa3onal Methods for Biology Lecture 7 Ben Raphael February 9, 2009 hHp://cs.brown.edu/courses/csci1950z/ Outline Probabilis3c Models of Phylogeny 1. Models of nucleo3de change 2. Compu3ng likelihood of trees 1

Ins Domingues Breast Cancer Workshop April 7th 2015 Outline Outline Outline Outline

Presentation Preparation Outline Speech Outline Template ***Use this outline to guide you in

Outline for St Outline for St Outline for

Beob Kyun Kim, S oonwook Hwang {kyun, hwang}@ kisti.re.kr KIS TI, Korea Outline Outline

Catherine Revels, World Bank November 2009 Presentation outline Presentation outline

Battlestar Galactica Battlestar Galactica Galactica Battlestar Outline Outline Outline

Outline 2 Outline 2 ZSim core simulation techniques Outline 2 ZSim core simulation

Appendix J: Capstone Presentation Outline Revised Spring 2016 CAPSTONE PRESENTATION OUTLINE This

PT1 TMP Presentation Outline 1 Group Members: ___________________________________ Use this outline

Broverview Outline 2 Outline Philosophy and Architecture A framework for network traffic

Xingqian Peng, Huaqiao University, China Presented by Zhen Wu Presented by Zhen Wu October 30,2011

1 Web Application Development 2 3 Web Application Development CSS Outline An outline is a

Lecture Outline Strengthening Induction Hypothesis. Lecture Outline Strengthening Induction

STAT 213 Simple Linear Regression I Colin Reimer Dawson Oberlin College 5 October 2016 Outline

High Dimensional Approximation - Outline Background and Sources Wolfgang Dahmen Seminar: USC,

Outline Outline Deaf and Hearing Impaired Deaf and Hearing Impaired Physical Structures of

Community Development Services An Economic Development Consulting Firm 3895 Main Street,

Week 5: Manipulate, Facet, Reduce Demo: Text Tamara Munzner Department of Computer Science

MORE C Samira Khan Agenda Pointer vs array Using man page Structure and dynamic

WIDESPREAD AND SYSTEMIC VIOLATIONS: EUROPEAN SYSTEM P R O F E S S O R P H I L I P L E A C H E

Substitution = 1:A G 2:C A Mutation followed GAGATC by Fixation 3:G A 6:C T 5:T C

Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabs Pczos &amp; Aarti

The Habitable Zone (HZ) of our Solar System today Impact Frustration of the Origin of Life Earth

IEA Bioenergy Task 34 overview Direct Thermochemical Liquefaction Current status and next

Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabs Pczos & Aarti