one step mutation osm matrices

One Step Mutation (OSM) matrices



  1. One Step Mutation (OSM) matrices (joint work with …). Sequence Evolution.

  2. Sequence Evolution. Sequence shown: acggcatagccgattac. Sequence Evolution (continued). Sequences shown: acgggatagcccattac, acggcatagccgattac.

  3. Sequence Evolution (continued). Sequences shown: acgggat--cccattac, acggcatatccactggattac, acgggatagcccattac, acggcatagccgattac. Sequence Evolution (continued). Sequences shown: acgggat--cccattac, acgggat--cccattac, acggcatatccactggattac, acgggatagcccattac, acggcatagccgattac.

  4. Sequence Evolution (continued). Sequences shown: acgacatatccactggattcc, accccctatccactggattac, ccgggatagcttccattac, acgggat--cccaatac, acgggat--cccattac, acgggat--cccattac, acggcatatccactggattac, acgggatagcccattac, acggcatagccgattac. Multiple Sequence Alignment (MSA):
       seq 1   a g c t t a c c t g t t a c t
       seq 2   c g t a a a t t t c c c g a t
       seq 3   c g c a a g t t t c c c g a t
       seq 4   c a c t t a t t a g t c a a c
     Each column of the alignment is called an alignment column or alignment pattern.
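As a small illustration of what an alignment pattern is, the sketch below reads the four-sequence alignment from the slide column by column and counts how often each pattern occurs. The names `alignment`, `alignment_columns` and `pattern_counts` are made up for this example and do not come from the slides.

```python
from collections import Counter

# The four aligned sequences from the slide, with the spaces removed.
alignment = {
    "seq 1": "agcttacctgttact",
    "seq 2": "cgtaaatttcccgat",
    "seq 3": "cgcaagtttcccgat",
    "seq 4": "cacttattagtcaac",
}


def alignment_columns(msa):
    """Return one string per alignment column, reading the sequences top to bottom."""
    rows = list(msa.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return ["".join(chars) for chars in zip(*rows)]


columns = alignment_columns(alignment)
pattern_counts = Counter(columns)  # how often each pattern occurs

print(len(columns))            # number of alignment columns
print(columns[0])              # pattern of the first column
print(pattern_counts["taat"])  # occurrences of the pattern t/a/a/t
```

The pattern counts, rather than the raw sequences, are what a pattern-based model takes as input.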

  5. Example: Binary Alphabet {R, Y}. Binary Alphabet {R, Y}: pattern R R R R.

  6. Binary Alphabet {R, Y}: pattern R R R R. Binary Alphabet {R, Y}: pattern Y R R R.

  7. Binary Alphabet {R, Y}: permutation matrix $\sigma_A$. Binary Alphabet {R, Y}: permutation matrix $\sigma_{AB}$.

  8. Internal branches: matrix multiplication, $\sigma_A \sigma_B = \sigma_{AB}$. One Step Mutation Matrix (figure: matrix with rows and columns indexed by the patterns 0 to 15).
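The relation between $\sigma_A$, $\sigma_B$ and $\sigma_{AB}$ can be checked numerically. The sketch below makes two assumptions that are not spelled out on the slides: a pattern over {R, Y} on four taxa is encoded as a 4-bit integer (bit set means Y), and a substitution on an edge flips the state of every leaf below that edge. The helper names `sigma` and `matmul` are illustrative.

```python
N_TAXA = 4


def sigma(mask, n=N_TAXA):
    """Permutation matrix that flips the leaves selected by `mask` in every pattern."""
    size = 1 << n
    return [[1 if j == (i ^ mask) else 0 for j in range(size)] for i in range(size)]


def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


A, B = 0b0001, 0b0010  # assumed bit positions of leaves A and B

sigma_A = sigma(A)
sigma_B = sigma(B)
sigma_AB = sigma(A | B)  # internal edge above the leaves A and B

print(matmul(sigma_A, sigma_B) == sigma_AB)  # True
print(sigma_A[0b0000][0b0001])               # 1: R R R R is mapped to the pattern with A = Y
```

Under this encoding every $\sigma_{edge}$ is an XOR with a fixed mask, so all of these matrices commute.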

  9. Examples of OSM-Graphs. Branch Lengths: total branch length $\Delta = d_A + d_B + d_C + d_D + d_{AB} + d_{CD}$; relative edge length $p_{edge} = d_{edge} / \Delta$ (figure: tree $T$ with edges labeled $d_A$, $d_B$, $d_C$, $d_D$, $d_{AB}$, $d_{CD}$).

  10. Some Formalisms: the relative branch lengths satisfy $p_A + p_B + p_C + p_D + p_{AB} + p_{CD} = 1$ and are used to assign mutation probabilities. Each edge contributes $p_{edge} \cdot \sigma_{edge}$, where $\sigma_{edge}$ is a general permutation matrix. Constructing the OSM: $M_T = \sum_{edge} p_{edge} \cdot \sigma_{edge}$.
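A minimal sketch of the construction, assuming the same bit-mask encoding of patterns as before (bit set means Y, a substitution on an edge flips all leaves below it). The branch lengths in `branch_lengths` are invented for illustration, and `build_osm` is a hypothetical helper name.

```python
from fractions import Fraction

N_TAXA = 4

# Hypothetical branch lengths d_edge for the four-taxon tree ((A,B),(C,D)).
branch_lengths = {"A": 1, "B": 1, "C": 1, "D": 1, "AB": 2, "CD": 2}

# Leaves below each edge, encoded as bit masks (an assumption of this sketch).
edge_masks = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000,
              "AB": 0b0011, "CD": 0b1100}


def build_osm(lengths, masks, n=N_TAXA):
    """Return M_T as the sum over edges of p_edge * sigma_edge."""
    total = sum(lengths.values())  # total branch length Delta
    size = 1 << n
    m = [[Fraction(0)] * size for _ in range(size)]
    for edge, d in lengths.items():
        p_edge = Fraction(d, total)  # relative edge length
        for pattern in range(size):
            # sigma_edge moves `pattern` to the pattern with the leaves below the edge flipped
            m[pattern][pattern ^ masks[edge]] += p_edge
    return m


M_T = build_osm(branch_lengths, edge_masks)

print(M_T[0b0000][0b0001])  # 1/8, the relative length of edge A
print(M_T[0b0000][0b0011])  # 1/4, the relative length of edge AB
print(sum(M_T[0]))          # 1, every row is a probability distribution
```

Exact fractions are used only so that the entries can be compared with the relative branch lengths by eye.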

  11. Many Substitutions. One substitution: $M_T = \sum_{edge} p_{edge} \cdot \sigma_{edge}$. $k$ substitutions: $M_T^k = \left( \sum_{edge} p_{edge} \cdot \sigma_{edge} \right)^k$. Many substitutions: random walk.
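The random-walk reading of $M_T^k$ can be made concrete with a short sketch. It assumes the bit-mask encoding of patterns (bit set means Y) and uses invented relative branch lengths in `P_EDGE`; the function names are illustrative.

```python
from fractions import Fraction

N_TAXA = 4
SIZE = 1 << N_TAXA

# Hypothetical relative branch lengths p_edge, keyed by the mask of leaves below the edge.
P_EDGE = {
    0b0001: Fraction(1, 8), 0b0010: Fraction(1, 8),
    0b0100: Fraction(1, 8), 0b1000: Fraction(1, 8),
    0b0011: Fraction(1, 4), 0b1100: Fraction(1, 4),
}


def one_step_matrix(p_edge):
    """M_T = sum over edges of p_edge * sigma_edge."""
    m = [[Fraction(0)] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        for mask, p in p_edge.items():
            m[i][i ^ mask] += p
    return m


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(SIZE)) for j in range(SIZE)]
        for i in range(SIZE)
    ]


def matrix_power(m, k):
    """M_T^k by repeated multiplication, starting from the identity."""
    result = [[Fraction(int(i == j)) for j in range(SIZE)] for i in range(SIZE)]
    for _ in range(k):
        result = matmul(result, m)
    return result


M = one_step_matrix(P_EDGE)
M2 = matrix_power(M, 2)

print(M2[0b0000][0b0000])  # 3/16: both substitutions hit the same edge
print(M2[0b0000][0b0011])  # 1/32: one substitution on A and one on B, in either order
```

Entry $(i, j)$ of $M_T^k$ is the probability that exactly $k$ substitutions, each placed on an edge with probability $p_{edge}$, turn pattern $i$ into pattern $j$.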

  12. Maximum Parsimony (MP): $\min\{ k \in \mathbb{N} : M_T^k(i, j) > 0 \}$ describes the minimal number of mutations to move from pattern $i$ to pattern $j$. MP: for a tree $T$ and pattern $j$ compute $\min\{ k \in \mathbb{N} : M_T^k(R \ldots R, j) > 0 \text{ or } M_T^k(Y \ldots Y, j) > 0 \}$. Maximum Likelihood: we assume that the number of substitutions is Poisson distributed with parameter $\Delta$. Then we compute the expected OSM (written $\overline{M}_T$ here to distinguish it from the one-step matrix $M_T$) as $\overline{M}_T = \sum_{k=0}^{\infty} \exp(-\Delta) \, \frac{\Delta^k}{k!} \, M_T^k = \exp(-\Delta) \cdot \exp(\Delta \cdot M_T) = \exp(-\Delta) \cdot H_{2^n} \cdot \exp(\Delta \cdot D_T) \cdot H_{2^n}$, where $D_T = H_{2^n} \cdot M_T \cdot H_{2^n}$, $H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ and $H_{2^n} = \underbrace{H_2 \otimes H_2 \otimes \cdots \otimes H_2}_{n \text{ times}}$.
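The parsimony statement can be illustrated without forming matrix powers: the smallest $k$ with $M_T^k(i, j) > 0$ is the length of a shortest path from $i$ to $j$ in the graph whose edges are the nonzero entries of $M_T$. The sketch below finds it by breadth-first search, which is a substitution for the matrix-power formulation and not necessarily how the authors compute it. It assumes the bit-mask encoding of patterns (bit set means Y); `EDGE_MASKS`, `min_mutations` and `parsimony_score` are illustrative names.

```python
from collections import deque

N_TAXA = 4

# Leaves below each edge of the tree ((A,B),(C,D)), as bit masks (an assumption of this sketch).
EDGE_MASKS = [0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b1100]


def min_mutations(start, target, edge_masks=EDGE_MASKS):
    """Smallest k with M_T^k(start, target) > 0, found by breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for mask in edge_masks:
            neighbour = current ^ mask  # one substitution on this edge
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None  # target not reachable from start


def parsimony_score(pattern, n=N_TAXA):
    """Minimum over the two constant patterns R...R and Y...Y."""
    all_r, all_y = 0, (1 << n) - 1
    return min(min_mutations(all_r, pattern), min_mutations(all_y, pattern))


print(parsimony_score(0b0011))  # 1: a single substitution on the edge AB
print(parsimony_score(0b0101))  # 2: A and C differ from the rest
print(parsimony_score(0b0111))  # 1: starting from Y...Y, one substitution on D
```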

  13. Maximum Likelihood: the expected OSM is $\overline{M}_T = \exp(-\Delta) \cdot H_{2^n} \cdot \exp(\Delta \cdot D_T) \cdot H_{2^n}$. The likelihood of a tree $T$ with branch length $\Delta$, given an alignment of length $L$, is then $\Pr(T, \Delta) = \prod_{i=1}^{L} \overline{M}_T(\{R \ldots R, Y \ldots Y\}, \text{pattern}(i))$. Another View at the Mutations: from the formula $\overline{M}_T = \exp(-\Delta) \cdot H_{2^n} \cdot \exp(\Delta \cdot D_T) \cdot H_{2^n}$ we can analytically compute the posterior probability of the number of mutations that have occurred on a fixed tree: $\Pr(k \text{ mutations} \mid \text{pattern}) = \exp(-\Delta) \, \frac{\Delta^k}{k!} \cdot \frac{M_T^k(R \ldots R, \text{pattern})}{\overline{M}_T(R \ldots R, \text{pattern})}$. Similar work by Rasmus Nielsen, John Huelsenbeck, Jonathan Bollback (2002, 2003, 2005).
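The Hadamard form of the expected OSM can be compared numerically with the Poisson series it comes from. The sketch below rests on several assumptions: patterns are encoded as bit masks (bit set means Y), the relative branch lengths in `P_EDGE` are invented, and a factor $2^{-n}$ is included because $H_{2^n} H_{2^n} = 2^n I$ for the unnormalized $H_2$; the slides do not show where that normalization is placed. In this encoding the diagonal of $D_T$ consists of the values `lam` computed below.

```python
import math

N_TAXA = 4
SIZE = 1 << N_TAXA
DELTA = 1.0  # hypothetical Poisson parameter

# Hypothetical relative branch lengths p_edge, keyed by the mask of leaves below the edge.
P_EDGE = {0b0001: 0.125, 0b0010: 0.125, 0b0100: 0.125, 0b1000: 0.125,
          0b0011: 0.25, 0b1100: 0.25}


def sign(s, m):
    """Entry (s, m) of the Hadamard matrix H_{2^n}: (-1)^(number of shared set bits)."""
    return -1 if bin(s & m).count("1") % 2 else 1


def expected_osm_hadamard(p_edge, delta):
    """exp(-delta) * 2^-n * H * exp(delta * D_T) * H, written entry by entry."""
    lam = [sum(p * sign(s, m) for m, p in p_edge.items()) for s in range(SIZE)]
    weight = [math.exp(delta * (value - 1.0)) for value in lam]
    return [
        [sum(sign(s, i ^ j) * weight[s] for s in range(SIZE)) / SIZE for j in range(SIZE)]
        for i in range(SIZE)
    ]


def expected_osm_series(p_edge, delta, terms=60):
    """Truncated sum over k of Poisson(k; delta) * M_T^k, one row at a time."""
    rows = []
    for i in range(SIZE):
        walk = [0.0] * SIZE
        walk[i] = 1.0  # row i of M_T^0
        poisson = math.exp(-delta)
        total = [0.0] * SIZE
        for k in range(terms):
            total = [t + poisson * w for t, w in zip(total, walk)]
            step = [0.0] * SIZE
            for state, prob in enumerate(walk):
                for mask, p in p_edge.items():
                    step[state ^ mask] += prob * p
            walk = step
            poisson *= delta / (k + 1)
        rows.append(total)
    return rows


via_hadamard = expected_osm_hadamard(P_EDGE, DELTA)
via_series = expected_osm_series(P_EDGE, DELTA)
max_gap = max(abs(a - b) for ra, rb in zip(via_hadamard, via_series) for a, b in zip(ra, rb))
print(max_gap < 1e-12)  # the two computations agree up to rounding
```

The closed form needs one exponential per pattern, whereas the series needs a matrix-vector product per term, which is the practical appeal of the diagonalization.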

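  14. Posterior probabilities, clock-like tree: $\mathrm{ppd}[k \mid a] = \exp(-\Delta) \, \frac{\Delta^k}{k!} \cdot \frac{\pi_0 M_T^k(0, a) + \pi_1 M_T^k(1, a)}{\pi_0 \overline{M}_T(0, a) + \pi_1 \overline{M}_T(1, a)}$, where $\overline{M}_T$ is the expected OSM (figure: panels for $\Delta = 1.0$ and $\Delta = 0.2$). Posterior probabilities, five-taxa tree (figure: alignment patterns, including pattern AB|CDE and pattern ABE|CD).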
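A rough sketch of how such a posterior distribution over the number of substitutions could be tabulated. Everything specific here is an assumption: the arguments 0 and 1 in the formula are read as the constant patterns R…R and Y…Y, the root weights are set to 0.5 each, the relative branch lengths in `P_EDGE` are invented, and the denominator is approximated by truncating the Poisson series instead of using the closed form.

```python
import math

N_TAXA = 4
SIZE = 1 << N_TAXA
DELTA = 1.0  # hypothetical Poisson parameter

# Hypothetical relative branch lengths p_edge, keyed by the mask of leaves below the edge.
P_EDGE = {0b0001: 0.125, 0b0010: 0.125, 0b0100: 0.125, 0b1000: 0.125,
          0b0011: 0.25, 0b1100: 0.25}

# Assumed weights of the two constant root patterns R...R and Y...Y.
ROOT_WEIGHTS = {0b0000: 0.5, 0b1111: 0.5}


def posterior_mutation_counts(pattern, p_edge, delta, root_weights, max_k=60):
    """Posterior probability of k substitutions, k = 0..max_k, given the observed pattern."""
    walk = [0.0] * SIZE
    for root, weight in root_weights.items():
        walk[root] = weight  # root distribution times M_T^0
    poisson = math.exp(-delta)  # Poisson probability of k = 0
    joint = []
    for k in range(max_k + 1):
        joint.append(poisson * walk[pattern])  # Pr(k substitutions and pattern)
        step = [0.0] * SIZE
        for state, prob in enumerate(walk):
            if prob:
                for mask, p in p_edge.items():
                    step[state ^ mask] += prob * p
        walk = step
        poisson *= delta / (k + 1)
    evidence = sum(joint)  # truncated approximation of the denominator
    return [value / evidence for value in joint]


posterior = posterior_mutation_counts(0b0101, P_EDGE, DELTA, ROOT_WEIGHTS)
for k in range(5):
    print(k, round(posterior[k], 4))
```

For the pattern `0b0101` the posterior mass on zero or one substitution is exactly 0, matching the parsimony bound of two substitutions on this tree.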

  15. Summary and Outlook. Developed an evolutionary model that describes the action of a single substitution on an alignment pattern. This leads to a tree-topology mediated random walk on the space of words of length n. Maximum Parsimony and Maximum Likelihood are “extreme” cases within this framework. Practical Aspect: analytical formula for the posterior probabilities of the number of substitutions for a pattern. Open Questions: • Connection between OSM and the Hadamard transform (Hendy, Penny 1989) and its generalization, the Fourier calculus on evolutionary trees (Szekely, Steel, Erdös 1993). • Other types of substitution distributions? • Computational issues. The real stuff.

  16. The real stuff. Observed pattern count $O(d_1, \ldots, d_{4^n})$. Maximum likelihood etc.

  17. The real stuff (diagram elements). Observed pattern count $O(d_1, \ldots, d_{4^n})$; maximum likelihood etc.; $\hat{T}$; $E(p_1, \ldots, p_{4^n})$; OSM.

  18. The real stuff (diagram elements). Observed pattern count $O(d_1, \ldots, d_{4^n})$; maximum likelihood etc.; $\hat{T}$; $E(p_1, \ldots, p_{4^n})$; OSM. How many mutations are required to change E() into O()?
