sequence alignment

Sequence Alignment Mark Voorhies 5/20/2015 Mark Voorhies Sequence - PowerPoint PPT Presentation



  1. Sequence Alignment Mark Voorhies 5/20/2015 Mark Voorhies Sequence Alignment

  2. Exercise: Scoring an ungapped alignment. Given two sequences and a scoring matrix, find the offset that yields the best-scoring ungapped alignment.

  3. Exercise: Scoring an ungapped alignment. Given two sequences and a scoring matrix, find the offset that yields the best-scoring ungapped alignment.

      def score(S, x, y):
          """Return alignment score for subsequences x and y
          for scoring matrix S (represented as a dict)."""
          assert len(x) == len(y)
          return sum(S[i][j] for (i, j) in zip(x, y))

      def subseqs(x, y, i):
          """Return subsequences of x and y for offset i."""
          if i > 0:
              y = y[i:]      # positive offset: skip the first i residues of y
          elif i < 0:
              x = x[-i:]     # negative offset: skip the first -i residues of x
          L = min(len(x), len(y))
          return x[:L], y[:L]

      def ungapped(S, x, y):
          """Return best offset, score, and alignment between
          sequences x and y for scoring matrix S (represented as a dict)."""
          best = None
          best_score = None
          for i in range(-len(x) + 1, len(y)):
              (sx, sy) = subseqs(x, y, i)
              s = score(S, sx, sy)
              # Explicit None check: comparing a number to None fails in Python 3.
              if best_score is None or s > best_score:
                  best_score = s
                  best = i
          return best, best_score, subseqs(x, y, best)

  4. Dotplots
       1. Unbiased view of all ungapped alignments of two sequences.

  5. Dotplots
       1. Unbiased view of all ungapped alignments of two sequences.
       2. Noise can be filtered by applying a smoothing window to the diagonals.
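As a rough illustration of the two points on this slide, here is a minimal Python sketch of a dotplot and a diagonal smoothing filter. The names `dotplot`, `smooth`, and `render`, and the `window` and `threshold` parameters, are illustrative choices rather than anything from the slides, and exact character matching stands in for a scoring matrix.

```python
def dotplot(x, y):
    """Return a len(x)-by-len(y) grid with 1 where x[i] == y[j], else 0."""
    return [[1 if a == b else 0 for b in y] for a in x]


def smooth(dots, window=3, threshold=3):
    """Keep cell (i, j) only if at least `threshold` of the `window` cells
    centred on it along its diagonal are matches (a hypothetical filter)."""
    n = len(dots)
    m = len(dots[0]) if n else 0
    half = window // 2
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Count matches on the same diagonal, ignoring out-of-range cells.
            hits = sum(
                dots[i + k][j + k]
                for k in range(-half, half + 1)
                if 0 <= i + k < n and 0 <= j + k < m
            )
            out[i][j] = 1 if hits >= threshold else 0
    return out


def render(grid):
    """Draw the grid as text, one row per residue of x."""
    return "\n".join("".join("*" if c else "." for c in row) for row in grid)


if __name__ == "__main__":
    raw = dotplot("AGCGGTA", "GAGCGGA")
    print(render(raw))
    print()
    print(render(smooth(raw)))
```

With these settings isolated dots disappear and only runs of consecutive matches along a diagonal survive. Cells on the border of the grid have a truncated window, so this sketch also drops them.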

  6. Exercise: Scoring a gapped alignment
       1. Given two equal-length gapped sequences (where "-" represents a gap) and a scoring matrix, calculate an alignment score with a -1 penalty for each base aligned to a gap.
       2. Write a new scoring function with separate penalties for opening a zero-length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).

      $S_{\mathrm{gapped}}(x, y) = S(x, y) + \sum_{i}^{\mathrm{gaps}} \left( G + E \cdot \mathrm{len}(i) \right)$
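One possible way to approach both parts of this exercise is sketched below. The function names `score_gapped` and `score_affine` and the toy match/mismatch matrix `S` are assumptions made for illustration. The sketch also assumes that no column pairs a gap with a gap, and that a gap in one sequence followed directly by a gap in the other counts as two separate gaps.

```python
def score_gapped(S, x, y, gap=-1):
    """Score two equal-length gapped sequences with a flat penalty
    for every base aligned to a gap."""
    assert len(x) == len(y)
    total = 0
    for a, b in zip(x, y):
        if a == "-" or b == "-":
            total += gap
        else:
            total += S[a][b]
    return total


def score_affine(S, x, y, G=-11, E=-1):
    """Score with a penalty of G + E * len(gap) for each run of gaps."""
    assert len(x) == len(y)
    total = 0
    open_in = None  # which sequence the current gap run is in: "x", "y", or None
    for a, b in zip(x, y):
        assert not (a == "-" and b == "-"), "column pairs a gap with a gap"
        if a == "-" or b == "-":
            where = "x" if a == "-" else "y"
            if where != open_in:
                total += G  # open a new zero-length gap
            total += E      # extend the open gap by one base
            open_in = where
        else:
            total += S[a][b]
            open_in = None
    return total


# Toy scoring matrix (an assumption): +1 for a match, -1 for a mismatch.
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

if __name__ == "__main__":
    print(score_gapped(S, "AGCGGTA", "AG--GTA"))
    print(score_affine(S, "AGCGGTA", "AG--GTA"))
```

With G much larger in magnitude than E, one long gap is penalized far less than several short ones, which is the point of separating the two penalties.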

  7. How many ways can we align two sequences?

  8. How many ways can we align two sequences?

  9. How many ways can we align two sequences?

  10. How many ways can we align two sequences?

  11. How many ways can we align two sequences?

  12. How many ways can we align two sequences?
      Binomial formula:
      $\binom{k}{r} = \frac{k!}{(k-r)!\,r!}$

  13. How many ways can we align two sequences?
      Binomial formula:
      $\binom{k}{r} = \frac{k!}{(k-r)!\,r!}$
      $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$

  14. How many ways can we align two sequences?
      Binomial formula:
      $\binom{k}{r} = \frac{k!}{(k-r)!\,r!}$
      $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$
      Stirling's approximation:
      $x! \approx \sqrt{2\pi}\, x^{x+\frac{1}{2}}\, e^{-x}$

  15. How many ways can we align two sequences?
      Binomial formula:
      $\binom{k}{r} = \frac{k!}{(k-r)!\,r!}$
      $\binom{2n}{n} = \frac{(2n)!}{n!\,n!}$
      Stirling's approximation:
      $x! \approx \sqrt{2\pi}\, x^{x+\frac{1}{2}}\, e^{-x}$
      $\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$
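To get a feel for how quickly this count grows, the exact binomial coefficient can be compared with the Stirling-based estimate from the slide. This is a small sketch: the function names are made up for illustration, and `math.comb` requires Python 3.8 or later.

```python
import math


def central_binomial(n):
    """Exact value of (2n choose n)."""
    return math.comb(2 * n, n)


def stirling_estimate(n):
    """Estimate of (2n choose n) as 2^(2n) / sqrt(pi * n)."""
    return 4.0 ** n / math.sqrt(math.pi * n)


if __name__ == "__main__":
    for n in (7, 20, 100):
        exact = central_binomial(n)
        approx = stirling_estimate(n)
        print(n, exact, f"{approx:.4g}", f"ratio={approx / exact:.4f}")
```

The estimate slightly overshoots the exact value and the ratio approaches 1 as n grows. Either way the count grows roughly like 4^n, which is why enumerating every alignment is impractical and the deck turns to dynamic programming next.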

  16. Dynamic Programming

  17. Needleman-Wunsch
      Sequences to align:
      A G C G G T A
      G A G C G G A

  18. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  19. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7
      Candidate alignments shown around the first cell: A over G; A- over -G; -A over G-.

  20. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  21. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  22. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0  -1
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  23. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0  -1  -2
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  24. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0  -1  -2  -3
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  25. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0  -1  -2  -3  -4  -5
      A  -2
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  26. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0  -1  -2  -3  -4  -5
      A  -2   0
      G  -3
      C  -4
      G  -5
      G  -6
      A  -7

  27. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0  -1  -2  -3  -4  -5
      A  -2   0  -1  -1  -2  -3  -4  -3
      G  -3  -1   1   0   0  -1  -2  -3
      C  -4  -2   0   2   1   0  -1  -2
      G  -5  -3  -1   1   3   2   1   0
      G  -6  -4  -2   0   2   4   3   2
      A  -7  -5  -3  -1   1   3   3   4

  28. Needleman-Wunsch
              A   G   C   G   G   T   A
          0  -1  -2  -3  -4  -5  -6  -7
      G  -1  -1   0  -1  -2  -3  -4  -5
      A  -2   0  -1  -1  -2  -3  -4  -3
      G  -3  -1   1   0   0  -1  -2  -3
      C  -4  -2   0   2   1   0  -1  -2
      G  -5  -3  -1   1   3   2   1   0
      G  -6  -4  -2   0   2   4   3   2
      A  -7  -5  -3  -1   1   3   3   4
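The matrix on these slides can be reproduced with a short fill routine. The sketch below assumes a score of +1 for a match, -1 for a mismatch, and -1 per gap position. Those values are not stated on the slides; they are inferred from the numbers in the example matrix. The function name `nw_matrix` is illustrative.

```python
def nw_matrix(x, y, match=1, mismatch=-1, gap=-1):
    """Fill a Needleman-Wunsch score matrix with x along the columns
    and y along the rows, using a linear gap penalty."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing but gaps.
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        F[i][0] = i * gap
    # Each remaining cell takes the best of its three neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if y[i - 1] == x[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align y[i-1] with x[j-1]
                F[i - 1][j] + gap,    # align y[i-1] with a gap
                F[i][j - 1] + gap,    # align x[j-1] with a gap
            )
    return F


if __name__ == "__main__":
    for row in nw_matrix("AGCGGTA", "GAGCGGA"):
        print(" ".join(f"{v:3d}" for v in row))
```

The bottom-right cell holds the score of the best global alignment, which is 4 for the slide example under the assumed scores.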

  29. Homework: Implement Needleman-Wunsch global alignment with zero gap opening penalties. Try attacking the problem in this order:
       1. Initialize and fill in a dynamic programming matrix by hand (e.g., try reproducing the example from my slides on paper).
       2. Write a function to create the dynamic programming matrix and initialize the first row and column.
       3. Write a function to fill in the rest of the matrix.
       4. Rewrite the initialize and fill steps to store pointers to the best sub-solution for each cell.
       5. Write a backtrace function to read the optimal alignment from the filled-in matrix.
      If that isn’t enough to keep you occupied, read the dynamic programming references from the class website. Try to articulate in your own words the logic for the speed-ups and trade-offs in the Myers and Miller approach.
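Steps 4 and 5 could be approached along the following lines. This is one possible sketch rather than a reference solution: the pointer labels ("diag", "up", "left"), the tie-breaking order, and the match/mismatch/gap scores of +1/-1/-1 are all assumptions, with the scores inferred from the slide example.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_x, aligned_y) for a global alignment
    with a linear gap penalty (zero gap opening penalty)."""
    rows, cols = len(y) + 1, len(x) + 1
    F = [[0] * cols for _ in range(rows)]     # scores
    P = [[None] * cols for _ in range(rows)]  # pointer to best sub-solution
    for j in range(1, cols):
        F[0][j], P[0][j] = j * gap, "left"
    for i in range(1, rows):
        F[i][0], P[i][0] = i * gap, "up"
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if y[i - 1] == x[j - 1] else mismatch
            # Ties are broken in the order diag, up, left (arbitrary choice).
            best, move = F[i - 1][j - 1] + s, "diag"
            if F[i - 1][j] + gap > best:
                best, move = F[i - 1][j] + gap, "up"
            if F[i][j - 1] + gap > best:
                best, move = F[i][j - 1] + gap, "left"
            F[i][j], P[i][j] = best, move
    # Backtrace from the bottom-right corner to the origin.
    ax, ay = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        move = P[i][j]
        if move == "diag":
            ax.append(x[j - 1]); ay.append(y[i - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append("-"); ay.append(y[i - 1]); i -= 1
        else:
            ax.append(x[j - 1]); ay.append("-"); j -= 1
    return F[-1][-1], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    score, ax, ay = needleman_wunsch("AGCGGTA", "GAGCGGA")
    print(score)
    print(ax)
    print(ay)
```

Storing one pointer per cell recovers a single optimal alignment. When several paths tie, which one is reported depends on the tie-breaking order chosen above.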
