Sequence Alignment
Mark Voorhies
4/12/2018

Exercise: Scoring an ungapped alignment

Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.

    # Sum the substitution scores S[a][b] over the aligned positions
    # of two equal-length sequences.
    def score(S, x, y):
        assert len(x) == len(y)
        s = 0
        for (i, j) in zip(x, y):
            s += S[i][j]
        return s

    # Return the overlapping portions of x and y for offset i:
    # i > 0 skips the first i characters of y,
    # i < 0 skips the first -i characters of x.
    def subseqs(x, y, i):
        if i > 0:
            y = y[i:]
        elif i < 0:
            x = x[-i:]
        L = min(len(x), len(y))
        return x[:L], y[:L]

    # Pad x and y with "-" so that the two strings have equal length
    # and display the ungapped alignment at offset i.
    def alignment(x, y, i):
        if i > 0:
            x = "-" * i + x
        elif i < 0:
            y = "-" * (-i) + y
        L = len(y) - len(x)
        if L > 0:
            x += "-" * L
        elif L < 0:
            y += "-" * (-L)
        return x, y

    # Try every offset that leaves at least one overlapping position and
    # return the best offset, its score, and the padded alignment.
    def ungapped(S, x, y):
        best = None
        best_score = None
        for i in range(-len(x) + 1, len(y)):
            (sx, sy) = subseqs(x, y, i)
            s = score(S, sx, sy)
            if (best_score is None) or (s > best_score):
                best_score = s
                best = i
        return best, best_score, alignment(x, y, best)
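As a quick check of the offset search, the sketch below runs it on two short DNA strings. The scoring matrix (+1 match, -1 mismatch), the test sequences, and the helper name best_offset are assumptions chosen for illustration, not part of the slides; score and subseqs are repeated compactly so the block runs on its own.

```python
def score(S, x, y):
    assert len(x) == len(y)
    return sum(S[i][j] for (i, j) in zip(x, y))

def subseqs(x, y, i):
    if i > 0:
        y = y[i:]
    elif i < 0:
        x = x[-i:]
    L = min(len(x), len(y))
    return x[:L], y[:L]

def best_offset(S, x, y):
    """Return (offset, score) of the best scoring ungapped alignment."""
    offsets = range(-len(x) + 1, len(y))
    scored = ((i, score(S, *subseqs(x, y, i))) for i in offsets)
    # max() keeps the first offset that reaches the top score.
    return max(scored, key=lambda t: t[1])

# Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

print(best_offset(S, "GATTACA", "TTAC"))  # (-2, 4)
```

Offset -2 drops the first two characters of GATTACA, so TTAC lines up with the TTAC inside it and all four positions match.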

Exercise: Scoring a gapped alignment

Write a new scoring function with separate penalties for opening a zero length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).

\[ S_{\mathrm{gapped}}(x, y) = S(x, y) + \sum_{i}^{\mathrm{gaps}} \bigl( G + E \cdot \mathrm{len}(i) \bigr) \]

    def gapped_score(seq1, seq2, s, g=0, e=-1):
        gap = None
        score = 0
        for pair in zip(seq1, seq2):
            assert pair != ("-", "-")
            try:
                curgap = pair.index("-")
            except ValueError:
                score += s[pair[0]][pair[1]]
                gap = None
            else:
                if gap != curgap:
                    score += g
                    gap = curgap
                score += e
        return score

Equivalent implementation using explicit comparisons on each column:

    def gapped_score(seq1, seq2, s, g=0, e=-1):
        gap = None
        score = 0
        for (c1, c2) in zip(seq1, seq2):
            if (c1 == "-") and (c2 == "-"):
                raise ValueError
            elif c1 == "-":
                if gap != 1:
                    score += g
                    gap = 1
                score += e
            elif c2 == "-":
                if gap != 2:
                    score += g
                    gap = 2
                score += e
            else:
                score += s[c1][c2]
                gap = None
        return score
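The gapped-score formula can also be evaluated directly, by finding each run of "-" and charging G + E * len(run) for it. The sketch below does that as a cross-check on a small example. The function names gap_penalty and ungapped_part, the +1/-1 scoring matrix, and the two test sequences are assumptions made for illustration.

```python
import re

def gap_penalty(seq1, seq2, G=-11, E=-1):
    """Sum G + E * len(run) over every run of '-' in either sequence."""
    total = 0
    for seq in (seq1, seq2):
        for run in re.finditer(r"-+", seq):
            total += G + E * len(run.group())
    return total

def ungapped_part(seq1, seq2, S):
    """Substitution score over the columns that contain no gap character."""
    return sum(S[a][b] for a, b in zip(seq1, seq2) if "-" not in (a, b))

# Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

x = "GATT--ACA"
y = "G-TTGGACA"
total = ungapped_part(x, y, S) + gap_penalty(x, y)
print(ungapped_part(x, y, S), gap_penalty(x, y), total)  # 6 -25 -19
```

Six columns match (+6), the length-2 gap in x costs -11 - 2 = -13, and the length-1 gap in y costs -11 - 1 = -12, for a total of -19.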

How many ways can we align two sequences?

Binomial formula:

\[ \binom{k}{r} = \frac{k!}{(k - r)!\, r!} \]

\[ \binom{2n}{n} = \frac{(2n)!}{n!\, n!} \]

Stirling's approximation:

\[ x! \approx \sqrt{2\pi}\, x^{x + \frac{1}{2}} e^{-x} \]

\[ \binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}} \]
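To get a feel for how quickly this count grows and how close the Stirling-based estimate is, the following sketch compares the exact central binomial coefficient with 2^(2n) / sqrt(pi n). The function names and the chosen values of n are assumptions for illustration; math.comb requires Python 3.8 or later.

```python
import math

def exact_central_binomial(n):
    """Exact value of (2n choose n)."""
    return math.comb(2 * n, n)

def stirling_estimate(n):
    """Estimate 2^(2n) / sqrt(pi * n) obtained from Stirling's approximation."""
    return 4.0 ** n / math.sqrt(math.pi * n)

for n in (5, 10, 50, 100):
    exact = exact_central_binomial(n)
    approx = stirling_estimate(n)
    print(n, exact, f"{approx:.3e}", f"ratio={approx / exact:.4f}")
```

The ratio approaches 1 as n grows, while the count itself grows roughly as 4^n, which is why enumerating every alignment is not practical.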

Dynamic Programming

Needleman-Wunsch

Smith-Waterman

The implementation of local alignment is the same as for global alignment, with a few changes to the rules:

- Initialize edges to 0 (no penalty for starting in the middle of a sequence).
- The maximum score is never less than 0, and no pointer is recorded unless the score is greater than 0 (note that this implies negative scores for gaps and bad matches).
- The trace-back starts from the highest score in the matrix and ends at a score of 0 (local, rather than global, alignment).

Because the naive implementation is essentially the same, the time and space requirements are also the same.
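The three rule changes can be sketched as follows. This is a minimal illustration, not the slides' own implementation: the scoring values (+1 match, -1 mismatch, -1 per gap position, no separate opening penalty) and the function name smith_waterman are assumptions. The test sequences AGCGGTA and GAGCGGA are the ones that label the slide's example matrix.

```python
def smith_waterman(x, y, match=1, mismatch=-1, gap=-1):
    """Return (best score, aligned x, aligned y) for one best local alignment."""
    rows, cols = len(x) + 1, len(y) + 1
    H = [[0] * cols for _ in range(rows)]       # edges (and all cells) start at 0
    ptr = [[None] * cols for _ in range(rows)]  # no pointer where the score is 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (H[i - 1][j - 1] + sub, "diag"),
                (H[i - 1][j] + gap, "up"),
                (H[i][j - 1] + gap, "left"),
            ]
            s, move = max(candidates, key=lambda c: c[0])
            if s > 0:  # scores never drop below 0
                H[i][j], ptr[i][j] = s, move
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest score until a cell with score 0 is reached.
    i, j = best_pos
    ax, ay = [], []
    while H[i][j] > 0:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))

print(smith_waterman("GAGCGGA", "AGCGGTA"))  # (5, 'AGCGG', 'AGCGG')
```

Both sequences contain AGCGG, so under these assumed scores the best local alignment is that shared run with a score of 5.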

[Figure: example Smith-Waterman score matrix for the sequences AGCGGTA (columns) and GAGCGGA (rows); the highest score in the matrix is 5.]

Final Homework

Implement Needleman-Wunsch global alignment with zero gap opening penalties. Try attacking the problem in this order:

1. Initialize and fill in a dynamic programming matrix by hand (e.g., try reproducing the example from my slides on paper).
2. Write a function to create the dynamic programming matrix and initialize the first row and column.
3. Write a function to fill in the rest of the matrix.
4. Rewrite the initialize and fill steps to store pointers to the best sub-solution for each cell.
5. Write a backtrace function to read the optimal alignment from the filled in matrix.

If that isn't enough to keep you occupied, try implementing Smith-Waterman local alignment and/or non-zero gap opening penalties.
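One possible shape for steps 2 to 5 is sketched below, folded into a single function for brevity. It is only a starting point under stated assumptions: the function name needleman_wunsch, the dict-of-dicts scoring matrix S with +1/-1 entries, the per-position gap cost e = -1, and the test sequences are all hypothetical choices, and ties between equally good moves are broken arbitrarily (diagonal first).

```python
def needleman_wunsch(x, y, S, e=-1):
    """Global alignment with zero gap-opening penalty and per-position gap cost e."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    ptr = [[None] * cols for _ in range(rows)]
    # Step 2: the first row and column correspond to leading runs of gaps.
    for i in range(1, rows):
        F[i][0], ptr[i][0] = i * e, "up"
    for j in range(1, cols):
        F[0][j], ptr[0][j] = j * e, "left"
    # Steps 3 and 4: fill each cell and remember which neighbor gave the best score.
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + S[x[i - 1]][y[j - 1]], "diag"),
                (F[i - 1][j] + e, "up"),
                (F[i][j - 1] + e, "left"),
                key=lambda c: c[0],
            )
    # Step 5: follow the pointers from the bottom-right cell back to the origin.
    i, j = len(x), len(y)
    ax, ay = [], []
    while ptr[i][j] is not None:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[-1][-1], "".join(reversed(ax)), "".join(reversed(ay))

# Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

result = needleman_wunsch("GATTACA", "GATACA", S)
print(result)
```

With these assumed scores the best global score for GATTACA against GATACA is 5: six matching columns and one gap position.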
