Sequence alignment Correspondence between bases of two DNA - PowerPoint PPT Presentation

Sequence alignment Correspondence between bases of two DNA sequences, or between amino acids of two protein sequences Alignment":""2"x"k"matrix"("k" ≥ m,"n") n"="10 V""="ACCTGGTAAA matches 8 mismatches 1 m"="10 W"="ACTGCGTATA deletions 1 insertions 1 V A C C T G G T A A A W" A C T G C G T A T A

“Goodness” of alignments Given two sequences, there are many possible alignments ATTTTCCC distance=2 ATTTACGC ATTT-TCCC distance=3 ATTTA-CGC ATTTTCCC———————— distance=16 ————————ATTTACGC Edit distance : the total number of substitutions, insertions and deletions needed to transform one sequence to another

Manhattan tourist problem Imagine seeking a Source * * path (from source to sink) to travel * (only eastward and * * southward) with the * most number of * * attractions (*) in * the Manhattan grid * * * Sink

Recursive algorithm -> Dynamic programming Function MT ( n,m ) 1. x = MT(n-1,m)+ weight of the edge from (n-1,m) to (n,m) 2. y = MT(n,m-1)+ weight of the edge from (n,m-1) to (n,m) 3. return max{x,y} MT (x, y) returns the “most weighted” path from point (x, y) to the “sink”.

How to find the optimal path 0 1 2 3 source 1 2 5 0 1 3 8 • Start from Sink. i 5 3 10 $5 • Find which of the two $5 1 $5 1 5 4 13 8 edges gave the “max”. Take it. 3 5 $3 2 3 3 $5 2 • Repeat. 8 9 12 15 0 0 $5 1 0 0 0 3 8 9 9 16 S 3,3$ =/16

Recipe 1. Identify subproblems 2. Write down recursions 3. Make it dynamic-programming!

The edit distance problem A F G C D E Match A G Insertion_X C Insertion_Y D E F A-GCDEF AFGCDE-

Minimum Edit Distance For sequence X and Y

Optimal alignment match match

Complexity

Is the edit distance the best way? For sequence X and Y

Amino acids can share similar properties

Weighted edit distance • To generalize scoring for DNA/RNA, consider a 4x4 scoring matrix S . • In the case of an amino acid sequence alignment, the scoring matrix would be a 20x20 size. • The addition of d is to include the score for comparison of a gap character “-”. • Two questions: • (a) What should S be? • (b) How do we find optimal scoring alignment?

Weighted edit distance • To generalize scoring for DNA/RNA, consider a (4+1) x(4+1) scoring matrix S . • In the case of an amino acid sequence alignment, the scoring matrix would be a (20+1)x(20+1) size. • The addition of d is to include the score for comparison of a gap character “-”. • Two questions: • (a) What should S be? • (b) How do we find optimal scoring alignment? Traditionally, people tend to maximize the alignment score with a negative gap penalty score

BLOcks SUbstitution Matrix (BLOSUM) amino acids

BLOcks SUbstitution Matrix (BLOSUM)

Recursion for generalized edit distance Complexity?

Gap score/penalty

Affine gap penalty Question: How to develop an efficient dynamic programming algorithm for affine gap penalties?

Categories of pairwise alignments

Semi-global alignment

Local alignment: naive algorithm • Long run time O(n 4 ): - In the grid of size n x n there are n 2 vertices (i,j) that may serve as a source. - For each such vertex computing alignments from (i,j) to (i’,j’) takes O(n 2 ) time. • This can be remedied by allowing every point to be the starting point

Local alignment: Smith-Waterman algorithm Idea: start over from any entry!

Local alignment

Sequence alignment Correspondence between bases of two DNA - PowerPoint PPT Presentation

Sequence alignment Correspondence between bases of two DNA sequences, or between amino acids of two protein sequences Alignment":""2"x"k"matrix"("k" m,"n") n"="10

Sequence Alignment Gerhard Jger ESSLLI 2016 Gerhard Jger Sequence Alignment ESSLLI 2016 1

This week CSE 527 Sequence alignment Computational Biology More sequence alignment

Protein Sequence Analysis Protein Sequence Analysis Protein sequence motifs Protein sequence

Sequence Alignment (chapter 6) p The biological problem p Global alignment p Local alignment p

Sequence Alignment Mark Voorhies 5/20/2015 Mark Voorhies Sequence Alignment Exercise: Scoring

Sequence Alignment Mark Voorhies 5/29/2013 Mark Voorhies Sequence Alignment Exercise: Scoring

Sequence Alignment Mark Voorhies 4/12/2018 Mark Voorhies Sequence Alignment Exercise: Scoring

Sequence Alignment Mark Voorhies 4/24/2012 Mark Voorhies Sequence Alignment Exercise:

CSE 421 Algorithms Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

CSE 427 Comp Bio Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

CSE 427 Computational Biology Winter 2008 Sequence Alignment; DNA Replication 1 Sequence

CSE 427 Comp Bio Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

CSE421 Algorithms Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

Sequence Alignment (chapter 6) The biological problem l Global alignment l Local alignment l

SEQUENCE ANALYSIS The term " sequence analysis " in biology implies subjecting a DNA or

Sequence to Sequence models: Attention Models 1 Sequence-to-sequence modelling Problem:

Truly Subcubic Algorithms for Language Edit Distance and RNA Folding via Fast Bounded-Difference

Pairwise RNA Edit Distance In the following: Sequences S 1 and S 2 associated

Approximation of RNA Multiple Structural Alignment Marcin Kubica 1 , Romeo Rizzi 2 , Stphane

On the Combinatorics of RNA Secondary Structures in a Polymer-Zeta Model Markus E. Nebel based on

Pattern matching and common structure inference in RNA (secondary) structures St ephane

Small RNAs and how to analyze them using sequencing Jakub

GENOME 541 Syllabus ! protein and DNA sequence analysis to Modeling and Searching

vs. 14.05.2012 | Eva Christina Ackermann and Barbara Drossel | Technische Universitt Darmstadt |

Sequence alignment Correspondence between bases of two DNA - PowerPoint PPT Presentation

Sequence alignment Correspondence between bases of two DNA sequences, or between amino acids of two protein sequences Alignment":""2"x"k"matrix"("k" m,"n") n"="10

Sequence Alignment Gerhard Jger ESSLLI 2016 Gerhard Jger Sequence Alignment ESSLLI 2016 1

This week CSE 527 Sequence alignment Computational Biology More sequence alignment

Protein Sequence Analysis Protein Sequence Analysis Protein sequence motifs Protein sequence

Sequence Alignment (chapter 6) p The biological problem p Global alignment p Local alignment p

Sequence Alignment Mark Voorhies 5/20/2015 Mark Voorhies Sequence Alignment Exercise: Scoring

Sequence Alignment Mark Voorhies 5/29/2013 Mark Voorhies Sequence Alignment Exercise: Scoring

Sequence Alignment Mark Voorhies 4/12/2018 Mark Voorhies Sequence Alignment Exercise: Scoring

Sequence Alignment Mark Voorhies 4/24/2012 Mark Voorhies Sequence Alignment Exercise:

CSE 421 Algorithms Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

CSE 427 Comp Bio Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

CSE 427 Computational Biology Winter 2008 Sequence Alignment; DNA Replication 1 Sequence

CSE 427 Comp Bio Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

CSE421 Algorithms Sequence Alignment 1 Sequence Alignment What Why A Dynamic Programming

Sequence Alignment (chapter 6) The biological problem l Global alignment l Local alignment l

SEQUENCE ANALYSIS The term &quot; sequence analysis &quot; in biology implies subjecting a DNA or

Sequence to Sequence models: Attention Models 1 Sequence-to-sequence modelling Problem:

Truly Subcubic Algorithms for Language Edit Distance and RNA Folding via Fast Bounded-Difference

Pairwise RNA Edit Distance In the following: Sequences S 1 and S 2 associated

Approximation of RNA Multiple Structural Alignment Marcin Kubica 1 , Romeo Rizzi 2 , Stphane

On the Combinatorics of RNA Secondary Structures in a Polymer-Zeta Model Markus E. Nebel based on

Pattern matching and common structure inference in RNA (secondary) structures St ephane

Small RNAs and how to analyze them using sequencing Jakub

GENOME 541 Syllabus ! protein and DNA sequence analysis to Modeling and Searching

vs. 14.05.2012 | Eva Christina Ackermann and Barbara Drossel | Technische Universitt Darmstadt |

SEQUENCE ANALYSIS The term " sequence analysis " in biology implies subjecting a DNA or