CS 466: Introduction to Bioinformatics


  1. CS 466 Introduction to Bioinformatics, Lecture 5. Mohammed El-Kebir, February 4, 2020

  2. Outline
     1. Fitting alignment
     2. Local alignment
     3. Gapped alignment
     4. BLOSUM scoring matrix
     Reading:
     • Jones and Pevzner. Chapters 6.6-6.9
     • Lecture notes

  3. NGS Characterized by Short Reads
     [Figure: a genome (millions to billions of nucleotides) goes through next-generation DNA sequencing and yields tens to hundreds of millions of short reads of 100 nucleotides each, e.g. …CATTCAGTAG…, …AGCCATTAG…, …GGTAGTTAG…, …GGTAAACTAG…, …TATAATTAG…, …CGTACCTAG…]
     • Human reference genome is 3,300,000,000 nucleotides, while a short read is 100 nucleotides. Global sequence alignment will not work!
     • Allow for inexact matches due to sequencing errors and polymorphisms/mutations in the reference genome.
     • Question: How to account for the discrepancy between the lengths of the reference and the short read?

  4. Fitting Alignment
     For short read alignment, we want to align the complete short read $\mathbf{v} \in \Sigma^m$ to a substring of the reference genome $\mathbf{w} \in \Sigma^n$. Note that $m \ll n$.
     Fitting Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and a substring of $\mathbf{w}$ with maximum global alignment score $s^*$ among all global alignments of $\mathbf{v}$ and all substrings of $\mathbf{w}$.

  5. Fitting Alignment – Naive Approach
     • Consider all contiguous non-empty substrings of $\mathbf{w}$, defined by $1 \le i \le j \le n$.
     • How many?

  6. Fitting Alignment – Naive Approach
     • Consider all contiguous non-empty substrings of $\mathbf{w}$, defined by $1 \le i \le j \le n$.
     • How many? Answer: $n + \binom{n}{2}$
     • What are their total lengths?
     • What is the running time?
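
The naive approach on this slide can be tried out directly. The following is a minimal Python sketch, not the course's reference code: the function names `global_score` and `naive_fitting` are made up for illustration, the scores (+1 match, -1 mismatch, -1 indel) are an assumed scoring function, and the strings AGG and TACGGC are borrowed from the DP table slides.

```python
def global_score(v, w, match=1, mismatch=-1, indel=-1):
    """Global alignment score of v and w, in O(|v| * |w|) time."""
    prev = [j * indel for j in range(len(w) + 1)]
    for i in range(1, len(v) + 1):
        curr = [i * indel]
        for j in range(1, len(w) + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            curr.append(max(prev[j] + indel, curr[j - 1] + indel, prev[j - 1] + sub))
        prev = curr
    return prev[-1]


def naive_fitting(v, w):
    """Score v against every non-empty substring w[i:j]; return the best pair."""
    n = len(w)
    spans = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
    scored = [(global_score(v, w[i:j]), w[i:j]) for i, j in spans]
    best_score, best_substring = max(scored, key=lambda pair: pair[0])
    total_length = sum(j - i for i, j in spans)
    return best_score, best_substring, len(spans), total_length


# Returns (best score, best substring, number of substrings, total length).
print(naive_fitting("AGG", "TACGGC"))
```

For $n = 6$ the sketch enumerates $n + \binom{n}{2} = 21$ substrings whose lengths sum to $n(n+1)(n+2)/6 = 56$. Since each global alignment costs time proportional to $m$ times the substring length, this accounting suggests a total of $O(mn^3)$ for the naive approach.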

  7. Fitting Alignment – Dynamic Programming
     DP table: rows indexed by $\mathbf{v}$ = AGG, columns indexed by $\mathbf{w}$ = TACGGC.
     $$s[i,j] = \max \begin{cases} 0, & \text{if } i = 0, \\ s[i-1,j] + \delta(v_i, -), & \text{if } i > 0, \\ s[i,j-1] + \delta(-, w_j), & \text{if } i > 0 \text{ and } j > 0, \\ s[i-1,j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0. \end{cases}$$
     $$s^* = \max\{s[m,0], \ldots, s[m,n]\}$$
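
The recurrence on this slide translates almost line by line into code. Below is a hedged Python sketch: `fitting_score` is a hypothetical name, and the default scores (+1 match, -1 mismatch, -1 indel) are an assumed instance of $\delta$ rather than something this slide fixes.

```python
def fitting_score(v, w, match=1, mismatch=-1, indel=-1):
    """Fill the fitting-alignment table s and return (s_star, s)."""
    m, n = len(v), len(w)
    # Row 0 stays all zeros: skipping a prefix of w costs nothing.
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        # Column 0: v_i can only be aligned to a gap.
        s[i][0] = s[i - 1][0] + indel
        for j in range(1, n + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j] + indel,     # v_i against a gap
                          s[i][j - 1] + indel,     # gap against w_j
                          s[i - 1][j - 1] + sub)   # v_i against w_j
    # The optimum may end in any column of the last row.
    return max(s[m]), s


s_star, table = fitting_score("AGG", "TACGGC")
for row in table:
    print(row)
print("s* =", s_star)
```

The table has $(m+1)(n+1)$ cells and each is filled in constant time, so the sketch runs in $O(mn)$ time.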

  8. Fitting Alignment – Dynamic Programming
     • Start anywhere on first row: $s[0,j] = 0$ for all $j$.
     • End anywhere on last row: $s^* = \max\{s[m,0], \ldots, s[m,n]\}$.

  9. Fitting Alignment – Dynamic Programming
     • Question: Let match score be 1 and mismatch/indel score be -1. What is $s^*$ for $\mathbf{v}$ = AGG and $\mathbf{w}$ = TACGGC?
     • Question: Same scores. What is the optimal global alignment and its score?
     Alignment shown on the slide:
     v: - A - G G -
     w: T A C G G C

  10. Fitting Alignment – Dynamic Programming
      • Online: https://valiec.github.io/AlignmentVisualizer/index.html
      • Example: $\mathbf{v}$ = AGG, $\mathbf{w}$ = TACGGC, match score 1, mismatch/indel score -1. What is $s^*$? What is the optimal global alignment and its score?
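
Both questions on this slide can be explored with one table-filling routine plus a traceback. The sketch below is an illustration under assumptions: the name `align` and its `fitting` flag are invented here, and ties in the traceback are broken in a fixed order (diagonal, then up, then left), which is only one of several valid choices.

```python
def align(v, w, fitting, match=1, mismatch=-1, indel=-1):
    """Return (score, aligned v, aligned w) for fitting or global alignment."""
    m, n = len(v), len(w)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        # Fitting: free start on row 0. Global: pay for leading gaps.
        s[0][j] = 0 if fitting else s[0][j - 1] + indel
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0] + indel
        for j in range(1, n + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j] + indel, s[i][j - 1] + indel,
                          s[i - 1][j - 1] + sub)
    # Fitting ends at the best cell of the last row, global at (m, n).
    i = m
    j = max(range(n + 1), key=lambda c: s[m][c]) if fitting else n
    score, top, bottom = s[i][j], [], []
    # Fitting stops on reaching row 0; global continues to (0, 0).
    while i > 0 or (not fitting and j > 0):
        sub = match if i > 0 and j > 0 and v[i - 1] == w[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            top.append(v[i - 1]); bottom.append(w[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + indel:
            top.append(v[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1]); j -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


print(align("AGG", "TACGGC", fitting=True))
print(align("AGG", "TACGGC", fitting=False))
```

With the slide's scores, the fitting call reports score 2 (A-GG against ACGG) and the global call reports score 0 with the alignment -A-GG- against TACGGC.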

  11. Outline
      1. Fitting alignment
      2. Local alignment
      3. Gapped alignment
      4. BLOSUM scoring matrix
      Reading:
      • Jones and Pevzner. Chapters 6.6-6.9
      • Lecture notes

  12. Local Alignment – Biological Motivation
      Proteins are composed of functional units called domains. Such domains may occur in different proteins, even across species.
      [Figure: domain architectures of SHKA and ABL1, from the Pfam database (http://pfam.sanger.ac.uk/)]
      Local Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find a substring of $\mathbf{v}$ and a substring of $\mathbf{w}$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $\mathbf{v}$ and $\mathbf{w}$.

  13. Global, Fitting and Local Alignment
      • Global Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and $\mathbf{w}$ with maximum score.
      • Fitting Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and a substring of $\mathbf{w}$ with maximum global alignment score $s^*$ among all global alignments of $\mathbf{v}$ and all substrings of $\mathbf{w}$.
      • Local Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find a substring of $\mathbf{v}$ and a substring of $\mathbf{w}$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $\mathbf{v}$ and $\mathbf{w}$.

  14. Local Alignment – Naive Algorithm
      Brute force:
      1. Generate all pairs $(\mathbf{v}', \mathbf{w}')$ of substrings of $\mathbf{v}$ and $\mathbf{w}$.
      2. For each pair $(\mathbf{v}', \mathbf{w}')$, solve the global alignment problem.
      Question: There are $\binom{m}{2}\binom{n}{2}$ pairs of substrings, but they have different lengths. What is the running time?
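
One way to approach the running-time question is sketched below. It assumes that a global alignment of strings of lengths $a$ and $b$ costs time proportional to $ab$, and it counts every non-empty substring, including those of length 1; this is a worked estimate, not the answer given in the lecture.

```latex
% Total work of the brute-force algorithm: sum the cost |v'| * |w'| over all pairs.
\begin{align*}
\sum_{\mathbf{v}'} \sum_{\mathbf{w}'} |\mathbf{v}'| \cdot |\mathbf{w}'|
  &= \Bigl( \sum_{1 \le i \le j \le m} (j - i + 1) \Bigr)
     \Bigl( \sum_{1 \le k \le l \le n} (l - k + 1) \Bigr) \\
  &= \frac{m(m+1)(m+2)}{6} \cdot \frac{n(n+1)(n+2)}{6} \\
  &= \Theta(m^3 n^3).
\end{align*}
```

The double sum factors because the cost of a pair is the product of the two substring lengths, so the total lengths of the substrings of $\mathbf{v}$ and of $\mathbf{w}$ can be summed independently.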

  15. Key Idea
      • Global alignment: Start at $(0,0)$ and end at $(m,n)$
      • Local alignment: Start and end anywhere
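
The key idea can be made concrete by writing one scoring routine whose only variation is where an alignment may start and end. The sketch below is illustrative: the function name `alignment_score`, the `mode` strings, and the default scores (+1 match, -1 mismatch, -1 indel) are all assumptions made for this example.

```python
def alignment_score(v, w, mode="global", match=1, mismatch=-1, indel=-1):
    """Score for mode 'global', 'fitting' or 'local' (names used in this sketch only)."""
    m, n = len(v), len(w)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue  # s[0][0] = 0 in every mode
            options = []
            if i > 0:
                options.append(s[i - 1][j] + indel)
            if j > 0:
                options.append(s[i][j - 1] + indel)
            if i > 0 and j > 0:
                sub = match if v[i - 1] == w[j - 1] else mismatch
                options.append(s[i - 1][j - 1] + sub)
            # Free start: local may restart in any cell, fitting only on row 0.
            if mode == "local" or (mode == "fitting" and i == 0):
                options.append(0)
            s[i][j] = max(options)
    if mode == "global":
        return s[m][n]                    # must end at (m, n)
    if mode == "fitting":
        return max(s[m])                  # end anywhere on the last row
    return max(max(row) for row in s)     # local: end anywhere


for mode in ("global", "fitting", "local"):
    print(mode, alignment_score("AGG", "TACGGC", mode))
```

Because each mode only relaxes the start and end constraints of the previous one, the global score can never exceed the fitting score, which in turn can never exceed the local score.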

  16. Local Alignment Recurrence
      $$s[i,j] = \max \begin{cases} 0, \\ s[i-1,j] + \delta(v_i, -), & \text{if } i > 0, \\ s[i,j-1] + \delta(-, w_j), & \text{if } j > 0, \\ s[i-1,j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0. \end{cases}$$
      $$s^* = \max_{i,j} s[i,j]$$

  17. Local Alignment Recurrence
      • Start anywhere: the value 0 is an option in every cell $(i,j)$, so an alignment can begin at any position.
      • End anywhere: $s^* = \max_{i,j} s[i,j]$.
      • Running time: $O(mn)$
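
A score-only version of this recurrence (commonly known as the Smith-Waterman algorithm) fits in a few lines. The sketch below is a hedged illustration: `local_score` is a hypothetical name, the scoring function is reduced to three numbers, and the border cells are fixed at 0, which agrees with the recurrence whenever the indel score is not positive.

```python
def local_score(v, w, match, mismatch, indel):
    """Local alignment score in O(mn) time, keeping only two rows in memory."""
    prev = [0] * (len(w) + 1)       # row 0 is all zeros
    best = 0
    for i in range(1, len(v) + 1):
        curr = [0]                  # column 0 is zero (assumes indel <= 0)
        for j in range(1, len(w) + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            curr.append(max(0,                     # start a new alignment here
                            prev[j] + indel,       # v_i against a gap
                            curr[j - 1] + indel,   # gap against w_j
                            prev[j - 1] + sub))    # v_i against w_j
        best = max(best, max(curr))  # the alignment may end in any cell
        prev = curr
    return best


print(local_score("AGG", "TACGGC", match=1, mismatch=-1, indel=-1))
```

Keeping two rows is enough when only $s^*$ is needed; recovering the aligned substrings requires the full table or a more elaborate scheme.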

  18. Local Alignment – Dynamic Programming
      • Online: https://valiec.github.io/AlignmentVisualizer/index.html
      • DP table: rows indexed by $\mathbf{v}$ = AGG, columns indexed by $\mathbf{w}$ = TACGGC.
      • Question: Let match score be 2, mismatch score be -2 and indel score be -4. What is $s^*$?
      Alignment shown on the slide:
      v: G G
      w: G G
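
The question on this slide can be checked with a small traceback routine. The Python sketch below is an assumption-laden illustration: `local_align` is an invented name, the default scores are the ones posed in the question, the first best-scoring cell is chosen as the end point, and ties in the traceback are broken diagonal first.

```python
def local_align(v, w, match=2, mismatch=-2, indel=-4):
    """Return (s_star, aligned part of v, aligned part of w)."""
    m, n = len(v), len(w)
    s = [[0] * (n + 1) for _ in range(m + 1)]   # borders stay 0 (assumes indel <= 0)
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(0, s[i - 1][j] + indel, s[i][j - 1] + indel,
                          s[i - 1][j - 1] + sub)
            if s[i][j] > best:
                best, bi, bj = s[i][j], i, j
    # Walk back from the best cell until a cell with score 0 is reached.
    i, j, top, bottom = bi, bj, [], []
    while i > 0 and j > 0 and s[i][j] > 0:
        sub = match if v[i - 1] == w[j - 1] else mismatch
        if s[i][j] == s[i - 1][j - 1] + sub:
            top.append(v[i - 1]); bottom.append(w[j - 1]); i -= 1; j -= 1
        elif s[i][j] == s[i - 1][j] + indel:
            top.append(v[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("AGG", "TACGGC"))
```

For the slide's strings and scores this sketch reports $s^* = 4$, with GG aligned to GG, which agrees with the alignment shown on the slide.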
