CS 466: Introduction to Bioinformatics, Lecture 5. Mohammed El-Kebir. February 4, 2020
Outline
1. Fitting alignment
2. Local alignment
3. Gapped alignment
4. BLOSUM scoring matrix
Reading:
• Jones and Pevzner. Chapters 6.6-6.9
• Lecture notes
NGS Characterized by Short Reads
Next-generation DNA sequencing turns a genome of millions to billions of nucleotides into tens to hundreds of millions of short reads, each about 100 nucleotides long, for example: … CATTCAGTAG …, … AGCCATTAG …, … GGTAGTTAG …, … GGTAAACTAG …, … TATAATTAG …, … CGTACCTAG …
The human reference genome is 3,300,000,000 nucleotides, while a short read is 100 nucleotides. Global sequence alignment will not work!
Allow for inexact matches due to:
• Sequencing errors
• Polymorphisms/mutations in reference genome
Question: How to account for the discrepancy between the lengths of the reference and a short read?
Fitting Alignment
For short read alignment, we want to align the complete short read $\mathbf{v} \in \Sigma^m$ to a substring of the reference genome $\mathbf{w} \in \Sigma^n$. Note that $m \ll n$.
Fitting Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and a substring of $\mathbf{w}$ with maximum global alignment score $s^*$ among all global alignments of $\mathbf{v}$ and all substrings of $\mathbf{w}$.
Fitting Alignment – Naive Approach
• Consider all contiguous non-empty substrings of $\mathbf{w}$, defined by $1 \le i \le j \le n$
• How many? Answer: $n + \binom{n}{2}$
• What are their total lengths?
• What is the running time?
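The counting questions on this slide can be checked empirically. The sketch below enumerates every pair $1 \le i \le j \le n$, compares the count with $n + \binom{n}{2}$, and sums the substring lengths. The closed form $n(n+1)(n+2)/6$ for the total length is worked out here and is not taken from the slides, and the function name `substring_stats` is purely illustrative.

```python
from math import comb


def substring_stats(n):
    """Count the substrings w[i..j] with 1 <= i <= j <= n and sum their lengths."""
    count = 0
    total_length = 0
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            count += 1
            total_length += j - i + 1
    return count, total_length


for n in (1, 6, 100):
    count, total_length = substring_stats(n)
    # Number of non-empty substrings: n of length 1 plus C(n, 2) longer ones.
    assert count == n + comb(n, 2)
    # Total length (derived here, an assumption relative to the slides).
    assert total_length == n * (n + 1) * (n + 2) // 6
```

Assuming each global alignment of $\mathbf{v}$ against a substring of length $\ell$ costs $O(m\ell)$ time, a cubic total length suggests that the naive approach takes $O(mn^3)$ time overall.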
Fitting Alignment – Dynamic Programming
• Online: https://valiec.github.io/AlignmentVisualizer/index.html

$$
s[i,j] = \max
\begin{cases}
0, & \text{if } i = 0,\\
s[i-1,j] + \delta(v_i, -), & \text{if } i > 0,\\
s[i,j-1] + \delta(-, w_j), & \text{if } i > 0 \text{ and } j > 0,\\
s[i-1,j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0.
\end{cases}
$$

$$
s^* = \max \{ s[m,0], \ldots, s[m,n] \}
$$

• Start anywhere on first row
• End anywhere on last row
Example table: $\mathbf{v}$ = AGG labels the rows, $\mathbf{w}$ = TACGGC labels the columns.
Question: Let match score be 1, mismatch/indel score be -1. What is $s^*$?
Question: Same scores. What is the optimal global alignment and its score?
Alignment shown on the slide:
v: - A - G G -
w: T A C G G C
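A minimal sketch of the fitting alignment recurrence may help when working through the question above. It assumes a simple scoring function with one match, one mismatch and one indel score (the values asked about on the slide are used as defaults); the function name and parameter names are illustrative, not from the lecture.

```python
def fitting_alignment_score(v, w, match=1, mismatch=-1, indel=-1):
    """Best score of aligning all of v against some substring of w."""
    m, n = len(v), len(w)
    # s[0][j] = 0 for every j: the alignment may start anywhere on the first row.
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        # First column: only the vertical move delta(v_i, -) is available.
        s[i][0] = s[i - 1][0] + indel
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j] + indel,     # delta(v_i, -)
                s[i][j - 1] + indel,     # delta(-, w_j)
                s[i - 1][j - 1] + diag,  # delta(v_i, w_j)
            )
    # The alignment may end anywhere on the last row.
    return max(s[m])


print(fitting_alignment_score("AGG", "TACGGC"))  # 2
```

With these scores the maximum on the last row is reached by aligning AGG to the substring ACGG (three matches and one indel).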
Local Alignment – Biological Motivation
Proteins are composed of functional units called domains. Such domains may occur in different proteins even across species.
[Figure: SHKA and ABL1. From Pfam database (http://pfam.sanger.ac.uk/)]
Local Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find a substring of $\mathbf{v}$ and a substring of $\mathbf{w}$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $\mathbf{v}$ and $\mathbf{w}$.
Global, Fitting and Local Alignment
Global Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and $\mathbf{w}$ with maximum score.
Fitting Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and a substring of $\mathbf{w}$ with maximum global alignment score $s^*$ among all global alignments of $\mathbf{v}$ and all substrings of $\mathbf{w}$.
Local Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find a substring of $\mathbf{v}$ and a substring of $\mathbf{w}$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $\mathbf{v}$ and $\mathbf{w}$.
Local Alignment – Naive Algorithm
Brute force:
1. Generate all pairs $(\mathbf{v}', \mathbf{w}')$ of substrings of $\mathbf{v}$ and $\mathbf{w}$
2. For each pair $(\mathbf{v}', \mathbf{w}')$, solve the global alignment problem.
Question: There are $\binom{m}{2}\binom{n}{2}$ pairs of substrings. But they have different lengths. What is the running time?
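The brute-force procedure above can be sketched directly, which makes its cost easy to see. The code below is an illustration under stated assumptions: the helper names are hypothetical, the scoring function uses a single match, mismatch and indel score, and the pair of empty substrings is taken to score 0.

```python
def global_score(a, b, match, mismatch, indel):
    """Global alignment score of a and b, keeping only two rows of the table."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel]
        for j in range(1, len(b) + 1):
            diag = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j] + indel, cur[j - 1] + indel, prev[j - 1] + diag))
        prev = cur
    return prev[-1]


def brute_force_local_score(v, w, match=2, mismatch=-2, indel=-4):
    """Try every pair of non-empty substrings (v', w') and keep the best score."""
    best = 0  # assumption: aligning two empty substrings scores 0
    for i in range(len(v)):
        for j in range(i + 1, len(v) + 1):
            for k in range(len(w)):
                for l in range(k + 1, len(w) + 1):
                    score = global_score(v[i:j], w[k:l], match, mismatch, indel)
                    best = max(best, score)
    return best


print(brute_force_local_score("AGG", "TACGGC"))  # 4
```

Each of the four nested loops contributes a factor of $m$ or $n$, and each global alignment costs time proportional to the product of the two substring lengths, so this sketch runs in roughly $O(m^3 n^3)$ time. That bound is derived here rather than stated on the slide.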
Key Idea
• Global alignment: Start at $(0,0)$ and end at $(m, n)$
• Local alignment: Start and end anywhere
Local Alignment Recurrence

$$
s[i,j] = \max
\begin{cases}
0, & \\
s[i-1,j] + \delta(v_i, -), & \text{if } i > 0,\\
s[i,j-1] + \delta(-, w_j), & \text{if } j > 0,\\
s[i-1,j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0.
\end{cases}
$$

$$
s^* = \max_{i,j} s[i,j]
$$

• Start anywhere: the 0 option is available at every cell $(i, j)$, not only at $(0, 0)$. Restricting it to $i = 0$ and $j = 0$ would give back the global alignment recurrence.
• End anywhere: $s^*$ is the maximum over all cells.
Running time: $O(mn)$
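A short sketch of the local alignment recurrence follows, assuming the 0 option is available at every cell (the standard Smith-Waterman form), which is what lets an alignment start anywhere. The function name and the single match/mismatch/indel scoring scheme are assumptions made for illustration, and the indel score is assumed to be non-positive so that the first row and column stay at 0.

```python
def local_alignment_score(v, w, match=2, mismatch=-2, indel=-4):
    """Maximum score over all alignments of a substring of v with a substring of w."""
    m, n = len(v), len(w)
    # Row 0 and column 0 stay 0 (assumes indel <= 0).
    s = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(
                0,                       # start a new alignment here
                s[i - 1][j] + indel,     # delta(v_i, -)
                s[i][j - 1] + indel,     # delta(-, w_j)
                s[i - 1][j - 1] + diag,  # delta(v_i, w_j)
            )
            best = max(best, s[i][j])    # end anywhere
    return best


print(local_alignment_score("AGG", "TACGGC"))  # 4
```

The two nested loops do constant work per cell, which matches the $O(mn)$ running time.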
Local Alignment – Dynamic Programming
• Online: https://valiec.github.io/AlignmentVisualizer/index.html
Example table: $\mathbf{v}$ = AGG labels the rows, $\mathbf{w}$ = TACGGC labels the columns.
Question: Let match score be 2, mismatch score be -2 and indel be -4. What is $s^*$?
Alignment shown on the slide:
v: G G
w: G G
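To recover the aligned substrings and not only the score, one can trace back from the best cell until a cell with score 0 is reached. The sketch below is one possible way to do this; the function name, the tie-breaking order (diagonal first, first maximum cell wins) and the non-positive indel score are assumptions, not part of the lecture.

```python
def local_alignment(v, w, match=2, mismatch=-2, indel=-4):
    """Return (score, aligned part of v, aligned part of w) for one optimal local alignment."""
    m, n = len(v), len(w)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    best, end = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(0, s[i - 1][j] + indel, s[i][j - 1] + indel,
                          s[i - 1][j - 1] + diag)
            if s[i][j] > best:
                best, end = s[i][j], (i, j)
    # Trace back from the best cell; stop at the first cell with score 0.
    i, j = end
    top, bottom = [], []
    while i > 0 and j > 0 and s[i][j] > 0:
        diag = match if v[i - 1] == w[j - 1] else mismatch
        if s[i][j] == s[i - 1][j - 1] + diag:
            top.append(v[i - 1]); bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + indel:
            top.append(v[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_alignment("AGG", "TACGGC"))  # (4, 'GG', 'GG')
```

For the example on this slide the traceback stops after two diagonal steps, giving GG aligned to GG.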