
Longest Common Subsequence



  1. Longest Common Subsequence (CSE 421 Algorithms, Richard Anderson, Lecture 19)
     • C = c_1…c_g is a subsequence of A = a_1…a_m if C can be obtained by removing elements from A (but retaining order)
     • LCS(A, B): A maximum length sequence that is a subsequence of both A and B

     Instructor Example
     ocurranec / occurrence
     attacggct / tacgacca

     Determine the LCS of the following strings (Student Submission)
     BARTHOLEMEWSIMPSON
     KRUSTYTHECLOWN

     String Alignment Problem
     • Align sequences with gaps
       CAT TGA AT
       CAGAT AGGA
     • Charge δ_x if character x is unmatched
     • Charge γ_xy if character x is matched to character y

     LCS Optimization
     • A = a_1 a_2 … a_m
     • B = b_1 b_2 … b_n
     • Opt[j, k] is the length of LCS(a_1 a_2 … a_j, b_1 b_2 … b_k)

     Optimization recurrence
     If a_j = b_k, Opt[j, k] = 1 + Opt[j-1, k-1]
     If a_j != b_k, Opt[j, k] = max(Opt[j-1, k], Opt[j, k-1])
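The optimization recurrence above can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the function name `lcs_length` is an assumption, and the table is 0-indexed in the strings while Opt keeps the slide's 1-based meaning.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of LCS(a, b) using the Opt[j, k] recurrence from the slide."""
    m, n = len(a), len(b)
    # opt[j][k] = length of the LCS of a[:j] and b[:k]; row 0 and column 0 stay 0.
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, m + 1):
        for k in range(1, n + 1):
            if a[j - 1] == b[k - 1]:
                # a_j = b_k: extend the LCS of the two shorter prefixes.
                opt[j][k] = 1 + opt[j - 1][k - 1]
            else:
                # a_j != b_k: drop the last character of one string.
                opt[j][k] = max(opt[j - 1][k], opt[j][k - 1])
    return opt[m][n]


if __name__ == "__main__":
    print(lcs_length("ocurranec", "occurrence"))  # 7
```

For the first instructor pair, one common subsequence of length 7 is "ocurrnc".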

  2. Give the Optimization Recurrence for the String Alignment Problem (Student Submission)
     • Charge δ_x if character x is unmatched
     • Charge γ_xy if character x is matched to character y

     Dynamic Programming Computation
     (Figure: grid labeled b_1…b_n and a_1…a_m)

     Write the code to compute Opt[j,k] (Student Submission)

     Storing the path information
     A[1..m], B[1..n]
     for i := 1 to m  Opt[i, 0] := 0;
     for j := 1 to n  Opt[0, j] := 0;
     Opt[0, 0] := 0;
     for i := 1 to m
         for j := 1 to n
             if A[i] = B[j] { Opt[i, j] := 1 + Opt[i-1, j-1]; Best[i, j] := Diag; }
             else if Opt[i-1, j] >= Opt[i, j-1]
                 { Opt[i, j] := Opt[i-1, j]; Best[i, j] := Left; }
             else { Opt[i, j] := Opt[i, j-1]; Best[i, j] := Down; }

     How good is this algorithm? (Student Submission)
     • Is it feasible to compute the LCS of two strings of length 100,000 on a standard desktop PC? Why or why not.

     Observations about the Algorithm
     • The computation can be done in O(m+n) space if we only need one column of the Opt values or Best values
     • The algorithm can be run from either end of the strings
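The path-storing pseudocode can be turned into a runnable sketch along these lines. The function name `lcs_with_path` and the traceback loop are assumptions added for illustration; the slide only shows how Opt and Best are filled, and the labels "diag", "left" and "down" simply mirror its Diag, Left and Down.

```python
def lcs_with_path(a: str, b: str) -> str:
    """Fill Opt and Best as in the slide, then walk Best back to recover one LCS."""
    m, n = len(a), len(b)
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    best = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                opt[i][j] = 1 + opt[i - 1][j - 1]
                best[i][j] = "diag"
            elif opt[i - 1][j] >= opt[i][j - 1]:
                opt[i][j] = opt[i - 1][j]
                best[i][j] = "left"   # value came from Opt[i-1, j]
            else:
                opt[i][j] = opt[i][j - 1]
                best[i][j] = "down"   # value came from Opt[i, j-1]

    # Traceback (not on the slide): follow Best from (m, n) to the border.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if best[i][j] == "diag":
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif best[i][j] == "left":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs_with_path("attacggct", "tacgacca"))
```

Both tables hold (m+1)(n+1) entries, so for two strings of length 100,000 this version would need on the order of 10^10 entries each for Opt and Best.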

  3. Divide and Conquer Algorithm
     • Where does the best path cross the middle column?
     • For a fixed i, and for each j, compute the LCS that has a_i matched with b_j

     Constrained LCS
     • LCS_{i,j}(A,B): The LCS such that
       – a_1,…,a_i paired with elements of b_1,…,b_j
       – a_{i+1},…,a_m paired with elements of b_{j+1},…,b_n
     • LCS_{4,3}(abbacbb, cbbaa)

     A = RRSSRTTRTS
     B = RTSRRSTST
     Compute LCS_{5,1}(A,B), LCS_{5,2}(A,B), …, LCS_{5,9}(A,B) (Student Submission)

     Instructor Example (same A and B)
     j   left   right
     0   0      3
     1   1      3
     2   1      3
     3   2      3
     4   3      3
     5   3      2
     6   3      2
     7   3      1
     8   4      1
     9   4      0

     Computing the middle column
     • From the left, compute LCS(a_1…a_{m/2}, b_1…b_j)
     • From the right, compute LCS(a_{m/2+1}…a_m, b_{j+1}…b_n)
     • Add values for corresponding j's
     • Note: this is space efficient

     Divide and Conquer
     • A = a_1,…,a_m   B = b_1,…,b_n
     • Find j such that
       – LCS(a_1…a_{m/2}, b_1…b_j) and
       – LCS(a_{m/2+1}…a_m, b_{j+1}…b_n)
       yield optimal solution
     • Recurse
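The middle-column computation and the recursion can be sketched as below. The names `last_row`, `best_split` and `lcs_dc` are assumptions introduced for this illustration, and the "from the right" pass is implemented by reversing both strings, which is one possible reading of "the algorithm can be run from either end of the strings".

```python
def last_row(a: str, b: str):
    """row[j] = LCS length of a and b[:j], keeping only two rows at a time."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def best_split(a: str, b: str):
    """Return (j, left, right) where j maximises left[j] + right[j]."""
    mid, n = len(a) // 2, len(b)
    left = last_row(a[:mid], b)                    # LCS(a_1..a_mid, b_1..b_j)
    rev = last_row(a[mid:][::-1], b[::-1])         # same DP run from the right
    right = [rev[n - j] for j in range(n + 1)]     # LCS(a_mid+1..a_m, b_j+1..b_n)
    j = max(range(n + 1), key=lambda k: left[k] + right[k])
    return j, left, right


def lcs_dc(a: str, b: str) -> str:
    """Recover one LCS by splitting a in half and recursing on both sides."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    j, _, _ = best_split(a, b)
    mid = len(a) // 2
    return lcs_dc(a[:mid], b[:j]) + lcs_dc(a[mid:], b[j:])


if __name__ == "__main__":
    j, left, right = best_split("RRSSRTTRTS", "RTSRRSTST")
    print(j, left[j] + right[j])  # 4 6
    print(lcs_dc("RRSSRTTRTS", "RTSRRSTST"))
```

On the slide's strings the sketch picks the split j = 4, where the left and right values are 3 and 3, for a total of 6.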

  4. Algorithm Analysis
     • T(m,n) = T(m/2, j) + T(m/2, n-j) + cnm

     Prove by induction that T(m,n) <= 2cmn (Instructor Example)
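One way the inductive step could go is sketched below. It assumes the bound already holds for the two smaller subproblems and that the base case (for example m = 1) satisfies the bound for the chosen constant c; neither assumption is spelled out on the slide.

```latex
\begin{align*}
T(m,n) &= T\!\left(\tfrac{m}{2},\, j\right) + T\!\left(\tfrac{m}{2},\, n-j\right) + cmn \\
       &\le 2c\,\tfrac{m}{2}\,j + 2c\,\tfrac{m}{2}\,(n-j) + cmn
          && \text{(induction hypothesis)} \\
       &= cmj + cm(n-j) + cmn \\
       &= cmn + cmn \\
       &= 2cmn .
\end{align*}
```

The choice of j drops out because the two recursive terms always sum to cmn, whatever split is used.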
