CS 466: Introduction to Bioinformatics
Lecture 2, Part 1
Mohammed El-Kebir
January 28, 2020

Outline
1. Change problem
2. Review of running time analysis
3. Edit distance
4. Review of elementary graph theory
5. Manhattan Tourist problem
6. Longest/shortest paths in DAGs

Reading:
• Jones and Pevzner, Chapters 2.7-2.9 and 6.1-6.4
• Lecture notes 2

The Change Problem

Change Problem: Given amount $M \in \mathbb{N} \setminus \{0\}$ and coins $c = (c_1, \ldots, c_n) \in \mathbb{N}^n$ s.t. $c_n = 1$ and $c_i \ge c_{i+1}$ for all $i \in [n-1] = \{1, \ldots, n-1\}$, find $d = (d_1, \ldots, d_n) \in \mathbb{N}^n$ s.t. (i) $M = \sum_{i=1}^{n} c_i d_i$ and (ii) $\sum_{i=1}^{n} d_i$ is minimum.

• Suppose we have $n = 3$ coins: $c = (7, 3, 1)$ cents.
• What is the minimum number of coins needed to make change for $M = 9$ cents?
• Answer: $(d_1, \ldots, d_n) = (1, 0, 2)$, thus $1 + 0 + 2 = 3$ coins.

The Change Problem – Four Algorithms

GreedyChange($M, c_1, \ldots, c_n$)
1. for $i \leftarrow 1$ to $n$
2.     $d_i \leftarrow \lfloor M / c_i \rfloor$
3.     $M \leftarrow M - c_i d_i$

ExhaustiveChange($M, c_1, \ldots, c_n$)
1. for $(d_1, \ldots, d_n) \in [\lfloor M / c_1 \rfloor] \times \ldots \times [\lfloor M / c_n \rfloor]$
2.     if $\sum_{i=1}^{n} c_i d_i = M$
3.         return $(d_1, \ldots, d_n)$

RecursiveChange($M, c_1, \ldots, c_n$)
1. if $M = 0$
2.     return 0
3. bestNumCoins $\leftarrow \infty$
4. for $i \leftarrow 1$ to $n$
5.     if $M \ge c_i$
6.         numCoins $\leftarrow$ RecursiveChange($M - c_i, c_1, \ldots, c_n$)
7.         if numCoins + 1 < bestNumCoins
8.             bestNumCoins $\leftarrow$ numCoins + 1
9. return bestNumCoins

DPChange($M, c_1, \ldots, c_n$)
1. for $m \leftarrow 1$ to $M$
2.     minNumCoins[$m$] $\leftarrow \infty$
3. for $i \leftarrow 1$ to $n$
4.     minNumCoins[$c_i$] $\leftarrow 1$
5. for $m \leftarrow 1$ to $M$
6.     for $i \leftarrow 1$ to $n$
7.         if $m > c_i$
8.             minNumCoins[$m$] $\leftarrow$ min(1 + minNumCoins[$m - c_i$], minNumCoins[$m$])
9. return minNumCoins[$M$]

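The four algorithms can be sketched in Python as follows. This is a minimal sketch rather than a transcription of the slide: the function names are illustrative, `exhaustive_change` is assumed to keep the feasible count vector with the fewest coins and to allow a count of zero, and `dp_change` uses the base case `min_num_coins[0] = 0` with the test `m >= c` in place of initialising `minNumCoins[c_i]` to 1. The second instance, coins (4, 3, 1) with M = 6, is a made-up example on which the greedy choice is not optimal.

```python
import math
from itertools import product


def greedy_change(M, coins):
    """GreedyChange: take as many of each coin as possible, largest first."""
    d = []
    for c in coins:
        d.append(M // c)   # d_i <- floor(M / c_i)
        M -= c * d[-1]     # M <- M - c_i * d_i
    return d


def exhaustive_change(M, coins):
    """Enumerate all count vectors; keep a feasible one with the fewest coins."""
    best = None
    ranges = [range(M // c + 1) for c in coins]  # d_i in {0, ..., floor(M / c_i)}
    for d in product(*ranges):
        if sum(c * x for c, x in zip(coins, d)) == M:
            if best is None or sum(d) < sum(best):
                best = d
    return list(best)


def recursive_change(M, coins):
    """RecursiveChange: minimum number of coins, recomputing subproblems."""
    if M == 0:
        return 0
    best = math.inf
    for c in coins:
        if M >= c:
            best = min(best, recursive_change(M - c, coins) + 1)
    return best


def dp_change(M, coins):
    """DPChange: fill min_num_coins[0..M] from small amounts to large."""
    min_num_coins = [0] + [math.inf] * M
    for m in range(1, M + 1):
        for c in coins:
            if m >= c:
                min_num_coins[m] = min(min_num_coins[m], 1 + min_num_coins[m - c])
    return min_num_coins[M]


print(greedy_change(9, [7, 3, 1]))           # [1, 0, 2]
print(sum(exhaustive_change(9, [7, 3, 1])))  # 3
print(recursive_change(9, [7, 3, 1]))        # 3
print(dp_change(9, [7, 3, 1]))               # 3

# Hypothetical instance where greedy uses 3 coins (4 + 1 + 1) but 2 suffice (3 + 3).
print(sum(greedy_change(6, [4, 3, 1])), dp_change(6, [4, 3, 1]))  # 3 2
```
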
Four Different Algorithms

| Technique | Algorithm | Correct? | Efficient? |
|---|---|---|---|
| Greedy algorithm | GreedyChange | no | yes |
| Exhaustive enumeration | ExhaustiveChange | yes | no |
| Recursive algorithm | RecursiveChange | yes | no |
| Dynamic programming | DPChange | yes | yes |

Question: How to assess efficiency?

Running Time Analysis

• The running time of an algorithm $A$ for problem $\Pi$ is the maximum number of steps that $A$ will take on any instance $X$ of size $n = |X|$.
• Asymptotic running time ignores constant factors, using Big O notation: $f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \ge 0$ such that $f(n) \le c \, g(n)$ for all $n \ge n_0$.

[Figure: plot of $f(n)$ and $g(n)$]

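Running Time Analysis – Example

$f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \ge 0$ such that $f(n) \le c \, g(n)$ for all $n \ge n_0$.

Example: $f(n) = 10000 + 500 n^3$ and $g(n) = n^4 / 2$.

Pick $c = 1000$ and $n_0 = 3$. Then $f(n) \le c \, g(n)$ for all $n \ge n_0$, because $c \, g(n) - f(n) = 500 n^3 (n - 1) - 10000 \ge 0$ whenever $n^3 (n - 1) \ge 20$, which holds for all $n \ge 3$.

[Figure: plot of $f(n)$, $g(n)$ and $1000 \, g(n)$]
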
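The choice of constants can be checked numerically. The sketch below assumes the example reads $f(n) = 10000 + 500 n^3$ and $g(n) = n^4 / 2$; it tests the inequality over a finite range only, so it illustrates the definition rather than proving the bound.

```python
def f(n):
    return 10000 + 500 * n**3


def g(n):
    return n**4 / 2


c, n0 = 1000, 3

# f(n) <= c * g(n) on a finite sample of n >= n0 (a sanity check, not a proof).
holds_from_n0 = all(f(n) <= c * g(n) for n in range(n0, 1001))

# The inequality fails just below n0, so n0 = 3 is the smallest valid threshold for c = 1000.
fails_below_n0 = f(n0 - 1) > c * g(n0 - 1)

print(holds_from_n0, fails_below_n0)  # True True
```
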
The Change Problem – Running Time Analysis

GreedyChange($M, c_1, \ldots, c_n$)
1. for $i \leftarrow 1$ to $n$
2.     $d_i \leftarrow \lfloor M / c_i \rfloor$
3.     $M \leftarrow M - c_i d_i$

Number of operations:
• Line 2: $3 = O(1)$
• Line 3: $3 = O(1)$
• Total: $6n = O(n)$

DPChange($M, c_1, \ldots, c_n$)
1. for $m \leftarrow 1$ to $M$
2.     minNumCoins[$m$] $\leftarrow \infty$
3. for $i \leftarrow 1$ to $n$
4.     minNumCoins[$c_i$] $\leftarrow 1$
5. for $m \leftarrow 1$ to $M$
6.     for $i \leftarrow 1$ to $n$
7.         if $m > c_i$
8.             minNumCoins[$m$] $\leftarrow$ min(1 + minNumCoins[$m - c_i$], minNumCoins[$m$])
9. return minNumCoins[$M$]

Number of operations:
• Lines 1-2: $O(M)$
• Lines 3-4: $O(n)$
• Lines 5-8: $O(Mn)$
• Total: $O(M) + O(n) + O(Mn) = O(Mn)$

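To see why the $O(Mn)$ bound matters, one can compare the work done by DPChange with the number of calls made by RecursiveChange. The sketch below is an illustration under assumed names (`count_recursive_calls` and `count_dp_steps` are not from the slides); it counts calls and inner-loop iterations instead of timing anything.

```python
def count_recursive_calls(M, coins):
    """Number of calls RecursiveChange makes, including the top-level call."""
    calls = 1
    for c in coins:
        if M >= c:
            calls += count_recursive_calls(M - c, coins)
    return calls


def count_dp_steps(M, coins):
    """Number of times the body of DPChange's inner loop (lines 7-8) runs."""
    return M * len(coins)


coins = [7, 3, 1]
for M in (5, 10, 20):
    print(M, count_recursive_calls(M, coins), count_dp_steps(M, coins))
```

The recursive count grows much faster than $Mn$ because the same amounts are solved repeatedly, which is the reason the recursive algorithm is listed as not efficient.
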
Running Time Analysis – Guidelines

• $O(n^a) \subset O(n^b)$ for any positive constants $a < b$.
• For any constants $a, b > 0$ and $c > 1$: $O(a) \subset O(\log n) \subset O(n^b) \subset O(c^n)$.
• We can multiply to learn about other functions. For any constants $a, b > 0$ and $c > 1$: $O(an) = O(n) \subset O(n \log n) \subset O(n \cdot n^b) = O(n^{b+1}) \subset O(n c^n)$.
• The base of the logarithm is a constant and can be ignored. For any constants $a, b > 1$: $O(\log_a n) = O(\log_b n / \log_b a) = O((1 / \log_b a) \log_b n) = O(\log_b n)$.

Running Time Analysis – Guidelines

| Big Oh | Name |
|---|---|
| $O(1)$ | Constant |
| $O(\log n)$ | Logarithmic |
| $O(n)$ | Linear |
| $O(n^2)$ | Quadratic |
| $O(n^c) = O(\mathrm{poly}(n))$ | Polynomial |
| $O(2^{\mathrm{poly}(n)})$ | Exponential |

Running Time Analysis – More Examples

Question: What is $O\left(\binom{n}{k}\right)$?
• For constant $k > 0$ it holds that $\binom{n}{k} = O(n^k)$.

Question: What is $O(n!)$?
• Recall that $n! = \prod_{i=1}^{n} i$.
• Since $n! \le n^n = 2^{n \log_2 n}$, we have $n! = O(n^n) = O(2^{n \log n})$.
• Stirling's approximation: $n! \approx \sqrt{2 \pi n} \, (n/e)^n = \sqrt{2 \pi n} \, n^n e^{-n}$.
• Note that $n / \exp(n) < 1$ for all $n > 0$.

Question: Is $n^n = O(n!)$?

Question: What is $O(\log(n!))$?

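The last question can be explored numerically before answering it on paper. The sketch below compares $\log_2(n!)$, computed with the standard-library function `math.lgamma`, against the logarithm of Stirling's approximation and against $n \log_2 n$; the helper names are illustrative. The ratio in the last column stays below 1 and creeps upward, which is consistent with $\log(n!) = O(n \log n)$.

```python
import math


def log2_factorial(n):
    # math.lgamma(n + 1) equals ln(n!); divide by ln 2 to change base.
    return math.lgamma(n + 1) / math.log(2)


def stirling_log2(n):
    # log2 of sqrt(2 * pi * n) * (n / e)^n
    return 0.5 * math.log2(2 * math.pi * n) + n * math.log2(n / math.e)


for n in (10, 100, 1000):
    exact = log2_factorial(n)
    approx = stirling_log2(n)
    ratio = exact / (n * math.log2(n))
    print(n, round(exact, 2), round(approx, 2), round(ratio, 3))
```
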
Course Topic #1: Sequence Alignment

"Thus, although the FOXP2 protein is extremely conserved among mammals, it acquired two amino-acid changes on the human lineage, at least one of which may have functional consequences. This is an intriguing finding, because FOXP2 is the first gene known to be involved in the development of speech and language." Nature (2002)

Question: How do we align sequences to identify similarities/differences?

Alignment

An alignment between two strings v (of m characters) and w (of n characters) is a two-row matrix where the first row contains the characters of v in order, the second row contains the characters of w in order, and spaces may be interspersed throughout each.

Input:
v: KITTEN (m = 6)
w: SITTING (n = 7)

Output:
v: K - I T T E N -
w: S I - T T I N G

Question: Is this a good alignment?
Answer: Count the number of insertions, deletions and substitutions.

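Counting the operations in a given alignment can be sketched as follows. The function name `alignment_counts` and the convention that `-` marks a space are assumptions made for this illustration; a gap in the first row is counted as an insertion and a gap in the second row as a deletion.

```python
def alignment_counts(row_v, row_w, gap="-"):
    """Classify each column of a two-row alignment and count the column types."""
    if len(row_v) != len(row_w):
        raise ValueError("both rows of an alignment must have the same length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for a, b in zip(row_v, row_w):
        if a == gap and b == gap:
            raise ValueError("a column cannot consist of two gaps")
        if a == gap:
            counts["insertion"] += 1      # character of w with no partner in v
        elif b == gap:
            counts["deletion"] += 1       # character of v with no partner in w
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


counts = alignment_counts("K-ITTEN-", "SI-TTING")
cost = counts["substitution"] + counts["insertion"] + counts["deletion"]
print(cost)  # 5
```
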
Edit Distance [Levenshtein, 1966]

Elementary operations: insertions, deletions and substitutions of single characters.

Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.

Examples:
• $d(\text{cat}, \text{car}) = 1$
• $d(\text{cat}, \text{ate}) = 2$
• $d(\text{cat}, \text{are}) = 3$

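Computing Edit Distance

Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.

v: ATGTTAT...
w: AGCGTAC...

Each column of an alignment is a deletion, an insertion, a mismatch or a match.

[Figure: alignment of $\mathbf{v}_i$, the prefix of $\mathbf{v}$ of length $i$ (row: A T - G T T T), with $\mathbf{w}_j$, the prefix of $\mathbf{w}$ of length $j$ (row: A G C G T - C); positions $i-1$, $i$ and $j-1$, $j$ are marked]

Optimal substructure: the edit distance of two strings is obtained from the edit distances of their prefixes.
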
Computing Edit Distance – Optimal Substructure

$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$, where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$.

• Deletion: $d[i, j] = d[i-1, j] + 1$ (extend by a character in $\mathbf{v}$)
• Insertion: $d[i, j] = d[i, j-1] + 1$ (extend by a character in $\mathbf{w}$)
• Mismatch: $d[i, j] = d[i-1, j-1] + 1$ (extend by a character in $\mathbf{v}$ and $\mathbf{w}$)
• Match: $d[i, j] = d[i-1, j-1]$ (extend by a character in $\mathbf{v}$ and $\mathbf{w}$)

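The four cases can be combined into a dynamic program by taking the minimum over the cases that apply at each cell. The sketch below is one possible implementation, not the lecture's own code: the base cases $d[i, 0] = i$ and $d[0, j] = j$ are an assumption added here (they are the standard choice but do not appear above), and the function name is illustrative.

```python
def edit_distance(v, w):
    """Edit distance between strings v and w via the prefix recurrence."""
    m, n = len(v), len(w)
    # d[i][j] holds the edit distance of the prefixes v[:i] and w[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # assumed base case: delete all i characters of v[:i]
    for j in range(1, n + 1):
        d[0][j] = j  # assumed base case: insert all j characters of w[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            # Match costs 0, mismatch costs 1.
            diagonal = d[i - 1][j - 1] + (0 if v[i - 1] == w[j - 1] else 1)
            d[i][j] = min(deletion, insertion, diagonal)
    return d[m][n]


print(edit_distance("KITTEN", "SITTING"))  # 3
```

The table has $(m + 1)(n + 1)$ cells and each is filled in constant time, so this sketch runs in $O(mn)$ time.
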