CS 466 Introduction to Bioinformatics

Lecture 2 Part 1. Mohammed El-Kebir. January 28, 2020.


  1. CS 466 Introduction to Bioinformatics, Lecture 2 Part 1. Mohammed El-Kebir. January 28, 2020

  2. Outline
     1. Change problem
     2. Review of running time analysis
     3. Edit distance
     4. Review elementary graph theory
     5. Manhattan Tourist problem
     6. Longest/shortest paths in DAGs
     Reading:
     • Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
     • Lecture notes 2

  3. The Change Problem
     Change Problem: Given amount $M \in \mathbb{N} \setminus \{0\}$ and coins $\mathbf{c} = (c_1, \ldots, c_n) \in \mathbb{N}^n$ s.t. $c_n = 1$ and $c_i \ge c_{i+1}$ for all $i \in [n-1] = \{1, \ldots, n-1\}$, find $\mathbf{d} = (d_1, \ldots, d_n) \in \mathbb{N}^n$ s.t. (i) $M = \sum_{i=1}^{n} c_i d_i$ and (ii) $\sum_{i=1}^{n} d_i$ is minimum
     • Suppose we have $n = 3$ coins: $\mathbf{c} = (7, 3, 1)$ cents
     • What is the minimum number of coins needed to make change for $M = 9$ cents?
     • Answer: $(d_1, \ldots, d_n) = (1, 0, 2)$, thus $1 + 0 + 2 = 3$ coins.

  4. The Change Problem – Four Algorithms

     GreedyChange($M, c_1, \ldots, c_n$)
       1. for $i \leftarrow 1$ to $n$
       2.   $d_i \leftarrow \lfloor M / c_i \rfloor$
       3.   $M \leftarrow M - d_i c_i$

     ExhaustiveChange($M, c_1, \ldots, c_n$)
       1. for $(d_1, \ldots, d_n) \in [\lfloor M/c_1 \rfloor] \times \cdots \times [\lfloor M/c_n \rfloor]$
       2.   if $\sum_{i=1}^{n} c_i d_i = M$
       3.     return $(d_1, \ldots, d_n)$

     RecursiveChange($M, c_1, \ldots, c_n$)
       1. if $M = 0$
       2.   return 0
       3. bestNumCoins $\leftarrow \infty$
       4. for $i \leftarrow 1$ to $n$
       5.   if $M \ge c_i$
       6.     numCoins $\leftarrow$ RecursiveChange($M - c_i, c_1, \ldots, c_n$)
       7.     if numCoins + 1 < bestNumCoins
       8.       bestNumCoins $\leftarrow$ numCoins + 1
       9. return bestNumCoins

     DPChange($M, c_1, \ldots, c_n$)
       1. for $m \leftarrow 1$ to $M$
       2.   minNumCoins[$m$] $\leftarrow \infty$
       3. for $i \leftarrow 1$ to $n$
       4.   minNumCoins[$c_i$] $\leftarrow 1$
       5. for $m \leftarrow 1$ to $M$
       6.   for $i \leftarrow 1$ to $n$
       7.     if $m > c_i$
       8.       minNumCoins[$m$] $\leftarrow$ min(1 + minNumCoins[$m - c_i$], minNumCoins[$m$])
       9. return minNumCoins[$M$]
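The greedy and dynamic programming pseudocode can be sketched in Python as follows. This is a minimal sketch rather than the lecture's own code: the function names are assumptions, coins are assumed to be sorted in non-increasing order and to end with 1, and a bounds guard (`c <= M`) is added that the pseudocode does not have. The coin set (4, 3, 1) in the usage note is an illustrative choice, not taken from the slides.

```python
import math


def greedy_change(M, coins):
    """GreedyChange: take as many of each coin as fits, largest coin first."""
    counts = []
    for c in coins:  # coins assumed sorted in non-increasing order
        d = M // c
        counts.append(d)
        M -= d * c
    return tuple(counts)


def dp_change(M, coins):
    """DPChange: min_num_coins[m] is the fewest coins that make amount m."""
    min_num_coins = [math.inf] * (M + 1)  # index 0 is unused, as in the pseudocode
    for c in coins:
        if c <= M:  # guard added in this sketch to stay inside the table
            min_num_coins[c] = 1
    for m in range(1, M + 1):
        for c in coins:
            if m > c:
                min_num_coins[m] = min(1 + min_num_coins[m - c], min_num_coins[m])
    return min_num_coins[M]
```

For M = 9 and coins (7, 3, 1), `greedy_change` returns the counts (1, 0, 2) and `dp_change` returns 3, matching the slide. For the illustrative instance M = 6 with coins (4, 3, 1), the greedy sketch uses 3 coins (4 + 1 + 1) while the dynamic program finds 2 (3 + 3), which is one way to see why the greedy approach is not always correct.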

  5. Four Different Algorithms

     | Technique | Correct? | Efficient? |
     |---|---|---|
     | Greedy algorithm [GreedyChange] | no | yes |
     | Exhaustive enumeration [ExhaustiveChange] | yes | no |
     | Recursive algorithm [RecursiveChange] | yes | no |
     | Dynamic programming [DPChange] | yes | yes |

     Question: How to assess efficiency?
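One informal way to see why the recursive algorithm is listed as inefficient is to count its calls. The sketch below follows the RecursiveChange pseudocode; the `counter` argument is a hypothetical addition used only for instrumentation and is not part of the original algorithm.

```python
import math


def recursive_change(M, coins, counter):
    """RecursiveChange; counter[0] is incremented once per call (instrumentation only)."""
    counter[0] += 1
    if M == 0:
        return 0
    best_num_coins = math.inf
    for c in coins:
        if M >= c:
            num_coins = recursive_change(M - c, coins, counter)
            if num_coins + 1 < best_num_coins:
                best_num_coins = num_coins + 1
    return best_num_coins


calls = [0]
best = recursive_change(9, (7, 3, 1), calls)
print(best, calls[0])
```

Even for M = 9 with three coins, the number of calls exceeds the M·n table updates that a dynamic program needs, because the same sub-amounts are recomputed many times.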

  6. Running Time Analysis
     • The running time of an algorithm $A$ for problem $\Pi$ is the maximum number of steps that $A$ will take on any instance $X$ of size $n = |X|$
     • Asymptotic running time ignores constant factors using Big O notation
     $f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \ge 0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$
     [Figure: plot of $f(n)$ and $g(n)$]

  7. Running Time Analysis – Example
     $f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \ge 0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$
     $f(n) = 10000 + 500 n^3$
     $g(n) = n^4 / 2$
     Pick $c = 1000$ and $n_0 = 3$. Then, $f(n) \le c\,g(n)$ for all $n \ge n_0$.
     [Figure: plot of $f(n)$ and $1000\,g(n)$]
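The choice of constants in this example can be checked numerically. The sketch below assumes the example reads f(n) = 10000 + 500n³ and g(n) = n⁴/2 (the exponents are reconstructed from a damaged slide, so treat them as an assumption), and tests the inequality only over a finite range, which illustrates the definition but does not prove it.

```python
def f(n):
    return 10000 + 500 * n**3


def g(n):
    return n**4 / 2


c, n0 = 1000, 3

# f(n) <= c * g(n) on a finite range starting at n0 (evidence, not a proof)
holds_from_n0 = all(f(n) <= c * g(n) for n in range(n0, 10001))

# the inequality fails just below n0, so n0 = 3 cannot be lowered for this c
fails_below_n0 = f(n0 - 1) > c * g(n0 - 1)

print(holds_from_n0, fails_below_n0)
```

Under these assumptions the bound holds from n = 3 onward, while at n = 2 the left side is 14000 and the right side is 8000.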

  8. The Change Problem – Running Time Analysis

     GreedyChange($M, c_1, \ldots, c_n$)
       1. for $i \leftarrow 1$ to $n$
       2.   $d_i \leftarrow \lfloor M / c_i \rfloor$
       3.   $M \leftarrow M - d_i c_i$
     Number of operations:
     • Line 2: 3 = $O(1)$
     • Line 3: 3 = $O(1)$
     • Total: $6n = O(n)$

     DPChange($M, c_1, \ldots, c_n$)
       1. for $m \leftarrow 1$ to $M$
       2.   minNumCoins[$m$] $\leftarrow \infty$
       3. for $i \leftarrow 1$ to $n$
       4.   minNumCoins[$c_i$] $\leftarrow 1$
       5. for $m \leftarrow 1$ to $M$
       6.   for $i \leftarrow 1$ to $n$
       7.     if $m > c_i$
       8.       minNumCoins[$m$] $\leftarrow$ min(1 + minNumCoins[$m - c_i$], minNumCoins[$m$])
       9. return minNumCoins[$M$]
     Number of operations:
     • Lines 1-2: $O(M)$
     • Lines 3-4: $O(n)$
     • Lines 5-8: $O(Mn)$
     • Total: $O(M) + O(n) + O(Mn) = O(Mn)$

  9. Running Time Analysis – Guidelines
     • $O(n^a) \subset O(n^b)$ for any positive constants $a < b$
     • For any constants $a, b > 0$ and $c > 1$,
       $O(a) \subset O(\log n) \subset O(n^b) \subset O(c^n)$
     • We can multiply to learn about other functions. For any constants $a, b > 0$ and $c > 1$,
       $O(an) = O(n) \subset O(n \log n) \subset O(n \cdot n^b) = O(n^{b+1}) \subset O(n c^n)$
     • Base of the logarithm is a constant and can be ignored. For any constants $a, b > 1$,
       $O(\log_a n) = O(\log_b n / \log_b a) = O((1 / \log_b a) \log_b n) = O(\log_b n)$
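The last guideline says that logarithms in different bases differ only by a constant factor. A small numeric sketch of this, with bases 2 and 10 chosen purely for illustration:

```python
import math

# log_2(n) / log_10(n) should be the same constant, log_2(10), for every n > 1
ratios = [math.log(n, 2) / math.log(n, 10) for n in (10, 1000, 10**6)]
constant = math.log2(10)

print(ratios, constant)
```

Because the ratio does not depend on n, it is absorbed into the constant c of the Big O definition.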

  10. Running Time Analysis – Guidelines

      | Big Oh | Name |
      |---|---|
      | $O(1)$ | Constant |
      | $O(\log n)$ | Logarithmic |
      | $O(n)$ | Linear |
      | $O(n^2)$ | Quadratic |
      | $O(n^c) = O(\mathrm{poly}(n))$ | Polynomial |
      | $O(2^{\mathrm{poly}(n)})$ | Exponential |

  11. Running Time Analysis – More Examples
      Question: What is $O\left(\binom{n}{k}\right)$?

  12. Running Time Analysis – More Examples
      • For constant $k > 0$ it holds that $\binom{n}{k} = O(n^k)$
      • Recall that $n! = \prod_{i=1}^{n} i$
      Question: What is $O(n!)$?

  13. Running Time Analysis – More Examples
      • $n! \le n^n$, so $n! = O(n^n) = O(2^{n \log n})$
      • Stirling's approximation: $n! \approx \sqrt{2\pi n}\,(n/e)^n = \sqrt{2\pi n}\; n^n / \exp(n)$, where $n / \exp(n) < 1$ for all $n > 0$
      Question: Is $n^n = O(n!)$?

  14. Running Time Analysis – More Examples
      Question: What is $O(\log(n!))$?
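The questions about n! can be explored numerically. The sketch below is an illustration under stated assumptions, not an answer given in the slides: `stirling` and `log2_factorial` are hypothetical helper names, and `math.lgamma(n + 1)` is used to obtain ln(n!) without computing the huge integer.

```python
import math


def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)^n (only safe for small n in floats)."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


def log2_factorial(n):
    """log2(n!) computed via the log-gamma function, since ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2)


# Relative accuracy of Stirling's approximation for a small n
print(stirling(20) / math.factorial(20))

# Ratio of log2(n!) to n*log2(n) for growing n
for n in (10, 100, 1000, 10**6):
    print(n, log2_factorial(n) / (n * math.log2(n)))
```

The second loop prints ratios that stay below 1 and increase with n, which is consistent with log(n!) growing like n log n. This is numerical evidence only and not a proof.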

  15. Course Topic #1: Sequence Alignment
      “Thus, although the FOXP2 protein is extremely conserved among mammals, it acquired two amino-acid changes on the human lineage, at least one of which may have functional consequences. This is an intriguing finding, because FOXP2 is the first gene known to be involved in the development of speech and language.” Nature (2002)
      Question: How do we align sequences to identify similarities/differences?

  16. Alignment
      An alignment between two strings v (of m characters) and w (of n characters) is a two-row matrix where the first row contains the characters of v in order, the second row contains the characters of w in order, and spaces may be interspersed throughout each.
      Input:
        v: KITTEN (m = 6)
        w: SITTING (n = 7)
      Output:
        v: K - I T T E N -
        w: S I - T T I N G
      Question: Is this a good alignment?
      Answer: Count the number of insertions, deletions and substitutions.
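Counting the operations in an alignment can be sketched as below. The function name and the convention are assumptions of this sketch: a gap (`-`) in the row of v is counted as an insertion, a gap in the row of w as a deletion, and two different characters in a column as a substitution.

```python
def count_operations(row_v, row_w):
    """Classify each column of a two-row alignment given as equal-length strings."""
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for a, b in zip(row_v, row_w):
        if a == "-":
            counts["insertion"] += 1    # character present only in w
        elif b == "-":
            counts["deletion"] += 1     # character present only in v
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


counts = count_operations("K-ITTEN-", "SI-TTING")
cost = counts["substitution"] + counts["insertion"] + counts["deletion"]
print(counts, cost)
```

For the alignment shown on the slide this gives 2 substitutions, 2 insertions and 1 deletion, for a total of 5 operations.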

  18. Edit Distance [Levenshtein, 1966]
      Elementary operations: insertions, deletions and substitutions of single characters
      Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.
      • $d(\mathbf{cat}, \mathbf{car}) = 1$
      • $d(\mathbf{cat}, \mathbf{ate}) = 2$
      • $d(\mathbf{cat}, \mathbf{are}) = 3$

  19. Computing Edit Distance
      Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.
        v: ATGTTAT...
        w: AGCGTAC...
      [Figure: alignment of $\mathbf{v}_i$, the prefix of $\mathbf{v}$ of length $i$, with $\mathbf{w}_j$, the prefix of $\mathbf{w}$ of length $j$; columns are labeled deletion, insertion, mismatch and match, with positions $i-1$, $i$ and $j-1$, $j$ marked]
        $\mathbf{v}_i$: A T - G T T T
        $\mathbf{w}_j$: A G C G T - C
      Optimal substructure: Edit distance obtained from edit distance of prefix of string.

  20. Computing Edit Distance – Optimal Substructure
      $d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$, where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$
      • Deletion: $d[i, j] = d[i-1, j] + 1$. Extend by a character in $\mathbf{v}$
      • Insertion: $d[i, j] = d[i, j-1] + 1$. Extend by a character in $\mathbf{w}$
      • Mismatch: $d[i, j] = d[i-1, j-1] + 1$. Extend by a character in $\mathbf{v}$ and $\mathbf{w}$
      • Match: $d[i, j] = d[i-1, j-1]$. Extend by a character in $\mathbf{v}$ and $\mathbf{w}$
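The four cases above can be combined into a table-filling procedure. The following is a minimal sketch, not the lecture's own code: the function name is an assumption, and so are the base cases `d[i][0] = i` and `d[0][j] = j` (transforming a prefix to or from the empty string), which this part of the slides does not state.

```python
def edit_distance(v, w):
    """Edit distance via the prefix recurrence; d[i][j] compares v[:i] with w[:j]."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # assumed base case: delete all i characters of v[:i]
    for j in range(1, n + 1):
        d[0][j] = j  # assumed base case: insert all j characters of w[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            # match costs 0, mismatch costs 1
            diagonal = d[i - 1][j - 1] + (0 if v[i - 1] == w[j - 1] else 1)
            d[i][j] = min(deletion, insertion, diagonal)
    return d[m][n]


print(edit_distance("KITTEN", "SITTING"))
```

The table has (m + 1)(n + 1) entries and each is filled in constant time, so this sketch runs in O(mn) time. For KITTEN and SITTING it returns 3.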
