
CS3000: Algorithms & Data Jonathan Ullman Midterm Info Lecture - PowerPoint PPT Presentation



  1. CS3000: Algorithms & Data • Jonathan Ullman • Lecture 8: Dynamic Programming: RNA Folding, Practice • Midterm Info • Feb 3, 2020

  2. Dynamic Programming Examples • Interval Scheduling: choose a subset; one-variable recurrence • Segmented Least Squares: partition the line into intervals • Knapsack: choose a subset; two-variable recurrence • Edit Distance / Alignments: choose the last piece of the alignment • Concert Scheduling: choose a subset; one-variable recurrence • RNA Folding: pair up items; two-variable recurrence

  3. RNA Folding

  4. DNA • DNA is a string of four bases {A,C,G,T} • Two complementary strands of DNA stick together and form a double helix • A—T and C—G are complementary pairs

  5. RNA Folding • RNA is a string of four bases {A,C,G,U} • A single RNA strand sticks to itself and folds into complex structures • A—U and C—G are complementary pairs

  6. RNA Folding • RNA strand will try to minimize energy (form the most bonds) subject to constraints • [Figure: a folded RNA strand, annotated "no crossing" and "no pairs too close together"]

  7. RNA Folding • RNA is a string of bases $b_1, \dots, b_n \in \{A, C, G, U\}$ • The structure is given by a set of bonds $S$ consisting of pairs $(i, j)$ with $i < j$ • (Complements) Only A-U or C-G can be paired • (Matching) No base $b_i$ is in two pairs in $S$ • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$ • (Non-Crossing) If $(i, j), (k, \ell) \in S$ then it cannot be the case that $i < k < j < \ell$

  8. RNA Folding • Input: RNA sequence $b_1, \dots, b_n \in \{A, C, G, U\}$ • Output: A set of pairs $S \subseteq \{1, \dots, n\} \times \{1, \dots, n\}$ • Goal: maximize the size of $S$ • (Complements) Only A-U or C-G can be paired • (Matching) No base $b_i$ is in two pairs in $S$ • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$ • (Non-Crossing) If $(i, j), (k, \ell) \in S$ then it cannot be the case that $i < k < j < \ell$

  9. Dynamic Programming • Let $O$ be the optimal set of pairs for $b_1 \cdots b_n$ • Case 1: $O$ does not include any pair involving $n$ • Then $O$ is the optimal solution using $b_1 \cdots b_{n-1}$ • Case 2: $O$ pairs $n$ with some $t < n - 4$ • Then $O$ consists of the pair $(t, n)$, the optimal solution using $b_1 \cdots b_{t-1}$, and the optimal solution using $b_{t+1} \cdots b_{n-1}$

  10. Dynamic Programming • Let $O_{i,j}$ be the optimal set of pairs for $b_i \cdots b_j$ • Case 1: $O_{i,j}$ does not include any pair involving $j$ • Case 2: $O_{i,j}$ pairs $j$ with some $t < j - 4$

  11. Dynamic Programming • Let $\mathrm{OPT}(i, j)$ be the opt. number of pairs for $b_i \cdots b_j$ (one subproblem for each $1 \le i \le j \le n$) • Case 1: $j$ pairs with nothing • $\mathrm{OPT}(i, j) = \mathrm{OPT}(i, j-1)$ • Case 2: $j$ pairs with $t < j - 4$ • $\mathrm{OPT}(i, j) = 1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)$

  12. Dynamic Programming • Let $\mathrm{OPT}(i, j)$ be the opt. number of pairs for $b_i \cdots b_j$ • Case 1: $j$ pairs with nothing • Case 2: $j$ pairs with $t < j - 4$ • Recurrence: $\mathrm{OPT}(i, j) = \max\{\mathrm{OPT}(i, j-1),\ 1 + \max_t\{\mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\}\}$ • Maximum over all $t$ such that • $i \le t < j - 4$ • $b_t, b_j$ are compatible bases • Base Cases: $\mathrm{OPT}(i, j) = 0$ if $i \ge j - 4$
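
The recurrence can be turned into a table-filling routine fairly directly. The following is a minimal sketch, not the lecture's own code: the function names `rna_opt_table` and `rna_opt` are made up for illustration, the string is 0-indexed in Python while the table keeps the slides' 1-indexed $(i, j)$, and the "1 +" counts the bond $(t, j)$ itself.

```python
def rna_opt_table(rna):
    """Return opt, where opt[i][j] = OPT(i, j) for the 1-indexed bases b_i ... b_j."""
    n = len(rna)
    compatible = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    # Two extra rows/columns so that opt[i][t - 1] and opt[t + 1][j - 1]
    # are valid lookups even for empty intervals (which are worth 0).
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    # Fill intervals in order of increasing length; base cases (i >= j - 4) stay 0.
    for length in range(5, n):
        for i in range(1, n - length + 1):
            j = i + length
            best = opt[i][j - 1]                      # Case 1: j pairs with nothing
            for t in range(i, j - 4):                 # Case 2: i <= t < j - 4
                if (rna[t - 1], rna[j - 1]) in compatible:
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt


def rna_opt(rna):
    """Maximum number of bonds for the whole strand, i.e. OPT(1, n)."""
    return rna_opt_table(rna)[1][len(rna)]
```

For example, `rna_opt("ACCGGUAGU")` evaluates to 2. The three nested loops give $O(n^3)$ time and the table takes $O(n^2)$ space.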

  13. Filling the Table • Sequence: ACCGGUAGU • Recurrence: $\mathrm{OPT}(i, j) = \max\{\mathrm{OPT}(i, j-1),\ 1 + \max_t\{\mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\}\}$ • [Table: $\mathrm{OPT}(i, j)$ with rows $i = 1, \dots, 4$ and columns $j = 6, \dots, 9$, filled in by hand during lecture; the annotations list the substring $b_i \cdots b_j$ for each entry]
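
As a worked illustration, here are three entries computed by hand for the sequence ACCGGUAGU, using the recurrence with one bond counted for the pair $(t, j)$ and empty intervals worth 0. These values are my own calculation rather than a transcription of the slide's table.

```latex
\begin{align*}
\mathrm{OPT}(1,6) &= \max\{\mathrm{OPT}(1,5),\ 1 + \mathrm{OPT}(1,0) + \mathrm{OPT}(2,5)\}
                   = \max\{0,\ 1 + 0 + 0\} = 1
                   && (t = 1:\ b_1 = A,\ b_6 = U) \\
\mathrm{OPT}(2,8) &= \max\{\mathrm{OPT}(2,7),\ 1 + \mathrm{OPT}(2,1) + \mathrm{OPT}(3,7),\ 1 + \mathrm{OPT}(2,2) + \mathrm{OPT}(4,7)\}
                   = \max\{0,\ 1,\ 1\} = 1
                   && (t \in \{2, 3\}:\ b_t = C,\ b_8 = G) \\
\mathrm{OPT}(1,9) &= \max\{\mathrm{OPT}(1,8),\ 1 + \mathrm{OPT}(1,0) + \mathrm{OPT}(2,8)\}
                   = \max\{1,\ 1 + 0 + 1\} = 2
                   && (t = 1:\ b_1 = A,\ b_9 = U)
\end{align*}
```

Here $\mathrm{OPT}(1,8) = \mathrm{OPT}(1,7) = \mathrm{OPT}(1,6) = 1$, since neither $b_7 = A$ nor $b_8 = G$ can be paired in a way that adds a second bond.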

  14. RNA Folding Summary • Compute the optimal RNA folding in time $O(n^3)$ and space $O(n^2)$ • Dynamic Programming: • Decide on an optimal pair $b_t$-$b_n$ • Remaining RNA is two non-overlapping pieces • Adding variables: one subproblem for each interval • Non-crossing is critical • Think about how the dynamic programming algorithm changes if we remove each of the conditions

  15. Midterm I Review

  16. Midterm I Topics • Fundamentals: • Induction • Asymptotics • Recurrences • Stable Matching • Divide and Conquer • Dynamic Programming • Last year's midterm will be online • Cheat sheet: one 8.5 × 11 page, double sided, typed or handwritten

  17. Topics: Induction • Proof by Induction: • Mathematical formulas, e.g. $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ • Spot the bug • Solutions to recurrences • Correctness of divide-and-conquer algorithms • Good way to study: • Lehman-Leighton-Meyer, Mathematics for CS (linked on the website) • Review divide-and-conquer in Kleinberg-Tardos

  18. Practice Question: Induction • Suppose you have an unlimited supply of 3 and 7 cent coins. Prove by induction that you can make any amount $n \ge 12$.
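
One possible proof structure is strong induction with base cases 12, 13, 14 and the step "to make $n$, make $n - 3$ and add a 3-cent coin". The sketch below mirrors that structure as a recursive function. The name `coins_3_7` is made up for illustration, and this is only one of several valid inductive arguments.

```python
def coins_3_7(n):
    """Return (threes, sevens) with 3 * threes + 7 * sevens == n, for n >= 12."""
    if n < 12:
        raise ValueError("the claim only covers amounts n >= 12")
    # Base cases: 12 = 3+3+3+3, 13 = 3+3+7, 14 = 7+7
    base = {12: (4, 0), 13: (2, 1), 14: (0, 2)}
    if n in base:
        return base[n]
    # Inductive step: n - 3 >= 12, so by the hypothesis it can be made;
    # add one more 3-cent coin to reach n.
    threes, sevens = coins_3_7(n - 3)
    return threes + 1, sevens
```

Three base cases are needed because the step goes back by 3, so every $n \ge 15$ reduces to exactly one of 12, 13, or 14.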

  19. Topics: Asymptotics • Asymptotic Notation • $o, O, \omega, \Omega, \Theta$ • Relationships between common function types • Good way to study: • Kleinberg-Tardos Chapter 2 • Jeff Erickson's book (also linked online)

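  20. Topics: Asymptotics (Notation; … means …; Think…; E.g.)
      • $f(n) = O(g(n))$ means $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le f(n) \le c \cdot g(n)$; think "at most" ($\le$); e.g. $100n^2 = O(n^3)$
      • $f(n) = \Omega(g(n))$ means $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le c \cdot g(n) \le f(n)$; think "at least" ($\ge$); e.g. $2^n = \Omega(n^{100})$
      • $f(n) = \Theta(g(n))$ means $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$; think "equals" ($=$); e.g. $\log(n!) = \Theta(n \log n)$
      • $f(n) = o(g(n))$ means $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le f(n) < c \cdot g(n)$; think "less than" ($<$); e.g. $n^2 = o(2^n)$
      • $f(n) = \omega(g(n))$ means $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le c \cdot g(n) < f(n)$; think "greater than" ($>$); e.g. $n^2 = \omega(\log n)$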

  21. Topics: Asymptotics • Constant factors can be ignored • $\forall C > 0$: $Cn = O(n)$ • Smaller exponents are Big-Oh of larger exponents • $\forall a > b$: $n^b = O(n^a)$ • Any logarithm is Big-Oh of any polynomial • $\forall a, \varepsilon > 0$: $\log_2^a n = O(n^\varepsilon)$ • Any polynomial is Big-Oh of any exponential • $\forall a > 0, b > 1$: $n^a = O(b^n)$ • Lower order terms can be dropped • $n^2 + n^{3/2} + n = O(n^2)$

  22. Practice Question: Asymptotics • Put these functions in order so that $f_i = O(f_{i+1})$ • [List of six functions; most did not survive extraction. The one fully legible entry is $n^2 \log_2 n$.]

  23. Practice Question: Asymptotics • Suppose $f_1 = O(g)$ and $f_2 = O(g)$. Prove that $f_1 + f_2 = O(g)$.
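
One way to write the argument is to unpack the definition of Big-Oh for each function and combine the constants. The names $c_1, c_2, n_1, n_2$ are introduced here for the sketch.

```latex
\begin{align*}
&f_1 = O(g) \implies \exists c_1 > 0,\ n_1 > 0:\ \forall n \ge n_1,\quad 0 \le f_1(n) \le c_1\, g(n) \\
&f_2 = O(g) \implies \exists c_2 > 0,\ n_2 > 0:\ \forall n \ge n_2,\quad 0 \le f_2(n) \le c_2\, g(n) \\
&\text{Let } c = c_1 + c_2 \text{ and } n_0 = \max\{n_1, n_2\}. \text{ Then for all } n \ge n_0: \\
&\qquad 0 \le f_1(n) + f_2(n) \le c_1\, g(n) + c_2\, g(n) = c\, g(n),
\end{align*}
```

which is exactly the statement $f_1 + f_2 = O(g)$.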

  24. Topics: Recurrences • Recurrences • Representing running time by a recurrence • Solving common recurrences • Master Theorem • Drawing the recursion tree • Good way to study: • Erickson book • Kleinberg-Tardos divide-and-conquer chapter

  25. Practice Question: Recurrences
      F(n):
        For i = 1,…,n^2: Print “Hi”
        For i = 1,…,3: F(n/3)
      • Write a recurrence for the running time of this algorithm. Write the asymptotic running time given by the recurrence.
      • In-class annotation: $T(n) = 3T(n/3) + n^2$
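
To sanity-check the recurrence $T(n) = 3T(n/3) + n^2$, one can count the printed lines instead of printing them. This sketch assumes details the slide leaves open: $n/3$ is rounded down and the recursion stops once $n < 1$. The name `count_prints` is made up for illustration.

```python
def count_prints(n):
    """Number of "Hi" lines F(n) would print, assuming n/3 rounds down and F(0) does nothing."""
    if n < 1:
        return 0
    # n^2 prints at this level, plus three recursive calls on n/3
    return n * n + 3 * count_prints(n // 3)
```

The count always lies between $n^2$ and $1.5\,n^2$, which matches the Master Theorem answer $\Theta(n^2)$: the top level of the recursion tree dominates, because the work per level shrinks by a factor of 3.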

  26. Topics: Recurrences • Consider the recurrence $T(n) = \sqrt{n} \cdot T(\sqrt{n}) + n$ with $T(1) = 1$. Solve using a recursion tree. • In-class annotation: $T(n) = \Theta(n \log\log n)$

  27. Topics: Divide-and-Conquer • Divide-and-Conquer • Writing pseudocode • Proving correctness by induction • Analyzing running time via recurrences • Examples we’ve studied: • Mergesort, Binary Search, Karatsuba’s, Selection • Good way to study: • Example problems from Kleinberg-Tardos or Erickson (good discussion of pseudocode) • Practice, practice, practice!

  28. Topics: Dynamic Programming • Dynamic Programming • Identify sub-problems • Write a recurrence, $\mathrm{OPT}(n) = \max\{v_n + \mathrm{OPT}(n - 6),\ \mathrm{OPT}(n - 1)\}$ • Fill the dynamic programming table • Find the optimal solution • Analyze running time • Good way to study: • Example problems from Kleinberg-Tardos or Erickson • Practice, practice, practice!
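
The steps on this slide can be sketched for the example recurrence $\mathrm{OPT}(n) = \max\{v_n + \mathrm{OPT}(n-6), \mathrm{OPT}(n-1)\}$. The slide does not say which problem the recurrence comes from, so the reading used here (taking item $n$ rules out the five items before it, and $\mathrm{OPT}(k) = 0$ for $k \le 0$) is an assumption, as are the function names.

```python
def fill_table(values):
    """values[k - 1] holds v_k. Returns opt with opt[k] = OPT(k) for k = 0..n."""
    n = len(values)
    opt = [0] * (n + 1)                      # opt[0] = 0 is the base case
    for k in range(1, n + 1):
        take = values[k - 1] + (opt[k - 6] if k >= 6 else 0)
        skip = opt[k - 1]
        opt[k] = max(take, skip)
    return opt


def recover_solution(values, opt):
    """Walk the filled table backwards to list the chosen items (1-indexed)."""
    chosen = []
    k = len(values)
    while k >= 1:
        take = values[k - 1] + (opt[k - 6] if k >= 6 else 0)
        if take >= opt[k - 1]:               # taking item k achieves OPT(k)
            chosen.append(k)
            k -= 6
        else:
            k -= 1
    return chosen[::-1]
```

For instance, with `values = [5, 1, 1, 1, 1, 1, 4]` the table ends at `opt[7] == 9` and the recovered solution is items 1 and 7. Both passes take $O(n)$ time, since each table entry is computed from two earlier entries.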

  29. Practice Question • Design an $O(n)$-time algorithm that takes an array $A[1:n]$ and returns a sorted array containing the smallest $\sqrt{n}$ elements of $A$

  30. Practice Question • Consider the following sorting algorithm
      A[1:n] is a global array
      SillySort(1,n):
        if (n <= 2): put A in order
        else:
          SillySort(1,2n/3)
          SillySort(n/3,n)
          SillySort(1,2n/3)
      • Prove that it is correct • Analyze its running time
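
A runnable version makes the recursive structure easier to experiment with. This sketch fills in details the slide's pseudocode leaves open, and these are assumptions: indices are 0-based and inclusive, and the "third" is rounded down so that each recursive call covers $\lceil 2n/3 \rceil$ elements, which is what makes the overlap argument work.

```python
def silly_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place (inclusive bounds, 0-indexed)."""
    if hi is None:
        hi = len(a) - 1
    n = hi - lo + 1
    if n <= 1:
        return
    if n == 2:
        # Base case: put the two elements in order
        if a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
        return
    third = n // 3                       # floor(n / 3)
    silly_sort(a, lo, hi - third)        # first two thirds
    silly_sort(a, lo + third, hi)        # last two thirds
    silly_sort(a, lo, hi - third)        # first two thirds again
```

Each call makes three recursive calls on about $2n/3$ elements plus constant extra work, so the running time satisfies $T(n) = 3T(2n/3) + O(1)$, which solves to $\Theta(n^{\log_{3/2} 3}) \approx \Theta(n^{2.71})$.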

  31. Dynamic Programming Practice

  32. Chocolate Bar Splitting • Input: A chocolate bar with $n \times m$ pieces • Output: The minimum number of cuts needed to divide the bar into perfect squares
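
One way to set this up as a dynamic program is to use one subproblem per bar size and try every first cut. The sketch below assumes that a cut splits a single piece in two along a grid line, all the way across. That reading, the function name `min_cuts`, and the use of memoized recursion instead of an explicit table are my choices, not the slide's.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_cuts(n, m):
    """Minimum number of cuts to split an n x m bar into square pieces."""
    if n == m:
        return 0                          # already a square
    best = n * m                          # safe upper bound: n*m - 1 cuts always suffice
    # Horizontal cuts: pieces of size a x m and (n - a) x m (by symmetry a <= n/2)
    for a in range(1, n // 2 + 1):
        best = min(best, 1 + min_cuts(a, m) + min_cuts(n - a, m))
    # Vertical cuts: pieces of size n x b and n x (m - b)
    for b in range(1, m // 2 + 1):
        best = min(best, 1 + min_cuts(n, b) + min_cuts(n, m - b))
    return best
```

There are $nm$ subproblems and each tries $O(n + m)$ cuts, so this runs in $O(nm(n + m))$ time. As a small check, a $3 \times 5$ bar needs 3 cuts: $3 \times 3$ plus $3 \times 2$, then $2 \times 2$ plus $1 \times 2$, then two $1 \times 1$ pieces.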

  33. Chocolate Bar Splitting

  34. Vankin’s Mile • Input: An $n \times n$ board of numbers • Rules: • Place a chip on the board • Keep moving the chip down or right until you fall off • Score = sum of the numbers your chip visited • Output: The best possible strategy
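
A natural subproblem is "the best score obtainable when the chip is currently on square $(i, j)$", filled in from the bottom-right corner backwards. The sketch below assumes a non-empty square board given as a list of lists (entries may be negative) and returns only the best score. The name `vankins_mile` and the zero-padding trick are illustrative choices.

```python
def vankins_mile(board):
    """Best achievable score on an n x n board (n >= 1)."""
    n = len(board)
    # best[i][j] = best score for a chip currently on (i, j).
    # The extra row and column of zeros represent "fell off the board".
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            # The chip must keep moving: either down or right
            best[i][j] = board[i][j] + max(best[i + 1][j], best[i][j + 1])
    # The starting square is free to choose
    return max(best[i][j] for i in range(n) for j in range(n))
```

On the board `[[1, -2], [-3, 4]]` the best score is 4, obtained by starting on the bottom-right square. The table has $n^2$ entries and each takes constant time, so the running time is $O(n^2)$. Recording which move achieved each maximum would recover the strategy itself.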
