CS3000: Algorithms & Data Jonathan Ullman Lecture 8: Dynamic Programming: RNA Folding, Practice • Feb 3, 2020
RNA Folding
DNA • DNA is a string of four bases {A,C,G,T} • Two complementary strands of DNA stick together and form a double helix • A—T and C—G are complementary pairs
RNA Folding • RNA is a string of four bases {A,C,G,U} • A single RNA strand sticks to itself and folds into complex structures • A—U and C—G are complementary pairs
RNA Folding • RNA strand will try to minimize energy (form the most bonds) subject to constraints
RNA Folding • RNA is a string of bases $b_1, \dots, b_n \in \{A, C, G, U\}$ • The structure is given by a set of bonds $S$ consisting of pairs $(i, j)$ with $i < j$ • (Complements) Only $A$-$U$ or $C$-$G$ can be paired • (Matching) No base $b_i$ is in two pairs in $S$ • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$ • (Non-Crossing) If $(i, j), (k, \ell) \in S$ then it cannot be the case that $i < k < j < \ell$
RNA Folding • Input: RNA sequence $b_1, \dots, b_n \in \{A, C, G, U\}$ • Output: A set of pairs $S \subseteq \{1, \dots, n\} \times \{1, \dots, n\}$ • Goal: maximize the size of $S$ • (Complements) Only $A$-$U$ or $C$-$G$ can be paired • (Matching) No base $b_i$ is in two pairs in $S$ • (No Sharp Turns) If $(i, j) \in S$, then $i < j - 4$ • (Non-Crossing) If $(i, j), (k, \ell) \in S$ then it cannot be the case that $i < k < j < \ell$
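The four constraints can be checked mechanically. The following is a minimal sketch, not part of the lecture: the function name `is_valid_structure` and the representation of the structure as a list of 1-indexed `(i, j)` tuples are assumptions made for illustration.

```python
# Hypothetical helper: checks the four RNA folding rules for a candidate set of pairs.
COMPLEMENT = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def is_valid_structure(bases, pairs):
    """Return True if `pairs` (1-indexed (i, j) tuples) obeys all four rules."""
    n = len(bases)
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= n):
            return False
        # (Complements) only A-U or C-G may be paired.
        if (bases[i - 1], bases[j - 1]) not in COMPLEMENT:
            return False
        # (No Sharp Turns) require i < j - 4.
        if not i < j - 4:
            return False
        # (Matching) no base may appear in two pairs.
        if i in used or j in used:
            return False
        used.update((i, j))
    # (Non-Crossing) no two pairs may interleave as i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For example, on the sequence ACCGGUAGU the pairs (1, 9) and (2, 8) are nested and pass, while (1, 6) and (3, 8) cross and fail.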
Dynamic Programming • Let $O$ be the optimal set of pairs for $b_1 \cdots b_n$ • Case 1: $O$ does not include any pair involving $n$ • Case 2: $O$ has $n$ paired with some $t < n - 4$
Dynamic Programming • Let $O_{i,j}$ be the optimal set of pairs for $b_i \cdots b_j$ • Case 1: $O_{i,j}$ does not include any pair involving $j$ • Case 2: $O_{i,j}$ has $j$ paired with some $t < j - 4$
Dynamic Programming • Let $\mathrm{OPT}(i, j)$ be the optimal number of pairs for $b_i \cdots b_j$ • Case 1: $j$ pairs with nothing • Case 2: $j$ pairs with $t < j - 4$
Dynamic Programming • Let $\mathrm{OPT}(i, j)$ be the optimal number of pairs for $b_i \cdots b_j$ • Case 1: $j$ pairs with nothing • Case 2: $j$ pairs with $t < j - 4$ • Recurrence: $\mathrm{OPT}(i, j) = \max\{\mathrm{OPT}(i, j-1),\ \max_t \{1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\}\}$ • Maximum over all $t$ such that $i \le t < j - 4$ and $b_t, b_j$ are compatible bases • Base Cases: $\mathrm{OPT}(i, j) = 0$ if $i \ge j - 4$
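Filling the Table • Sequence: ACCGGUAGU • Recurrence: $\mathrm{OPT}(i, j) = \max\{\mathrm{OPT}(i, j-1),\ \max_{\text{allowable } t} \{1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\}\}$ • Table to fill (base-case entries shown, remaining cells blank):
          j = 6   7   8   9
  i = 4       0   0   0
  i = 3       0   0
  i = 2       0
  i = 1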
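The recurrence and table-filling order can be sketched in code. This is an illustrative implementation under assumptions, not the lecture's own: the name `rna_opt` is hypothetical, the table is padded with an extra row and column so that $\mathrm{OPT}(i, i-1)$ reads as 0, and intervals are filled in order of increasing $j - i$ so every entry on the right-hand side is already computed.

```python
# Illustrative sketch of the interval DP; names are hypothetical.
COMPATIBLE = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def rna_opt(bases):
    """Return OPT(1, n): the maximum number of pairs for the whole sequence."""
    n = len(bases)
    # opt[i][j] uses 1-indexed positions; padding makes empty intervals read as 0.
    opt = [[0] * (n + 2) for _ in range(n + 2)]
    for gap in range(5, n):                 # gap = j - i; smaller gaps are base cases
        for i in range(1, n - gap + 1):
            j = i + gap
            best = opt[i][j - 1]            # Case 1: j pairs with nothing
            for t in range(i, j - 4):       # Case 2: j pairs with t, i <= t < j - 4
                if (bases[t - 1], bases[j - 1]) in COMPATIBLE:
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt[1][n]


print(rna_opt("ACCGGUAGU"))  # 2
```

The three nested loops give the $O(n^3)$ time bound and the table gives the $O(n^2)$ space bound. Recovering the pairs themselves would need a traceback over the table, which is omitted here.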
RNA Folding Summary • Compute the optimal RNA folding in time $O(n^3)$ and space $O(n^2)$ • Dynamic Programming: • Decide on an optimal pair $b_t$-$b_n$ • Remaining RNA is two non-overlapping pieces • Adding variables: one subproblem for each interval • Non-crossing is critical • Think about how the dynamic programming algorithm changes if we remove each of the conditions
Dynamic Programming Practice
Midterm I Review
Midterm I Topics • Fundamentals: • Induction • Asymptotics • Recurrences • Stable Matching • Divide and Conquer • Dynamic Programming
Topics: Induction • Proof by Induction: • Mathematical formulas, e.g. $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ • Spot the bug • Solutions to recurrences • Correctness of divide-and-conquer algorithms • Good way to study: • Lehman-Leighton-Meyer, Mathematics for CS • Review divide-and-conquer in Kleinberg-Tardos
Practice Question: Induction • Suppose you have an unlimited supply of 3-cent and 7-cent coins. Prove by induction that you can make any amount $n \ge 12$.
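One possible proof uses strong induction with three base cases ($12 = 3 \cdot 4$, $13 = 3 \cdot 2 + 7$, $14 = 7 \cdot 2$) and the step "make $n - 3$, then add one 3-cent coin". The sketch below mirrors that argument as a recursive function; the name `coins` is hypothetical, and this is only one of several valid ways to structure the induction.

```python
def coins(n):
    """Return (threes, sevens) with 3 * threes + 7 * sevens == n, for n >= 12."""
    if n < 12:
        raise ValueError("the claim only covers n >= 12")
    # Base cases of the induction.
    base = {12: (4, 0), 13: (2, 1), 14: (0, 2)}
    if n in base:
        return base[n]
    # Inductive step: n - 3 >= 12, so it can be made; add one 3-cent coin.
    threes, sevens = coins(n - 3)
    return threes + 1, sevens
```

Three base cases are needed because the step reaches back by 3, so every $n \ge 15$ reduces to exactly one of 12, 13, or 14.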
Topics: Asymptotics • Asymptotic Notation • $o, O, \omega, \Omega, \Theta$ • Relationships between common function types • Good way to study: • Kleinberg-Tardos Chapter 2
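Topics: Asymptotics (Notation; … means …; Think…; E.g.)
• $f(n) = O(g(n))$; $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le f(n) \le c \cdot g(n)$; At most, “≤”; $100n^2 = O(n^3)$
• $f(n) = \Omega(g(n))$; $\exists c > 0, n_0 > 0, \forall n \ge n_0: 0 \le c \cdot g(n) \le f(n)$; At least, “≥”; $2^n = \Omega(n^{100})$
• $f(n) = \Theta(g(n))$; $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$; Equals, “=”; $\log(n!) = \Theta(n \log n)$
• $f(n) = o(g(n))$; $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le f(n) < c \cdot g(n)$; Less than, “<”; $n^2 = o(2^n)$
• $f(n) = \omega(g(n))$; $\forall c > 0, \exists n_0 > 0, \forall n \ge n_0: 0 \le c \cdot g(n) < f(n)$; Greater than, “>”; $n^2 = \omega(\log n)$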
Topics: Asymptotics • Constant factors can be ignored • $\forall C > 0: Cn = O(n)$ • Smaller exponents are Big-Oh of larger exponents • $\forall a > b: n^b = O(n^a)$ • Any logarithm is Big-Oh of any polynomial • $\forall a, \varepsilon > 0: \log_2^a n = O(n^\varepsilon)$ • Any polynomial is Big-Oh of any exponential • $\forall a > 0, b > 1: n^a = O(b^n)$ • Lower order terms can be dropped • $n^2 + n^{3/2} + n = O(n^2)$
Practice Question: Asymptotics • Put these functions in order so that $f_i = O(f_{i+1})$ • 𝑜 PQt u v • $8^{\log_2 n}$ • $2^{3 \log_2 \log_2 n}$ • 2 PQt u @ u @ • $\sum_{i=1}^{n} i$ • $n^2 \log_2 n$
Practice Question: Asymptotics • Suppose $f_1 = O(g)$ and $f_2 = O(g)$. Prove that $f_1 + f_2 = O(g)$.
Topics: Recurrences • Recurrences • Representing running time by a recurrence • Solving common recurrences • Master Theorem • Good way to study: • Erickson book • Kleinberg-Tardos divide-and-conquer chapter
Practice Question: Recurrences
  F(n):
    For i = 1,…,n^2:
      Print “Hi”
    For i = 1,…,3:
      F(n/3)
• Write a recurrence for the running time of this algorithm. Write the asymptotic running time given by the recurrence.
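One way to check an answer is to count the prints directly. The loop contributes $n^2$ prints and there are three recursive calls on inputs of size $n/3$, which suggests $T(n) = 3T(n/3) + n^2$; the root term dominates, so this solves to $\Theta(n^2)$. The sketch below is an assumption-laden illustration: the pseudocode gives no base case, so the hypothetical `count_prints` treats $F(n)$ as doing nothing for $n < 1$ and rounds $n/3$ down.

```python
def count_prints(n):
    """Number of "Hi" lines F(n) prints, assuming F does nothing when n < 1."""
    if n < 1:
        return 0
    # n^2 prints from the first loop, plus three recursive calls on n // 3.
    return n * n + 3 * count_prints(n // 3)


for k in range(6):
    n = 3 ** k
    print(n, count_prints(n), count_prints(n) / (n * n))
```

For powers of 3 the count equals $(3n^2 - n)/2$, so the printed ratio approaches 1.5, consistent with $\Theta(n^2)$.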
Topics: Recurrences • Consider the recurrence $T(n) = \sqrt{n} \cdot T(\sqrt{n}) + n$ with $T(1) = 1$. Solve using a recursion tree.
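A sketch of the recursion-tree argument, reading the recurrence as $T(n) = \sqrt{n}\,T(\sqrt{n}) + n$ (the square roots are an assumption recovered from the garbled slide) and treating subproblems of size at most 2 as constant work, since repeated square roots never reach exactly 1:

```latex
\begin{align*}
\text{Level } k:&\quad n^{1 - 1/2^k} \text{ subproblems, each of size } n^{1/2^k},
  \text{ each doing } n^{1/2^k} \text{ work} \\
\text{Work at level } k:&\quad n^{1 - 1/2^k} \cdot n^{1/2^k} = n \\
\text{Depth}:&\quad n^{1/2^k} \le 2 \iff 2^k \ge \log_2 n \iff k \ge \log_2 \log_2 n \\
T(n) &= \Theta(n \log \log n)
\end{align*}
```

The same answer follows by dividing through by $n$: with $S(n) = T(n)/n$, the recurrence becomes $S(n) = S(\sqrt{n}) + 1$, which unrolls $\log_2 \log_2 n$ times.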
Topics: Divide-and-Conquer • Divide-and-Conquer • Writing pseudocode • Proving correctness by induction • Analyzing running time via recurrences • Examples we’ve studied: • Mergesort, Binary Search, Karatsuba’s, Selection • Good way to study: • Example problems from Kleinberg-Tardos or Erickson • Practice, practice, practice!
Topics: Dynamic Programming • Dynamic Programming • Identify sub-problems • Write a recurrence, $\mathrm{OPT}(n) = \max\{v_n + \mathrm{OPT}(n - 6),\ \mathrm{OPT}(n - 1)\}$ • Fill the dynamic programming table • Find the optimal solution • Analyze running time • Good way to study: • Example problems from Kleinberg-Tardos or Erickson • Practice, practice, practice!
Practice Question • Design an $O(n)$-time algorithm that takes an array $A[1:n]$ and returns a sorted array containing the smallest $\sqrt{n}$ elements of $A$
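One possible approach, assuming the target is the smallest $\lfloor \sqrt{n} \rfloor$ elements: select the $\lfloor \sqrt{n} \rfloor$-th smallest element in linear time, keep everything below it, and sort only those $\sqrt{n}$ elements, which costs $O(\sqrt{n} \log n) = O(n)$. The sketch below swaps in randomized quickselect (expected linear time) for a deterministic linear-time selection algorithm to stay short; the function names are hypothetical.

```python
import math
import random


def quickselect(items, k):
    """Return the k-th smallest element (k is 1-indexed). Expected linear time."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        equal = sum(1 for x in items if x == pivot)
        if k <= len(lows):
            items = lows
        elif k <= len(lows) + equal:
            return pivot
        else:
            k -= len(lows) + equal
            items = [x for x in items if x > pivot]


def smallest_sqrt_sorted(a):
    """Return the floor(sqrt(n)) smallest elements of a, in sorted order."""
    k = math.isqrt(len(a))
    if k == 0:
        return []
    threshold = quickselect(a, k)
    smaller = [x for x in a if x < threshold]
    # Fewer than k elements are below the threshold; pad with copies of it
    # so duplicates of the k-th smallest value are handled correctly.
    result = smaller + [threshold] * (k - len(smaller))
    result.sort()  # sorts only k = floor(sqrt(n)) elements
    return result
```

For a worst-case $O(n)$ bound, the selection step would use a deterministic selection algorithm instead of a random pivot.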
Practice Question • Consider the following sorting algorithm
  A[1:n] is a global array
  SillySort(1,n):
    if (n <= 2):
      put A in order
    else:
      SillySort(1,2n/3)
      SillySort(n/3,n)
      SillySort(1,2n/3)
• Prove that it is correct • Analyze its running time
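To experiment with the algorithm, here is a runnable sketch. The pseudocode leaves rounding unspecified, so this version makes an assumption: it drops $\lfloor n/3 \rfloor$ elements from each end, which keeps the overlap between the two ranges at least as large as the part left out. The name `silly_sort` and the explicit `lo`/`hi` bounds are also choices made for illustration.

```python
def silly_sort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place using the three-call scheme from the slide."""
    if hi is None:
        hi = len(a) - 1
    n = hi - lo + 1
    if n <= 1:
        return
    if n == 2:
        if a[lo] > a[hi]:
            a[lo], a[hi] = a[hi], a[lo]
        return
    third = n // 3
    silly_sort(a, lo, hi - third)   # first two-thirds
    silly_sort(a, lo + third, hi)   # last two-thirds
    silly_sort(a, lo, hi - third)   # first two-thirds again


data = [5, 2, 9, 1, 7, 3]
silly_sort(data)
print(data)  # [1, 2, 3, 5, 7, 9]
```

The three recursive calls on inputs of size $2n/3$ suggest the recurrence $T(n) = 3T(2n/3) + O(1)$, which solves to $\Theta(n^{\log_{3/2} 3}) \approx n^{2.71}$.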
Dynamic Programming Practice
Chocolate Bar Splitting • Input: A chocolate bar with $n \times m$ pieces • Output: The minimum number of cuts needed to divide the block into perfect squares
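A possible dynamic program, under the assumption that each cut is a straight line splitting one rectangular piece into two smaller rectangles: a square needs no cuts, and otherwise the first cut is tried at every horizontal and vertical position. The names `min_cuts` and `cuts` are hypothetical.

```python
from functools import lru_cache


def min_cuts(n, m):
    """Minimum number of cuts to split an n x m bar into squares."""

    @lru_cache(maxsize=None)
    def cuts(a, b):
        if a == b:
            return 0                      # already a perfect square
        best = a * b                      # safe upper bound: a*b - 1 cuts always suffice
        for k in range(1, a // 2 + 1):    # cut across the first dimension
            best = min(best, 1 + cuts(k, b) + cuts(a - k, b))
        for k in range(1, b // 2 + 1):    # cut across the second dimension
            best = min(best, 1 + cuts(a, k) + cuts(a, b - k))
        return best

    return cuts(n, m)


print(min_cuts(3, 5))  # 3
```

There is one subproblem per pair of dimensions and each tries $O(n + m)$ cut positions, so this sketch runs in $O(nm(n + m))$ time.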
Vankin’s Mile • Input: An $n \times n$ board of numbers • Rules: • Place a chip on the board • Keep moving the chip down or right until you fall off • Score = sum of the numbers your chip visited • Output: The best possible strategy
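One way to set up the dynamic program: let `best[i][j]` be the highest score obtainable when the chip is currently on square $(i, j)$. The chip must move down or right, and falling off the board ends the game with no further score, so an extra row and column of zeros can stand in for "off the board". The sketch below returns only the best score; the function name `vankin_best` and the 0-indexed list-of-lists board are assumptions, and a non-empty board is assumed.

```python
def vankin_best(board):
    """Best achievable score on a non-empty n x n board (list of lists of numbers)."""
    n = len(board)
    # Row n and column n represent "fell off the board" and contribute 0.
    best = [[0] * (n + 1) for _ in range(n + 1)]
    # Fill from the bottom-right corner so both successors are already known.
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            best[i][j] = board[i][j] + max(best[i + 1][j], best[i][j + 1])
    # The chip may start anywhere, so take the best starting square.
    return max(best[i][j] for i in range(n) for j in range(n))


print(vankin_best([[1, -2], [-3, 4]]))  # 4
```

The table has $n^2$ entries with constant work each, so the sketch runs in $O(n^2)$ time. Recording which successor achieved each maximum would recover the actual sequence of moves.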