CPSC 320: Intermediate Algorithm Design and Analysis
July 28, 2014
Course Outline
• Introduction and basic concepts
• Asymptotic notation
• Greedy algorithms
• Graph theory
• Amortized analysis
• Recursion
• Divide-and-conquer algorithms
• Randomized algorithms
• Dynamic programming algorithms
• NP-completeness
Dynamic Programming
Dynamic Programming Components
• Analyse the structure of an optimal solution
  • Separate one choice (usually the last) from a subproblem
  • Phrase the value of a choice as a function of the choice and the subproblem
  • Phrase an optimal solution as the value of the best choice
    • Usually a max/min result
• Implement the calculation of the optimal value
  • Memoization: save optimal values as we compute them
    • Bottom-up: evaluate smaller problems and use them for bigger problems
    • Top-down: evaluate the big problem by calling smaller problems recursively and saving the results
• Keep a record of the choice made at each level
• Rebuild the optimal solution from the optimal value result
Knapsack Problem
Algorithm Knapsack(x, q, N)  -- x is the array of weights, q is the array of values, N is the weight limit
  s[0, n] ← 0, m[0, n] ← false for n = 0, 1, 2, …, N
  For j ← 1 To |x| Do
    For n ← 1 To N Do
      If x[j] > n Or s[j − 1, n] > s[j − 1, n − x[j]] + q[j] Then
        s[j, n] ← s[j − 1, n], m[j, n] ← m[j − 1, n]
      Else
        s[j, n] ← s[j − 1, n − x[j]] + q[j], m[j, n] ← j
  t ← ∅, y ← |x|
  While N > 0 And m[y, N] is not false Do
    y ← m[y, N], t ← t ∪ {y}, N ← N − x[y], y ← y − 1
  Return t
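The pseudocode above can be sketched in Python as follows. This is an illustrative sketch, not course-provided code: `s[j][n]` is the best value using the first j items within capacity n, and `m[j][n]` records the last item taken for that entry (`None` plays the role of false).

```python
def knapsack(weights, values, limit):
    num = len(weights)
    # s[j][n]: best total value using the first j items within capacity n
    s = [[0] * (limit + 1) for _ in range(num + 1)]
    # m[j][n]: 1-based index of the last item taken, or None (i.e., false)
    m = [[None] * (limit + 1) for _ in range(num + 1)]
    for j in range(1, num + 1):
        w, v = weights[j - 1], values[j - 1]
        for n in range(1, limit + 1):
            if w > n or s[j - 1][n] > s[j - 1][n - w] + v:
                s[j][n] = s[j - 1][n]      # skip item j
                m[j][n] = m[j - 1][n]
            else:
                s[j][n] = s[j - 1][n - w] + v  # take item j
                m[j][n] = j
    # Rebuild the chosen items from the recorded choices
    taken, j, n = [], num, limit
    while n > 0 and m[j][n] is not None:
        j = m[j][n]
        taken.append(j)          # 1-based item index
        n -= weights[j - 1]
        j -= 1
    return s[num][limit], sorted(taken)
```

For example, `knapsack([2, 3, 4], [3, 4, 5], 5)` picks items 1 and 2 (total weight 5) for a value of 7.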
Knapsack Algorithm - Complexity
• What is the time complexity of the knapsack algorithm?
  • O(nN) (number of items times the weight limit)
• This algorithm is called pseudo-polynomial
  • Its time complexity is based on the value of the input, not just the size
• There is no known polynomial algorithm to solve the knapsack problem
Algorithm Strategies - Review
• Dynamic programming algorithms:
  • The choice is made based on an evaluation of all possible results
  • Time and space complexity are usually higher
• Greedy algorithms:
  • The choice is made based on a locally optimal solution
  • Usually faster, but may not result in a globally optimal solution
• Divide-and-conquer algorithms:
  • The division of the input is chosen based on the assumption that merging the results of the subproblems is optimal
Global Sequence Alignment Problem
• Problem: given two sequences, analyse how similar they are
  • Allow both gaps and mismatches
• Applications:
  • Finding suggestions for misspelled words (comparing strings)
  • Comparing files (diff)
  • Analysing whether two pieces of DNA match
• Example: “ocurrance” vs. “occurrence”
  • A letter “c” is missing (gap)
  • An “a” was used instead of an “e” (mismatch)
  • Mismatches may be seen as gaps on both sides
    • “oc-urra-nce” vs. “occurr-ence”
Formal Definition
• We represent a gap with a hyphen “−”
• A sequence alignment of (Y, Z) is a pair (Y′, Z′) of sequences, such that:
  • Y′ minus the gaps is Y, and Z′ minus the gaps is Z
  • |Y′| = |Z′| (the size is the same for both sides)
  • If Y′ⱼ = “−”, then Z′ⱼ ≠ “−” (you can’t have gaps on both sides at the same position)
• A parameter ε > 0 defines the gap penalty (the penalty when one side has a gap)
• A parameter β_qr defines the mismatch penalty of matching q and r (β_qq = 0)
• The cost of an alignment (Y′, Z′) is Σⱼ₌₁..|Y′| cost(y′ⱼ, z′ⱼ), where cost is ε for a gap position and β otherwise
Finding the Best Alignment
• What is the choice to be made?
  • The last character could be a gap on either side, or a potential mismatch
• Assume G(j, k) is the penalty for the best alignment of y₁..yⱼ and z₁..z_k

G(j, k) =
  k · ε                                                          if j = 0
  j · ε                                                          if k = 0
  min{ G(j−1, k−1) + β_{yⱼ z_k},  G(j−1, k) + ε,  G(j, k−1) + ε }  otherwise
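A direct top-down (memoized) transcription of the recurrence, as a sketch. The default gap penalty and a uniform mismatch penalty are illustrative assumptions; the slides allow a full β table per character pair.

```python
from functools import lru_cache

def alignment_cost(Y, Z, gap=2, mismatch=1):
    # G(j, k): cost of the best alignment of Y[:j] and Z[:k]
    @lru_cache(maxsize=None)
    def G(j, k):
        if j == 0:
            return k * gap               # rest of Z aligned against gaps
        if k == 0:
            return j * gap               # rest of Y aligned against gaps
        beta = 0 if Y[j - 1] == Z[k - 1] else mismatch
        return min(G(j - 1, k - 1) + beta,   # match / mismatch
                   G(j - 1, k) + gap,        # y_j against a gap
                   G(j, k - 1) + gap)        # z_k against a gap
    return G(len(Y), len(Z))
```

For instance, `alignment_cost("ab", "ac")` is 1 (one mismatch) and `alignment_cost("ab", "b")` is 2 (one gap).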
Algorithm (Smith-Waterman)
Algorithm SmithWaterman(Y, Z, ε, β)
  For j ← 0 To |Y| Do G[j, 0] ← j · ε
  For k ← 1 To |Z| Do G[0, k] ← k · ε
  For j ← 1 To |Y| Do
    For k ← 1 To |Z| Do
      n ← G[j − 1, k − 1] + β[Y[j], Z[k]]         -- matching cost
      y ← G[j, k − 1] + ε, z ← G[j − 1, k] + ε    -- gap penalty in Y, in Z
      If n ≤ y And n ≤ z Then G[j, k] ← n, I[j, k] ← “match”
      Else If y ≤ z Then G[j, k] ← y, I[j, k] ← “gap in Y”
      Else G[j, k] ← z, I[j, k] ← “gap in Z”
Algorithm (cont.)
  …
  Y′ ← “”, Z′ ← “”
  j ← |Y|, k ← |Z|
  While j > 0 Or k > 0 Do
    If j > 0 And k > 0 And I[j, k] = “match” Then
      Y′ ← Y[j] · Y′, Z′ ← Z[k] · Z′
      j ← j − 1, k ← k − 1
    Else If k > 0 And (j = 0 Or I[j, k] = “gap in Y”) Then
      Y′ ← “−” · Y′, Z′ ← Z[k] · Z′
      k ← k − 1
    Else
      Y′ ← Y[j] · Y′, Z′ ← “−” · Z′
      j ← j − 1
  Return (Y′, Z′, G[|Y|, |Z|])
Longest Common Subsequence
• Subsequence: any sequence of items that is contained in the original sequence in the same order (but not necessarily consecutively)
  • Example: ⟨C, D, E, C⟩ is a subsequence of ⟨B, C, D, C, E, B, C⟩
• Problem: given two sequences Y and Z, find the longest common subsequence of Y and Z
• Applications:
  • Finding common DNA sequences in different organisms
  • Video compression (inter-frame comparison)
Characterizing the LCS
• Define Yⱼ as the sequence Y limited to its first j elements
• Given two sequences Y = y₁, .., yₙ and Z = z₁, .., z_o, let W = w₁, .., w_l be the longest common subsequence (LCS) of Y and Z
  • If yₙ = z_o, then w_l = yₙ = z_o, and W_{l−1} is an LCS of Y_{n−1} and Z_{o−1}
  • If yₙ ≠ z_o, then W is either an LCS of Yₙ and Z_{o−1}, or an LCS of Y_{n−1} and Z_o
• Define c(j, k), the length of the LCS of Yⱼ and Z_k, as:

c(j, k) =
  0                              if j = 0 ∨ k = 0
  c(j−1, k−1) + 1                if j, k > 0 ∧ yⱼ = z_k
  max{ c(j, k−1), c(j−1, k) }    otherwise
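The recurrence translates directly into a bottom-up table. A minimal sketch (0-based Python strings, 1-based table indices as on the slide):

```python
def lcs_length(Y, Z):
    # c[j][k]: length of the LCS of the first j elements of Y
    # and the first k elements of Z
    c = [[0] * (len(Z) + 1) for _ in range(len(Y) + 1)]
    for j in range(1, len(Y) + 1):
        for k in range(1, len(Z) + 1):
            if Y[j - 1] == Z[k - 1]:
                c[j][k] = c[j - 1][k - 1] + 1
            else:
                c[j][k] = max(c[j][k - 1], c[j - 1][k])
    return c[len(Y)][len(Z)]
```

Using the slide's example, `lcs_length("BCDCEBC", "CDEC")` is 4, since ⟨C, D, E, C⟩ occurs in both.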
Algorithm
Algorithm LongestCommonSubsequence(Y, Z)
  For j ← 0 To |Y| Do c[j, 0] ← 0
  For k ← 1 To |Z| Do c[0, k] ← 0
  For j ← 1 To |Y| Do
    For k ← 1 To |Z| Do
      If Y[j] = Z[k] Then
        c[j, k] ← c[j − 1, k − 1] + 1, h[j, k] ← “+”
      Else If c[j − 1, k] > c[j, k − 1] Then
        c[j, k] ← c[j − 1, k], h[j, k] ← “Y”
      Else
        c[j, k] ← c[j, k − 1], h[j, k] ← “Z”
  PrintLCS(h, Y, |Y|, |Z|)
  Return c[|Y|, |Z|]
Algorithm (cont.)
Algorithm PrintLCS(h, Y, j, k)
  If j = 0 Or k = 0 Then Return
  If h[j, k] = “+” Then
    PrintLCS(h, Y, j − 1, k − 1)
    Print Y[j]
  Else If h[j, k] = “Y” Then
    PrintLCS(h, Y, j − 1, k)
  Else
    PrintLCS(h, Y, j, k − 1)
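As a usage sketch, the LCS itself can also be recovered by walking the c table directly, without storing the direction markers. This is a standard space-saving variant of the traceback, not the slide's h-table method:

```python
def lcs(Y, Z):
    # Build the length table from the recurrence
    c = [[0] * (len(Z) + 1) for _ in range(len(Y) + 1)]
    for j in range(1, len(Y) + 1):
        for k in range(1, len(Z) + 1):
            if Y[j - 1] == Z[k - 1]:
                c[j][k] = c[j - 1][k - 1] + 1
            else:
                c[j][k] = max(c[j - 1][k], c[j][k - 1])
    # Walk back from c[|Y|, |Z|], mirroring PrintLCS's three cases
    out, j, k = [], len(Y), len(Z)
    while j > 0 and k > 0:
        if Y[j - 1] == Z[k - 1]:
            out.append(Y[j - 1])
            j -= 1
            k -= 1
        elif c[j - 1][k] >= c[j][k - 1]:
            j -= 1
        else:
            k -= 1
    return "".join(reversed(out))
```

Here `lcs("BCDCEBC", "CDEC")` returns "CDEC".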
NP Complexity
Time Complexity for Decision Problems
• From this point on we analyse the time complexity of problems, not algorithms
  • We want to know the best possible complexity for the problem
• Our focus now is on decision problems, not optimization problems
  • Decision problems: yes/no answer
  • Optimization problems: “find the best”, “find the maximum”, “find the minimum”
• We also need to distinguish between “finding” and “checking” a solution
Time Complexity - Classes
• A problem is solvable in polynomial time if there is an algorithm that solves it and runs in O(nᵏ), where k ∈ Θ(1) and n is the size of the input representation
  • Examples: sorting (O(n log n) ⊂ O(n²)), selection (O(n)), longest common subsequence (O(n²)), matrix multiplication (O(n³) or better)
• P: the set of all decision problems that are solvable in polynomial time
• NP (non-deterministic P): the set of all decision problems for which a given certificate can be checked in polynomial time
Example: Hamiltonian Path
• Problem: given a graph, is there a path that goes through every node exactly once?
  • Decision problem: the answer is yes or no
  • Optimization (finding a path with minimum cost, etc.) is not required
• Is this problem in NP?
  • Given a path, can we verify that the path is correct in polynomial time?
• Is this problem in P?
  • Can we solve it in polynomial time?
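For the NP question, a sketch of the certificate check: given a claimed Hamiltonian path, verifying it takes polynomial time even though no polynomial algorithm is known for finding one. The adjacency-set graph representation is an illustrative assumption.

```python
def is_hamiltonian_path(graph, path):
    # graph: {node: set of neighbours}; path: proposed node order (the certificate)
    # Check 1: the path visits every node exactly once
    if sorted(path) != sorted(graph):
        return False
    # Check 2: every pair of consecutive nodes is joined by an edge
    return all(path[i + 1] in graph[path[i]] for i in range(len(path) - 1))
```

Both checks are polynomial in the size of the graph, which is exactly what membership in NP requires.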