CS4102 Algorithms Fall 2018
Warm up
Decode the line below into English (hint: use Google or Wolfram Alpha)
·· ·-·· ·· -·- · ·- ·-·· --· --- ·-· ·· - ···· -- ···
Interval Scheduling Run Time
Find event ending earliest, add to solution, remove it and all conflicting events. Repeat until all events removed, return solution.
Equivalent way:
    StartTime = 0
    For each interval (in order of finish time):      O(n) iterations
        if start of interval < StartTime:             O(1)
            do nothing
        else:
            add interval to solution                  O(1)
            StartTime = end of interval
The loop is O(n) once the intervals are in order of finish time; sorting them first takes O(n log n).
Interval Scheduling Algorithm
1. Find event ending earliest, add to solution
2. Remove it and all conflicting events
3. Repeat until all events removed, return solution
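The earliest-finish-time rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `schedule`, the `(start, end)` tuple representation, and the convention that an interval may start exactly when the previous one ends are all assumptions.

```python
def schedule(intervals):
    """Greedy interval scheduling: pick non-overlapping intervals by earliest finish time.

    Assumption: intervals are (start, end) pairs, and an interval that starts
    exactly when the previous one ends does not conflict with it.
    """
    chosen = []
    last_end = float("-inf")  # end time of the most recently chosen interval
    for start, end in sorted(intervals, key=lambda iv: iv[1]):  # O(n log n) sort
        if start >= last_end:  # no conflict with the last chosen interval
            chosen.append((start, end))
            last_end = end
        # otherwise the interval conflicts and is skipped (O(1) work either way)
    return chosen


# Hypothetical example data
events = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
selected = schedule(events)
print(selected)
```

Tracking only the end of the last chosen interval replaces the explicit "remove all conflicting events" step: once intervals are in finish-time order, a conflict can only be with the most recent choice.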
Today’s Keywords
• Greedy Algorithms
• Choice Function
• Prefix-free code
• Compression
• Huffman Code
CLRS Readings
• Chapter 16
Homeworks
• HW6 Due Friday Nov 9 @11pm
  – Written (use LaTeX)
  – DP and Greedy
Sam Morse
• Engineer and artist
Message Encoding
• Problem: need to electronically send a message to two people at a distance.
• Channel for message is binary (either on or off)
How can we do it?
Message:
    wiggle, wiggle, wiggle like a gypsy queen
    wiggle, wiggle, wiggle all dressed in green
• Take the message, send it over character-by-character with an encoding

Character   Frequency   Encoding
a           2           0000
d           2           0001
e           13          0010
g           14          0011
i           8           0100
k           1           0101
l           9           0110
n           3           0111
p           1           1000
q           1           1001
r           2           1010
s           3           1011
u           1           1100
w           6           1101
y           2           1110
How efficient is this?
Message:
    wiggle wiggle wiggle like a gypsy queen
    wiggle wiggle wiggle all dressed in green

Character   Frequency $f_c$   Encoding (table $T$)
a           2                 0001
d           2                 0010
e           13                0011
g           14                0100
i           8                 0101
k           1                 0110
l           9                 0111
n           3                 1000
p           1                 1001
q           1                 1010
r           2                 1011
s           3                 1100
u           1                 1101
w           6                 1110
y           2                 1111

• Each character requires 4 bits: $\ell_c = 4$
• Cost of encoding: $B(T) = \sum_c f_c \ell_c = 68 \cdot 4 = 272$ bits
• Better Solution: Allow for different characters to have different-size encodings (high frequency → short code)
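The frequency table and the fixed-length cost can be recomputed directly from the lyric. The sketch below assumes that only letters are encoded (spaces and commas are ignored, which is what the table's 15 characters suggest); the helper name `fixed_length_cost` is made up for illustration.

```python
from collections import Counter

lyric = ("wiggle wiggle wiggle like a gypsy queen "
         "wiggle wiggle wiggle all dressed in green")

# Count letters only; assumption: whitespace and punctuation are not transmitted.
freq = Counter(ch for ch in lyric if ch.isalpha())


def fixed_length_cost(freq, bits_per_char):
    """Cost B(T) = sum over characters of f_c * l_c, with every l_c equal."""
    return sum(count * bits_per_char for count in freq.values())


total_chars = sum(freq.values())
cost = fixed_length_cost(freq, 4)
print(sorted(freq.items()))
print(total_chars, cost)
```

With 15 distinct characters, 4 bits is the smallest fixed width that works, since 3 bits give only 8 codewords.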
More efficient coding
• Cost of encoding: $B(T) = \sum_c f_c \ell_c$, where $f_c$ is the character frequency and $\ell_c$ is the codeword size
• When $f_c$ is big, make $\ell_c$ small
Morse Code
[Figure: Morse code table, relating character frequency to codeword size]
Problem with Morse Code
• Decode: without gaps between letters, the same signal can be read as A A, ET ET, R T, or EN T (all four spell ·-·-)
• Ambiguous Decoding
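The ambiguity can be made concrete by enumerating every way to split a gap-free signal into codewords. This is an illustrative sketch only: it uses ASCII `.` and `-` instead of the slide's dot and dash glyphs, restricts the table to five real Morse letters, and the function name `decodings` is an assumption.

```python
# Subset of International Morse code, written with ASCII '.' and '-'.
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "R": ".-."}


def decodings(signal, table=MORSE):
    """Return every string whose codewords concatenate to exactly `signal`."""
    if not signal:
        return [""]  # one way to decode nothing: the empty message
    results = []
    for letter, code in table.items():
        if signal.startswith(code):
            # Commit to this letter, then decode the remainder recursively.
            for rest in decodings(signal[len(code):], table):
                results.append(letter + rest)
    return results


readings = decodings(".-.-")
print(readings)
```

Because "." (E) is a prefix of ".-" (A), the decoder cannot tell where one letter ends without a separator; real Morse code relies on pauses between letters for that.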
Prefix-Free Code
• A prefix-free code is a codeword table $T$ such that for any two characters $c_1, c_2$, if $c_1 \neq c_2$ then $code(c_1)$ is not a prefix of $code(c_2)$

Character   Codeword
g           0
e           10
l           110
i           1110
w           11110
…           …

• Example: 1111011100011010 decodes as 11110 | 1110 | 0 | 0 | 110 | 10 = w i g g l e
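A short sketch of both halves of the definition: checking that a table is prefix-free, and using that property to decode a bit string left to right with no separators. The names `is_prefix_free` and `decode` are illustrative, and codewords are represented as strings of "0" and "1" for simplicity.

```python
from itertools import permutations

CODE = {"g": "0", "e": "10", "l": "110", "i": "1110", "w": "11110"}


def is_prefix_free(code):
    """True if no codeword is a prefix of a different character's codeword."""
    return not any(a.startswith(b) for a, b in permutations(code.values(), 2))


def decode(bits, code):
    """Decode greedily: emit a character as soon as the buffer matches a codeword.

    Assumption: `code` is prefix-free, so the first match is the only possible one.
    """
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


message = decode("1111011100011010", CODE)
print(is_prefix_free(CODE), message)
```

The greedy decoder is only correct because of the prefix-free property; with a table such as Morse code it would emit the shortest matching letter and never reach the longer ones.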
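Binary Trees = Prefix-free Codes
• I can represent any prefix-free code as a binary tree
• I can create a prefix-free code from any binary tree

Character   Code 1   Code 2
g           0        00
e           10       01
l           110      10
i           1110     110
w           11110    111
…           …        …

[Figure: the two binary trees for these codes; edges are labeled 0 and 1, the leaves are g, e, l, i, w, and each codeword is the sequence of edge labels on the path from the root to that leaf]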
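The correspondence runs in both directions, which a small sketch can show. Here a tree is represented as nested dicts keyed by "0" and "1" with characters at the leaves; that representation and the names `code_to_tree` and `tree_to_code` are assumptions chosen for brevity.

```python
def code_to_tree(code):
    """Build a binary tree from a code table.

    Assumptions: the table is prefix-free and every codeword is non-empty.
    Internal nodes are dicts mapping '0'/'1' to children; leaves are characters.
    """
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})  # walk down, creating internal nodes
        node[word[-1]] = ch  # the last bit leads to the leaf
    return root


def tree_to_code(node, prefix=""):
    """Read a code table back off a tree: a codeword is the root-to-leaf path."""
    if isinstance(node, str):
        return {node: prefix}
    code = {}
    for bit, child in node.items():
        code.update(tree_to_code(child, prefix + bit))
    return code


code1 = {"g": "0", "e": "10", "l": "110", "i": "1110", "w": "11110"}
code2 = {"g": "00", "e": "01", "l": "10", "i": "110", "w": "111"}
tree2 = code_to_tree(code2)
print(tree2)
```

Characters sit only at leaves, so no root-to-leaf path passes through another character; that is exactly the prefix-free condition.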
Goal: Shortest Prefix-Free Encoding
• Input: A set of character frequencies $\{f_c\}$
• Output: A prefix-free code $T$ which minimizes the cost $B(T) = \sum_c f_c \ell_c$
Huffman Coding!!
Greedy Algorithms
• Require Optimal Substructure
  – Solution to larger problem contains the solution to a smaller one
  – Only one subproblem to consider!
• Idea:
  1. Identify a greedy choice property
     • How to make a choice guaranteed to be included in some optimal solution
  2. Repeatedly apply the choice property until no subproblems remain
Huffman Algorithm
• Choose the least frequent pair, combine into a subtree
• Start: G:14 E:13 L:9 I:8 W:6 N:3 S:3 A:2 D:2 R:2 Y:2 K:1 P:1 Q:1 U:1
  1. Combine Q:1 and U:1 into a subtree of weight 2 (edges labeled 0 and 1). Subproblem of size $n - 1$!
  2. Combine K:1 and P:1 into a subtree of weight 2
  3. Combine the two weight-2 subtrees into a subtree of weight 4
  4. Combine R:2 and Y:2 into a subtree of weight 4
  5. Combine A:2 and D:2 into a subtree of weight 4
  6. Combine N:3 and S:3 into a subtree of weight 6
• After step 6: G:14 E:13 L:9 I:8 W:6 6 4 4 4
Huffman Algorithm
• Choose the least frequent pair, combine into a subtree
• Repeat until a single tree remains
[Figure: final Huffman tree with root weight 68 (the total character count); edges are labeled 0 and 1; leaves G:14, E:13, L:9, I:8, W:6, N:3, S:3, Y:2, D:2, R:2, A:2, U:1, P:1, Q:1, K:1]
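The walk-through above can be sketched with a priority queue. The slides do not say how "the least frequent pair" is found; using Python's `heapq` is a choice made here, as are the function names and the tuple-based tree representation. Ties are broken by insertion order, so the exact codewords may differ from the slides while the total cost stays the same.

```python
import heapq
from itertools import count


def huffman(freq):
    """Return a prefix-free code {char: bitstring} built by Huffman's greedy rule.

    Assumptions: `freq` is non-empty and its keys are strings (not tuples).
    """
    tiebreak = count()  # unique counter so heap entries never compare subtrees
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # least frequent subtree
        f2, _, t2 = heapq.heappop(heap)  # second least frequent subtree
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):  # internal node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:  # leaf; a one-character alphabet still needs one bit
            codes[tree] = prefix or "0"

    walk(heap[0][2], "")
    return codes


def cost(freq, codes):
    """B(T) = sum over characters of f_c * l_c."""
    return sum(freq[ch] * len(codes[ch]) for ch in freq)


FREQ = {"g": 14, "e": 13, "l": 9, "i": 8, "w": 6, "n": 3, "s": 3, "a": 2,
        "d": 2, "r": 2, "y": 2, "k": 1, "p": 1, "q": 1, "u": 1}
codes = huffman(FREQ)
total = cost(FREQ, codes)
print(codes)
print(total)
```

For these frequencies the Huffman code costs 230 bits, compared with 272 bits for a fixed 4-bit code. Each of the $n - 1$ merges does a constant number of heap operations, so the sketch runs in O(n log n).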
Exchange argument
• Shows correctness of a greedy algorithm
• Idea:
  – Show exchanging an item from an arbitrary optimal solution with your greedy choice makes the new solution no worse
  – How to show my sandwich is at least as good as yours:
    • Show: “I can remove any item from your sandwich, and it would be no worse by replacing it with the same item from my sandwich”
Showing Huffman is Optimal
• Overview:
  – Show that there is an optimal tree in which the least frequent characters are siblings
    • Exchange argument
  – Show that making them siblings and solving the new smaller sub-problem results in an optimal solution
    • Proof by contradiction
Showing Huffman is Optimal
• First Step: Show any optimal tree is “full” (each node has either 0 or 2 children)
[Figure: trees $T$ and $T'$ over leaves W, R, Y; the red subtree containing R and Y sits one level higher in $T'$]
• $T'$ is a “better” tree than $T$, because all codes in the red subtree are shorter in $T'$, without creating any longer codes
Huffman Exchange Argument
• Claim: if $c_1, c_2$ are the least-frequent characters, then there is an optimal prefix-free code s.t. $c_1, c_2$ are siblings
  – i.e. codes for $c_1, c_2$ are the same length and differ only by their last bit
Case 1: Consider some optimal tree $T_{opt}$. If $c_1, c_2$ are siblings in this tree, then the claim holds.
[Figure: $T_{opt}$ with $c_1$ and $c_2$ as sibling leaves]
Huffman Exchange Argument
• Claim: if $c_1, c_2$ are the least-frequent characters, then there is an optimal prefix-free code s.t. $c_1, c_2$ are siblings
  – i.e. codes for $c_1, c_2$ are the same length and differ only by their last bit
Case 2: Consider some optimal tree $T_{opt}$, in which $c_1, c_2$ are not siblings
• Let $a, b$ be the two sibling characters lowest in the tree, i.e. at the greatest depth (Why must they exist?)
• Idea: show that swapping $c_1$ with $a$ does not increase the cost of the tree. Similar for $c_2$ and $b$
• Assume: $f_{c_1} \leq f_a$ and $f_{c_2} \leq f_b$
[Figure: $T_{opt}$ with $c_1$ and $c_2$ higher in the tree and $a, b$ as the deepest sibling leaves]
Case 2: $c_1, c_2$ are not siblings in $T_{opt}$
• Claim: the least-frequent characters ($c_1, c_2$) are siblings in some optimal tree
• $a, b$ = lowest-depth siblings (the deepest sibling leaves in $T_{opt}$)
• Idea: show that swapping $c_1$ with $a$ does not increase the cost of the tree. Assume: $f_{c_1} \leq f_a$
• Let $T'$ be $T_{opt}$ with $c_1$ and $a$ swapped, and let $C$ be the cost contributed by all other characters:
    $B(T_{opt}) = C + f_{c_1} \ell_{c_1} + f_a \ell_a$
    $B(T') = C + f_{c_1} \ell_a + f_a \ell_{c_1}$
• Then:
    $B(T_{opt}) - B(T') = C + f_{c_1} \ell_{c_1} + f_a \ell_a - (C + f_{c_1} \ell_a + f_a \ell_{c_1})$
    $= f_{c_1} \ell_{c_1} + f_a \ell_a - f_{c_1} \ell_a - f_a \ell_{c_1}$
    $= f_{c_1} (\ell_{c_1} - \ell_a) + f_a (\ell_a - \ell_{c_1})$
    $= (f_a - f_{c_1})(\ell_a - \ell_{c_1})$
• Both factors are $\geq 0$: $f_a - f_{c_1} \geq 0$ because $c_1$ is least frequent, and $\ell_a - \ell_{c_1} \geq 0$ because $a$ is at the greatest depth
• So $B(T_{opt}) - B(T') \geq 0$, i.e. $B(T') \leq B(T_{opt})$: $T'$ is also optimal!
[Figure: $T_{opt}$ and $T'$ side by side; in $T'$ the leaves $c_1$ and $a$ have exchanged positions]
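The swap identity can be checked numerically on a small tree. The frequencies and depths below are made up for illustration (the depths describe a valid full tree that is deliberately not optimal), and the helper names `tree_cost` and `swap_depths` are assumptions.

```python
def tree_cost(freq, depth):
    """B(T) = sum over characters of f_c * l_c, where l_c is the leaf depth."""
    return sum(freq[ch] * depth[ch] for ch in freq)


def swap_depths(depth, x, y):
    """Return the depths after exchanging the positions of leaves x and y."""
    swapped = dict(depth)
    swapped[x], swapped[y] = depth[y], depth[x]
    return swapped


# Hypothetical tree: c1, c2 are least frequent but not siblings;
# a, b are the deepest sibling leaves.
freq = {"c1": 1, "c2": 1, "a": 5, "b": 7, "z": 9}
depth = {"c1": 2, "c2": 3, "a": 4, "b": 4, "z": 1}

before = tree_cost(freq, depth)
after_first = tree_cost(freq, swap_depths(depth, "c1", "a"))
after_both = tree_cost(freq, swap_depths(swap_depths(depth, "c1", "a"), "c2", "b"))

# Predicted saving from the derivation: (f_a - f_c1) * (l_a - l_c1)
predicted = (freq["a"] - freq["c1"]) * (depth["a"] - depth["c1"])
print(before, after_first, after_both, predicted)
```

Here the saving is strictly positive, which shows the starting tree was not optimal. In an optimal tree the same product must equal 0, so the swap leaves the cost unchanged and the swapped tree is optimal as well.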