
Objectives: Review Huffman Codes, Introducing Divide and Conquer



  1. Objectives (March 6, 2019, CSCI211 - Sprenkle)
     • Review Huffman Codes
     • Introducing Divide and Conquer Algorithms

     Towards Huffman Codes
     • What problem are we trying to solve?
     • Binary tree rules:
       - Each leaf node is a letter
       - Follow the path to the letter
         • Going left: 0
         • Going right: 1
     Given the mapping, how do you build the binary tree for this mapping?

  2. Recursively Generate Tree
     • All letters are in the root node
     • For all letters in a node:
       - If the encoding begins with 0, the letter belongs in the left subtree
       - Otherwise (the encoding begins with 1), the letter belongs in the right subtree
       - If this is the last bit of the encoding, make the letter a leaf node of that subtree
       - Shift the encoding one bit
       - Process the left and right children

     Tree Properties
     • What is the length of a letter’s encoding?
     • Define our optimal goal in tree terms
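The recursive procedure on the "Recursively Generate Tree" slide can be sketched in Python roughly as follows. This is only an illustration under assumptions: the names `build_tree` and `decode` are hypothetical, letters are assumed to be strings, internal nodes are represented as `(left, right)` tuples, and the input is assumed to be a valid prefix code. The example mapping is the one the lecture arrives at later.

```python
def build_tree(codes):
    """Build a prefix-code tree from a {letter: bitstring} mapping.

    Leaves are letters (strings); internal nodes are (left, right) tuples.
    A missing child is represented by None.
    """
    if not codes:
        raise ValueError("need at least one letter")
    if len(codes) == 1:
        (letter, remaining), = codes.items()
        if remaining == "":
            # Last bit already consumed: this letter is a leaf.
            return letter
    if any(bits == "" for bits in codes.values()):
        raise ValueError("not a prefix code: one encoding is a prefix of another")
    # Split on the first bit, then shift every encoding by one bit.
    left = {ch: bits[1:] for ch, bits in codes.items() if bits[0] == "0"}
    right = {ch: bits[1:] for ch, bits in codes.items() if bits[0] == "1"}
    return (build_tree(left) if left else None,
            build_tree(right) if right else None)


def decode(tree, bits):
    """Follow the path for each bit (0 = left, 1 = right); emit a letter at each leaf.

    Assumes `bits` is a valid encoding for `tree`.
    """
    out, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree  # restart at the root for the next letter
    return "".join(out)


codes = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
tree = build_tree(codes)
print(tree)
print(decode(tree, "0011001"))
```

Representing leaves as plain strings keeps the sketch short; a real implementation would more likely use a small node class.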

  3. Tree Properties
     • What is the length of a letter’s encoding?
       - Length of the path from root to leaf → its depth
     • Define our optimal goal in tree terms
       - Minimize the average bits per letter: $\mathrm{ABL} = \sum_{x \in S} f_x \, |\gamma(x)| = \sum_{x \in S} f_x \cdot \mathrm{depth}(x)$

     Tree Properties
     • What do we want our tree to look like for the optimal solution?
       - How many leaves?
       - How many internal nodes?
     • Think about parent nodes vs. child nodes
       - When frequencies are uniform?
       - When frequencies are nonuniform?

  4. Tree Properties
     • Claim. The binary tree T corresponding to the optimal prefix code is full, i.e., each internal node has two children.
     • Proof?

     Tree Properties
     • Proof. Assume that T has an internal node u with only one child v
       - Without loss of generality, assume v is the left child
     [Figure: node u with a single child v, where v is the root of a subtree]

  5. Tree Properties
     • Claim. The binary tree T corresponding to the optimal prefix code is full, i.e., each internal node has two children.
     • Proof. Assume that T has an internal node u with only one child v, where v is the root of a subtree
       - Replace u with v → the depth of every leaf under v decreases → ABL decreases → the original tree wasn’t optimal
     [Figure: before, u with its single child v; after, v in u’s place]

     Toward a Solution…
     • Two problems to solve:
       - Creating the prefix code tree
       - Labeling the prefix code tree with alphabet/frequencies

  6. Simplifying: Know Optimal Prefix Code
     • Process: assume knowledge of the optimal solution to gain insight into finding a solution
     • Assuming we knew the tree structure of the optimal prefix code, how would you label the leaf nodes?
     [Figure: tree with an arrow labeled "Increasing frequency"]

     Combining Our Conclusions
     • The binary tree corresponding to the optimal prefix code is full, i.e., each internal node has two children
     • We want to label the leaf nodes of the binary tree corresponding to the optimal prefix code such that the nodes with greatest depth have the least frequency
     What does this mean the bottom of our tree should look like?

  7. Combining Our Conclusions
     • The binary tree corresponding to the optimal prefix code is full, i.e., each internal node has two children
     • We want to label the leaf nodes of the binary tree corresponding to the optimal prefix code such that the nodes with greatest depth have the least frequency
     What does this mean the bottom of our tree should look like?
     [Figure: the two letters with least frequency, $f_n$ and $f_{n-1}$, as sibling leaves at the bottom of the tree; they could be flipped]

     How Can We Use This?
     • There is an optimal tree in which the two letters with least frequency are siblings
       - Tie them together
       - Their parent is a “meta-letter”
         • Its frequency is the sum $f_n + f_{n-1}$
     [Figure: meta-letter with frequency $f_n + f_{n-1}$ as the parent of the two least-frequency letters; the children could be flipped]

  8. Constructing an Optimal Prefix Code
     Huffman’s Algorithm: To construct a prefix code for an alphabet S with given frequencies:
       if S has two letters:
         Encode one letter as 0 and the other letter as 1
       else:
         (Reduce: replace the lowest-frequency letters with a meta-letter)
         Let y* and z* be the two lowest-frequency letters
         Form a new alphabet S’ by deleting y* and z* and replacing them with a new letter w of frequency $f_{y^*} + f_{z^*}$
         Recursively construct a prefix code γ’ for S’ with tree T’
         (Build up)
         Define a prefix code for S as follows:
           Start with T’
           Take the leaf labeled w and add two children below it labeled y* and z*

     Constructing an Optimal Prefix Code: Alternative Description
     1. Create a leaf node for each symbol, labeled by its frequency, and add it to a priority queue
     2. While there is more than one node in the queue:
        a) Remove the two nodes of lowest frequency
        b) Create a new internal node with these two nodes as children and with frequency equal to the sum of the two nodes’ frequencies
        c) Add the new node to the queue
     3. The remaining node is the tree’s root node
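The queue-based description above might look like the following in Python. This is a minimal sketch, not the course's reference implementation: it assumes letters are strings, uses the standard-library `heapq` module as the priority queue, and the name `huffman_codes` and the tuple representation of internal nodes are illustrative choices.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Return {letter: bitstring} for a {letter: frequency} mapping."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Degenerate alphabet: give the single letter a one-bit code.
        return {next(iter(freqs)): "0"}

    # Heap entries are (frequency, tiebreak, node). The unique tiebreak
    # counter means two nodes are never compared directly.
    tiebreak = count()
    heap = [(f, next(tiebreak), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Remove the two nodes of lowest frequency ...
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        # ... and add their parent (the "meta-letter") back to the queue.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (n1, n2)))

    _, _, root = heap[0]

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):   # internal node: (left, right)
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:                         # leaf: a letter
            codes[node] = prefix

    assign(root, "")
    return codes


example = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
print(huffman_codes(example))
```

With the frequencies from the lecture's worked example, the resulting code lengths are 2, 2, 2, 3, 3 for a through e. The exact bits can differ from the slides because the left/right order of children is arbitrary.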

  9. Creating the Optimal Prefix Code
     $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_d = 0.18$, $f_e = 0.05$

     Creating the Optimal Prefix Code
     • Lowest frequencies: $f_d = 0.18$ and $f_e = 0.05$
     • Merge: de = 0.23
     [Figure: meta-letter de (0.23) with children d and e; a, b, and c are still separate nodes]

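  10. Creating the Optimal Prefix Code
      • Remaining: $f_a = 0.32$, $f_b = 0.25$, $f_c = 0.20$, $f_{de} = 0.23$
      • Lowest frequencies: $f_c = 0.20$ and $f_{de} = 0.23$
      • Merge: cde = 0.43
      [Figure: cde (0.43) with children c and de; de (0.23) with children d and e; a and b are still separate nodes]

      Creating the Optimal Prefix Code
      • Remaining: $f_a = 0.32$, $f_b = 0.25$, $f_{cde} = 0.43$
      • Lowest frequencies: $f_a = 0.32$ and $f_b = 0.25$
      • Merge: ab = 0.57
      [Figure: ab (0.57) with children a and b; cde (0.43) with children c and de; de (0.23) with children d and e]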

  11. Creating the Optimal Prefix Code
      • Remaining: $f_{ab} = 0.57$, $f_{cde} = 0.43$
      • Merge: abcde = 1
      [Figure: root abcde (1) with children ab (0.57) and cde (0.43); ab with children a and b; cde with children c and de; de (0.23) with children d and e]
      What are the resulting encodings? What is the ABL?

      Creating the Optimal Prefix Code
      Encodings (left edge = 0, right edge = 1):
        a: 00
        b: 01
        c: 10
        d: 110
        e: 111
      ABL = 0.32·2 + 0.25·2 + 0.20·2 + 0.18·3 + 0.05·3
          = 0.64 + 0.50 + 0.40 + 0.54 + 0.15
          = 2.23
      I chose to build the tree this way. What if I had switched the order of the children?
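The ABL arithmetic above can be checked with a few lines of Python. The helper names (`abl`, `is_prefix_free`, `flip_bits`) are hypothetical. `flip_bits` is one way to explore the closing question: it models swapping the two children at every node, which complements every bit but leaves each encoding's length, and therefore the ABL, unchanged.

```python
FREQS = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
CODES = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}


def abl(freqs, codes):
    """Average bits per letter: sum over letters of frequency * encoding length."""
    return sum(f * len(codes[x]) for x, f in freqs.items())


def is_prefix_free(codes):
    """True if no encoding is a prefix of a different encoding."""
    words = list(codes.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


def flip_bits(codes):
    """Swap 0 and 1 everywhere, i.e. swap the children of every internal node."""
    swap = str.maketrans("01", "10")
    return {x: bits.translate(swap) for x, bits in codes.items()}


print(round(abl(FREQS, CODES), 2))             # 2.23
print(is_prefix_free(CODES))                   # True
print(round(abl(FREQS, flip_bits(CODES)), 2))  # 2.23
```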

  12. Implementation
      • What data structures do we need?
        - Binary tree for the prefix codes
        - Priority queue for choosing the node with lowest frequency
      • Where are the costs?

  13. Running Time
      • Costs
        - Inserting a node into, or extracting a node from, the PQ: O(log n)
        - Number of insertions and extractions: O(n)
        - Total: O(n log n)

      Analysis of Algorithm’s Optimality
      • 2-page proof in book

  14. Real-life Compression
      • Text can be compressed well because of known frequencies
      • Algorithms can be optimized to languages
        - More than just “z doesn’t happen very often”
          • “z doesn’t happen after q”

      DIVIDE AND CONQUER ALGORITHMS

  15. Divide-and-Conquer
      “Divide et impera. Veni, vidi, vici.” - Julius Caesar
      • Divide-and-conquer process
        - Break up problem into several parts
        - Solve each part recursively
        - Combine solutions to sub-problems into overall solution
      • Most common usage:
        - Break up problem of size n into two equal parts of size ½n
        - Solve two parts recursively
        - Combine two solutions into overall solution

      Discussion
      • What is a well-known divide and conquer algorithm?
        - Merge Sort

  16. Merge Sort
      • How does Merge Sort work?
      • When do we stop?

      Merge Sort
      • Divide list into two lists
        - Until only 2 elements
      • Sort elements
      • Combine sorted lists (how?)
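One possible answer to "Combine sorted lists (how?)" is the standard two-pointer merge, sketched below in Python. Note one assumption that departs from the slide: this sketch stops dividing at lists of length 0 or 1 rather than 2, which avoids a special case for odd-sized inputs. The function names are illustrative.

```python
def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front element of the two lists.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements; they are already sorted.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a new sorted list using divide and conquer."""
    if len(items) <= 1:          # base case: nothing left to divide
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```

Using `<=` in the comparison keeps equal elements in their original relative order, so the sort is stable.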

  17. RECURRENCE RELATIONS

      Analyzing Merge Sort
      General Template
      • Break up problem of size n into two equal parts of size ½n
      • Solve two parts recursively
      • Combine two solutions into overall solution

      • Def. T(n) = number of comparisons to mergesort an input of size n
      • Want to say a bit more about what T(n) is
        - Break it down more…
      What can we say about the running time w.r.t. the different parts of the above template?
