Huffman Coding
Eric Dubois
School of Electrical Engineering and Computer Science, University of Ottawa
September 2012
The optimal prefix code problem

Given a finite alphabet with a given set of probabilities, we want to find a prefix code with the shortest average codeword length. To simplify notation, denote p_i = P(a_i) and ℓ_i = ℓ(a_i), for i = 1, ..., M. Without loss of generality, we arrange the symbols in the alphabet so that p_1 ≥ p_2 ≥ ... ≥ p_M.

Problem: Find a set of positive integers ℓ_1, ℓ_2, ..., ℓ_M such that

    ℓ̄ = Σ_{i=1}^{M} p_i ℓ_i

is minimized, subject to the constraint (the Kraft inequality)

    Σ_{i=1}^{M} 2^{−ℓ_i} ≤ 1.

The solution may not be unique.
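Aside (a standard observation, not stated on the slide): if the integrality constraint on the ℓ_i is relaxed, minimizing ℓ̄ subject to Σ 2^{−ℓ_i} = 1 by Lagrange multipliers gives ℓ_i = −log_2 p_i, so the relaxed minimum is ℓ̄ = H_1. Since the integer-constrained problem can only do worse, the entropy lower-bounds the average length of any prefix code, which is the bound quoted at the end of these slides.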
Preview of the Huffman algorithm

The Huffman algorithm was originally devised by David Huffman, apparently as part of a course assignment at MIT in 1951, and published in 1952. Consider the following example: M = 8, {p_i} = {0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01}.

The Huffman procedure constructs the prefix code starting with the last bits of the least probable symbols:
1. List the probabilities in decreasing order in a column on the left.
2. Assign the final bits of the last two codewords.
3. Add the two probabilities to replace the previous two.
4. Select the two lowest probabilities in the reduced list, and assign two bits.
5. Continue until two symbols remain.
6. Read codewords from right to left.
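This merging procedure is easy to express with a binary heap. The following Python sketch is illustrative and not from the slides: at each step it merges the two least probable groups and prepends the assigned bit to every codeword in the merged group. Tie-breaking is arbitrary, so the codewords may differ from the example's, while the lengths, and hence the average length, agree.

    import heapq
    from itertools import count

    def huffman_code(probs):
        """Return a dict mapping symbol index to its Huffman codeword."""
        ids = count()  # tie-breaker so the heap never compares lists
        # Each entry: (group probability, tie-breaker, symbol indices in group)
        heap = [(p, next(ids), [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        codes = {i: "" for i in range(len(probs))}
        while len(heap) > 1:
            # Merge the two least probable groups, prepending their final bits.
            p0, _, g0 = heapq.heappop(heap)
            p1, _, g1 = heapq.heappop(heap)
            for i in g0:
                codes[i] = "0" + codes[i]
            for i in g1:
                codes[i] = "1" + codes[i]
            heapq.heappush(heap, (p0 + p1, next(ids), g0 + g1))
        return codes

    probs = [0.25, 0.2, 0.2, 0.18, 0.09, 0.05, 0.02, 0.01]
    codes = huffman_code(probs)
    print(codes)  # lengths 2, 2, 2, 3, 4, 5, 6, 6 with this tie-breaking
    print(sum(p * len(codes[i]) for i, p in enumerate(probs)))  # ≈ 2.63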
Huffman coding example

[Figure: the Huffman merging procedure applied to the example, producing merged probabilities 0.03, 0.08, 0.17, 0.35, 0.4, 0.6, 1.0. The resulting codewords:]

    p_1 = 0.25 → 10
    p_2 = 0.2  → 00
    p_3 = 0.2  → 01
    p_4 = 0.18 → 110
    p_5 = 0.09 → 1110
    p_6 = 0.05 → 11110
    p_7 = 0.02 → 111110
    p_8 = 0.01 → 111111
Huffman coding example (2)

    H_1 = −Σ_{i=1}^{8} p_i log_2(p_i) = 2.5821
    ℓ̄_Huff = Σ_{i=1}^{8} p_i ℓ_i = 2.63
    ℓ̄_Shann = 3.04
    ℓ̄_fixed = 3
Huffman coding example – spreadsheet

                                          Shannon code             Huffman code
    p_i    −log2(p_i)  −p_i·log2(p_i)   ℓ_i  p_i·ℓ_i  2^(−ℓ_i)   ℓ_i  p_i·ℓ_i  2^(−ℓ_i)
    0.25   2.0000      0.5000            2   0.50     0.25        2   0.50     0.25
    0.20   2.3219      0.4644            3   0.60     0.125       2   0.40     0.25
    0.20   2.3219      0.4644            3   0.60     0.125       2   0.40     0.25
    0.18   2.4739      0.4453            3   0.54     0.125       3   0.54     0.125
    0.09   3.4739      0.3127            4   0.36     0.0625      4   0.36     0.0625
    0.05   4.3219      0.2161            5   0.25     0.03125     5   0.25     0.03125
    0.02   5.6439      0.1129            6   0.12     0.015625    6   0.12     0.015625
    0.01   6.6439      0.0664            7   0.07     0.0078125   6   0.06     0.015625

    sum p_i = 1.00;  H_1 = 2.5821;  ℓ̄_Shann = 3.0400 (Kraft sum 0.7422);  ℓ̄_Huff = 2.6300 (Kraft sum 1.0000)
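These columns are straightforward to recompute. A small Python check, written for this note rather than taken from the slides: the Shannon code assigns ℓ_i = ⌈−log_2 p_i⌉, and the Huffman lengths are read off the earlier example.

    from math import ceil, log2

    probs = [0.25, 0.20, 0.20, 0.18, 0.09, 0.05, 0.02, 0.01]
    huffman_lengths = [2, 2, 2, 3, 4, 5, 6, 6]  # from the example above

    entropy = -sum(p * log2(p) for p in probs)
    shannon_lengths = [ceil(-log2(p)) for p in probs]

    def avg_length(lengths):
        return sum(p * l for p, l in zip(probs, lengths))

    def kraft_sum(lengths):
        return sum(2.0 ** -l for l in lengths)

    print(f"H_1     = {entropy:.4f}")                   # 2.5821
    print(f"Shannon = {avg_length(shannon_lengths):.4f}, "
          f"Kraft = {kraft_sum(shannon_lengths):.4f}")  # 3.0400, 0.7422
    print(f"Huffman = {avg_length(huffman_lengths):.4f}, "
          f"Kraft = {kraft_sum(huffman_lengths):.4f}")  # 2.6300, 1.0000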
Huffman coding example – binary tree

[Figure: the example code drawn as a binary tree, with branches labeled 0 and 1 and leaves c(a_1), ..., c(a_8).]
Theorem

For any admissible set of probabilities, there exists an optimal prefix code satisfying the following properties:
1. If p_j > p_k, then ℓ_j ≤ ℓ_k, so that ℓ_1 ≤ ℓ_2 ≤ ... ≤ ℓ_M.
2. The two longest codewords have the same length: ℓ_{M−1} = ℓ_M.
3. The two longest codewords differ only in their last bit, and correspond to the two source symbols of lowest probability.

Note that not all optimal codes need satisfy these properties, but at least one does.
Proof (1)

Let C be an optimal code with codeword lengths ℓ_1, ..., ℓ_M, and suppose that, contrary to the theorem statement, p_j > p_k but ℓ_j > ℓ_k. Let C′ be a new code with ℓ′_j = ℓ_k, ℓ′_k = ℓ_j, and ℓ′_i = ℓ_i for i ≠ j, k. Then

    ℓ̄(C′) − ℓ̄(C) = Σ_{i=1}^{M} p_i ℓ′_i − Σ_{i=1}^{M} p_i ℓ_i
                  = p_j ℓ_k + p_k ℓ_j − p_j ℓ_j − p_k ℓ_k
                  = (p_j − p_k)(ℓ_k − ℓ_j)
                  < 0

which contradicts the assumption that C is an optimal code. Thus ℓ_j ≤ ℓ_k.
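As a numeric illustration (with made-up values, not from the slides): if p_j = 0.3 > p_k = 0.1 but ℓ_j = 3 > ℓ_k = 2, swapping the two codewords changes the average length by (p_j − p_k)(ℓ_k − ℓ_j) = (0.3 − 0.1)(2 − 3) = −0.2 bits, so the swapped code is strictly shorter.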
Proof (2)

Suppose that ℓ_M > ℓ_{M−1}. By property 1 the lengths are non-decreasing, so no other codeword has length ℓ_M. Since C is a prefix code, no codeword is a prefix of c(a_M); hence we can remove the last bit of c(a_M) and the new code will still be a prefix code, but of lower average codeword length (ℓ̄(C) − p_M). Again, this contradicts the assumption that C is an optimal code, so ℓ_{M−1} = ℓ_M.
Proof (3)

Every codeword of length ℓ_M must have a sibling codeword that differs from it only in the last bit; otherwise, we could remove its last bit as in (2) and reduce the average codeword length. If the codeword that differs from c(a_M) in the last bit is not c(a_{M−1}) but rather c(a_j) for some other j, we can exchange the codewords for a_{M−1} and a_j without changing the average codeword length (both codewords have length ℓ_M = ℓ_{M−1}), and the code remains optimal.

The Huffman algorithm is a recursive procedure to find a code satisfying the properties of the theorem.
Huffman code scenario

[Figure: an optimal code tree in which the two longest codewords, c_M(a_{M−1}) and c_M(a_M), are sibling leaves differing only in their last bit.]
Recursive Algorithm

Assume that we have an optimal code C_M for the alphabet A = {a_1, ..., a_M} with probabilities P(a_i) satisfying the properties of the theorem. Form the reduced alphabet A′ = {a′_1, ..., a′_{M−1}} with probabilities P(a′_i) = P(a_i), i = 1, ..., M−2, and P(a′_{M−1}) = P(a_{M−1}) + P(a_M). Suppose that we have a prefix code C_{M−1} for the reduced alphabet satisfying c_M(a_i) = c_{M−1}(a′_i), i = 1, ..., M−2, c_M(a_{M−1}) = c_{M−1}(a′_{M−1}) ∗ 0, and c_M(a_M) = c_{M−1}(a′_{M−1}) ∗ 1, where ∗ denotes concatenation. Then ℓ_i = ℓ′_i, i = 1, ..., M−2, and ℓ_{M−1} = ℓ_M = ℓ′_{M−1} + 1. Thus

    ℓ̄(C_M) = Σ_{i=1}^{M} P(a_i) ℓ_i
            = Σ_{i=1}^{M−2} P(a′_i) ℓ′_i + (P(a_{M−1}) + P(a_M))(ℓ′_{M−1} + 1)
            = Σ_{i=1}^{M−1} P(a′_i) ℓ′_i + P(a_{M−1}) + P(a_M)
            = ℓ̄(C_{M−1}) + P(a_{M−1}) + P(a_M)
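Unrolling this recursion down to ℓ̄(C_2) = 1 shows that the average codeword length equals the sum of all merged probabilities produced by the procedure, counting the final merge of probability 1, which accounts for the ℓ̄(C_2) term. A quick arithmetic check against the earlier example, with the merge probabilities read off the diagram:

    # Merged probabilities from the example: 0.01+0.02, 0.03+0.05, 0.08+0.09,
    # 0.17+0.18, 0.2+0.2, 0.25+0.35, and the final 0.4+0.6 = 1.0.
    merges = [0.03, 0.08, 0.17, 0.35, 0.40, 0.60, 1.00]
    print(sum(merges))  # ≈ 2.63 = ℓ̄_Huff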
Huffman code scenario (2)

[Figure: the reduced code tree, in which the sibling leaves for a_{M−1} and a_M are replaced by the single leaf c_{M−1}(a′_{M−1}).]
Recursive Algorithm (2)

Conclusion: since ℓ̄(C_M) and ℓ̄(C_{M−1}) differ by a constant that does not depend on the choice of code, C_M is an optimal code for {A, P(a_i)} if and only if C_{M−1} is an optimal code for {A′, P(a′_i)}.

Similarly, we can obtain C_{M−2} from C_{M−1}. We continue until we reach C_2 for an alphabet with two symbols, where the only possible code has codewords 0 and 1. This results in the Huffman procedure illustrated by the earlier example.

Note that H_1 ≤ ℓ̄_Huff ≤ ℓ̄_Shann < H_1 + 1.
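These bounds are easy to spot-check numerically. The sketch below is again illustrative, not from the slides: it uses the fact derived above that the average Huffman length is the sum of merged probabilities, together with the Shannon lengths ℓ_i = ⌈−log_2 p_i⌉.

    import heapq
    import random
    from math import ceil, log2

    def huffman_avg_length(probs):
        """Average Huffman codeword length via the sum of merged probabilities."""
        heap = list(probs)
        heapq.heapify(heap)
        total = 0.0
        while len(heap) > 1:
            merged = heapq.heappop(heap) + heapq.heappop(heap)
            total += merged
            heapq.heappush(heap, merged)
        return total

    random.seed(0)
    for _ in range(5):
        w = [random.random() for _ in range(10)]
        probs = [x / sum(w) for x in w]
        h1 = -sum(p * log2(p) for p in probs)
        l_huff = huffman_avg_length(probs)
        l_shann = sum(p * ceil(-log2(p)) for p in probs)
        assert h1 - 1e-9 <= l_huff <= l_shann < h1 + 1
        print(f"{h1:.4f} <= {l_huff:.4f} <= {l_shann:.4f} < {h1 + 1:.4f}")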