
Compression, Information and Entropy: Huffman's Coding
CS 573: Algorithms, Fall 2014. Lecture 22, November 11, 2014. Sariel (UIUC), CS573, Fall 2014, 24 slides.

Part I: Huffman coding


  1. Codes...
     Encoding: given a frequency table:
       1. f[1..n]: f[i] is the frequency of the i-th character.
       2. code(i): the binary string encoding the i-th character.
       3. len(s): the length (in bits) of a binary string s.
       4. Compute the tree T that minimizes

              cost(T) = sum_{i=1}^{n} f[i] * len(code(i)).        (1)
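The cost function of Eq. (1) is straightforward to compute once a code is fixed. A minimal sketch; the frequencies and code words below are toy values, not the lecture's table:

```python
# Hypothetical sketch of Eq. (1): cost(T) = sum_i f[i] * len(code(i)).
# Toy frequencies and code words, chosen only for illustration.
def cost(freq, code):
    """Total encoded length, in bits, of the text described by freq."""
    return sum(f * len(code[ch]) for ch, f in freq.items())

freq = {'a': 5, 'b': 2, 'c': 1}
code = {'a': '0', 'b': '10', 'c': '11'}
print(cost(freq, code))  # 5*1 + 2*2 + 1*2 = 11
```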

  3. Frequency table for... "A Tale of Two Cities" by Dickens

     char   freq     char   freq     char   freq     char   freq
     '\n'  16,492    '1'       61    'C'  13,896     'Q'      667
     ' '  130,376    '2'       10    'D'  28,041     'R'   37,187
     '!'      955    '3'       12    'E'  74,809     'S'   37,575
     '"'    5,681    '4'       10    'F'  13,559     'T'   54,024
     '$'        2    '5'       14    'G'  12,530     'U'   16,726
     '%'        1    '6'       11    'H'  38,961     'V'    5,199
     '''    1,174    '7'       13    'I'  41,005     'W'   14,113
     '('      151    '8'       13    'J'      710    'X'      724
     ')'      151    '9'       14    'K'    4,782    'Y'   12,177
     '*'       70    ':'      267    'L'  22,030     'Z'      215
     ','   13,276    ';'    1,108    'M'  15,298     ' '      182
     '–'    2,430    '?'      913    'N'  42,380     '`'       93
     '.'    6,769    'A'  48,165     'O'  46,499     '@'        2
     '0'       20    'B'    8,414    'P'    9,957    '/'       26

  4. Computed prefix codes...

     char   freq    code          char   freq    code
     'A'  48,165    1110          'N'  42,380    1100
     'B'   8,414    101000        'O'  46,499    1101
     'C'  13,896    00100         'P'   9,957    101001
     'D'  28,041    0011          'Q'     667    1111011001
     'E'  74,809    011           'R'  37,187    0101
     'F'  13,559    111111        'S'  37,575    1000
     'G'  12,530    111110        'T'  54,024    000
     'H'  38,961    1001          'U'  16,726    01001
     'I'  41,005    1011          'V'   5,199    1111010
     'J'     710    1111011010    'W'  14,113    00101
     'K'   4,782    11110111      'X'     724    1111011011
     'L'  22,030    10101         'Y'  12,177    111100
     'M'  15,298    01000         'Z'     215    1111011000
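The code above is prefix-free: no code word is a prefix of another, so a bit stream decodes greedily, left to right. A small sketch using a few entries from the table (E = 011, T = 000, A = 1110, N = 1100, D = 0011):

```python
# Sketch: greedy decoding of a prefix-free code, using a handful of
# entries from the computed table above.
codes = {'E': '011', 'T': '000', 'A': '1110', 'N': '1100', 'D': '0011'}
decode_map = {v: k for k, v in codes.items()}

def decode(bits):
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:   # prefix-freeness: the first match is the match
            out.append(decode_map[buf])
            buf = ''
    assert buf == '', 'trailing bits do not form a code word'
    return ''.join(out)

msg = ''.join(codes[c] for c in 'ANTE')
print(decode(msg))  # ANTE
```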

  5. The Huffman tree generating the code
     Built only on A–Z for clarity.
     [Figure: the Huffman code tree. Frequent letters (T, E, D, R, S, H, I, N, O, A) are leaves near the root; letters of middling frequency (C, W, M, U, L, B, P, Y, G, F) sit deeper; rare letters (V, K, Z, Q, J, X) are the deepest leaves.]

  6. Mergeability of code trees
     Given two code trees for disjoint parts of the alphabet...
     Merge them into a larger tree by creating a new node and hanging the two trees from this common node; that is, put the two subtrees together under a new root.
     [Figure: leaves M and U hung under a new root; likewise two larger subtrees A and B put together under a new common root.]
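The merge step above can be sketched with a nested-tuple tree representation (an illustrative choice, not the lecture's):

```python
# Sketch of the merge step: two code trees for disjoint parts of the
# alphabet hang under a new common root. Trees are tuples (left, right)
# for internal nodes and single-character strings for leaves.
def merge(t1, t2):
    return (t1, t2)   # new common root; t1 gets bit 0, t2 gets bit 1

def codes(tree, prefix=''):
    """Read the code words off a tree: left edge = 0, right edge = 1."""
    if isinstance(tree, str):            # leaf
        return {tree: prefix or '0'}
    left, right = tree
    return {**codes(left, prefix + '0'), **codes(right, prefix + '1')}

t = merge(('M', 'U'), ('A', 'B'))
print(codes(t))  # {'M': '00', 'U': '01', 'A': '10', 'B': '11'}
```

Merging prepends one bit to every code word in both subtrees, which is exactly why the greedy construction on the next slide works on whole trees at a time.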

  10. Building optimal prefix code trees
     1. Take the two least frequent characters in the frequency table...
     2. ...merge them into a tree, and put the root of the merged tree back into the table, instead of the two old trees.
     3. The algorithm stops when there is a single tree.
     4. Intuition: infrequent characters participate in a large number of merges, and thus get long code words.
     5. The algorithm is due to David Huffman (1952).
     6. The resulting code is the best one can do.
     7. Huffman coding: a building block used by numerous other compression algorithms.
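The procedure above can be sketched with a binary heap holding the partial trees. The nested-tuple tree representation and the tie-breaking counter are illustrative choices, not the lecture's:

```python
# Sketch of Huffman's algorithm: repeatedly pop the two least frequent
# trees and push their merge, until a single tree remains.
import heapq
from itertools import count

def huffman(freq):
    tick = count()   # unique tie-breaker so heapq never compares trees
    heap = [(f, next(tick), ch) for ch, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent trees...
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (t1, t2)))  # ...merged
    return heap[0][2]

def codes(tree, prefix=''):
    """Left edge = 0, right edge = 1; leaves are single-character strings."""
    if isinstance(tree, str):
        return {tree: prefix or '0'}
    return {**codes(tree[0], prefix + '0'), **codes(tree[1], prefix + '1')}

freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
c = codes(huffman(freq))
print(sorted(len(c[ch]) for ch in freq))  # [1, 3, 3, 3, 4, 4]
```

The counter matters: frequencies tie often, and without it `heapq` would fall back to comparing the tree payloads, where tuples and strings don't mix.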

  18. Lemma: lowest leaves are siblings...
     Lemma. Let T be an optimal code tree (prefix-free!). Then:
     1. T is a full binary tree: every node of T has either 0 or 2 children.
     2. If the height of T is d, then there are leaves at depth d that are siblings.

  22. Proof...
     1. If there exists an internal node v in V(T) with a single child... remove it (contract the edge to its child). The new code tree is a better compressor, since cost(T) = sum_{i=1}^n f[i] * len(code(i)) and every code word passing through v becomes one bit shorter. So T is full.
     2. Let u be a leaf of maximum depth d in T, and consider its parent v = p(u).
     3. ==> v has two children, and both are leaves (a non-leaf child would contain a leaf deeper than d).

  28. Infrequent characters are stuck together...
     Lemma. Let x and y be the two least frequent characters (breaking ties arbitrarily). Then there exists an optimal code tree in which x and y are siblings.

  29. Proof... Claim: ∃ optimal code s.t. x and y are siblings + deepest. 1 T : optimal code tree with depth d . 2 By lemma... T has two leafs at depth d that are siblings, 3 If not x and y , but some other characters α and β . 4 T ′ : swap x and α . 5 x depth inc by ∆ , and depth of α decreases by ∆ . 6 � � cost( T ′ ) = cost( T ) − f [ α ] − f [ x ] ∆ . 7 x : one of the two least frequent characters. 8 ...but α is not. = ⇒ f [ α ] ≥ f [ x ] . 9 10 Swapping x and α does not increase cost. 11 T : optimal code tree, swapping x and α does not decrease cost. 12 T ′ is also an optimal code tree 13 Must be that f [ α ] = f [ x ] . Sariel (UIUC) CS573 14 Fall 2014 14 / 24

42. Proof continued...
1. y: the second least frequent character.
2. β: the lowest leaf in the tree, sibling to x.
3. Swapping y and β must give yet another optimal code tree.
4. In this final optimal code tree, x and y are max-depth siblings.

46. Huffman's codes are optimal
Theorem: Huffman codes are optimal prefix-free binary codes.

47. Proof...
1. If the message has only one or two distinct characters, the theorem is easy.
2. Let f[1 .. n] be the original input frequencies.
3. Assume f[1] and f[2] are the two smallest.
4. Let f[n+1] = f[1] + f[2].
5. The lemma ⟹ ∃ an optimal code tree T_opt for f[1 .. n] in which 1 and 2 are siblings.
6. Remove 1 and 2 from T_opt.
7. T′_opt: the remaining tree has 3, ..., n as leaves, plus a "special" character n+1 (the former parent of 1 and 2 in T_opt).

55. La proof continued...
1. Character n+1 has frequency f[n+1].
2. Since f[n+1] = f[1] + f[2], we have

cost(T_opt) = Σ_{i=1}^{n} f[i]·depth_{T_opt}(i)
            = Σ_{i=3}^{n+1} f[i]·depth_{T_opt}(i) + f[1]·depth_{T_opt}(1)
              + f[2]·depth_{T_opt}(2) − f[n+1]·depth_{T_opt}(n+1)
            = cost(T′_opt) + (f[1] + f[2])·depth(T_opt)
              − (f[1] + f[2])·(depth(T_opt) − 1)
            = cost(T′_opt) + f[1] + f[2].
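The identity cost(T_opt) = cost(T′_opt) + f[1] + f[2] can be verified on a toy tree. A minimal sketch (the three frequencies and the tree shape are made-up illustrative values):

```python
# Toy check of cost(T_opt) = cost(T'_opt) + f[1] + f[2]
# (1-indexed as on the slide).
f = {1: 1, 2: 2, 3: 3}

# T_opt: characters 1 and 2 are sibling leaves at depth 2, character 3 at depth 1.
depth_T = {1: 2, 2: 2, 3: 1}
cost_T = sum(f[i] * depth_T[i] for i in f)

# T'_opt: remove leaves 1 and 2; their parent becomes the "special" character
# n+1 = 4 with f[4] = f[1] + f[2], one level higher (depth 1).
f_prime = {3: 3, 4: f[1] + f[2]}
depth_T_prime = {3: 1, 4: 1}
cost_T_prime = sum(f_prime[i] * depth_T_prime[i] for i in f_prime)

assert cost_T == cost_T_prime + f[1] + f[2]
```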

59. La proof continued...
1. This implies that minimizing the cost of T_opt is equivalent to minimizing the cost of T′_opt.
2. T′_opt must be an optimal coding tree for f[3 .. n+1].
3. T′_H: Huffman tree for f[3, ..., n+1]; T_H: the overall Huffman tree constructed for f[1, ..., n].
4. By construction: T′_H is formed by removing leaves 1 and 2 from T_H.
5. By induction: the Huffman tree generated for f[3, ..., n+1] is optimal, so cost(T′_H) = cost(T′_opt).
6. ⟹ cost(T_H) = cost(T′_H) + f[1] + f[2] = cost(T′_opt) + f[1] + f[2] = cost(T_opt).
7. ⟹ the Huffman tree has the same cost as the optimal tree.

67. What we get...
1. "A Tale of Two Cities": 779,940 bytes.
2. The Huffman compression above yields a file of size 439,688 bytes (ignoring the space needed to store the tree).
3. gzip: 301,295 bytes. bzip2: 220,156 bytes!
4. A Huffman encoder can easily be written in a few hours of work.
5. All later compressors use it as a black box...
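A "few hours of work" is generous; a minimal encoder sketch fits in a few dozen lines (the function name `huffman_codes` and the left-branch-0 convention are arbitrary choices, not from the lecture):

```python
import heapq
from collections import Counter

def huffman_codes(freq):
    """Build prefix codes from a {symbol: frequency} map via Huffman's algorithm."""
    # Heap entries: (frequency, tiebreak, tree). Leaves are symbols;
    # internal nodes are (left, right) pairs. The unique tiebreak integer
    # keeps heapq from ever comparing two trees directly.
    heap = [(f, i, s) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, i, (t1, t2)))
        i += 1
    # Walk the tree, assigning 0 to the left branch and 1 to the right.
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"   # single-symbol edge case
    walk(heap[0][2], "")
    return codes

text = "a tale of two cities"
codes = huffman_codes(Counter(text))
encoded = "".join(codes[c] for c in text)
```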

73. Average size of code word
1. The input is made out of n characters.
2. p_i: the fraction of the input that is the i-th character (its probability).
3. Use these probabilities to build the Huffman tree.
4. Q: What is the length of the codewords assigned to the characters, as a function of the probabilities?
5. First, a special case...

78. Average length of codewords... Special case
Lemma: Let 1, ..., n be symbols, and assume that for i = 1, ..., n, the probability of the i-th symbol is p_i = 1/2^{l_i}, where l_i ≥ 0 is an integer. Then, in the Huffman coding for this input, the code for i has length l_i.

79. Proof
1. By induction on the Huffman algorithm.
2. n = 2: the claim holds, since there are only two characters, each with probability 1/2.
3. Let i and j be the two characters with lowest probability.
4. It must be that p_i = p_j (otherwise Σ_k p_k ≠ 1).
5. Huffman's tree merges these two letters into a single "character" that has probability 2·p_i.
6. By induction (on the remaining n − 1 symbols), the new "character" has an encoding of length l_i − 1.
7. The resulting tree thus encodes i and j by codewords of length (l_i − 1) + 1 = l_i.
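The lemma is easy to check by running Huffman's merge loop on dyadic probabilities. A minimal self-contained sketch (the helper `huffman_lengths` and the example probabilities are illustrative, not from the lecture):

```python
import heapq

def huffman_lengths(probs):
    """Codeword length per symbol when Huffman's algorithm runs on probs."""
    # Heap entries: (probability, tiebreak, [symbol indices in this subtree]).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    length = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:   # every symbol in a merged subtree moves down one level
            length[s] += 1
        heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
        tiebreak += 1
    return length

# p_i = 1/2**l_i with l = [1, 2, 3, 3]; the lemma says these are the code lengths.
assert huffman_lengths([1/2, 1/4, 1/8, 1/8]) == [1, 2, 3, 3]
```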
