Topic 20: Huffman Coding

"The author should gaze at Noah, and ... learn, as they did in the Ark, to crowd a great deal of matter into a very small compass."
Sydney Smith, Edinburgh Review

Agenda
  Encoding
  Compression
  Huffman Coding

Encoding
  ASCII - UNICODE
  U        T        C        S
  85       84       67       83
  01010101 01010100 01000011 01010011
  what is a file?
  open a bitmap in a text editor
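The UTCS row on the Encoding slide can be reproduced in a few lines. This is only a small sketch; the variable names are made up for illustration.

```python
# Map each character of "UTCS" to its ASCII code and 8-bit pattern.
text = "UTCS"
codes = [ord(ch) for ch in text]            # character -> integer code
bits = [format(c, "08b") for c in codes]    # integer -> 8-bit binary string

print(codes)  # [85, 84, 67, 83]
print(bits)   # ['01010101', '01010100', '01000011', '01010011']
```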
[Image slides: Text File, Text File???, Bitmap File, Bitmap File????]
[Image slides: JPEG File, JPEG VS BITMAP, JPEG File]

Encoding Schemes
  "It's all 1s and 0s"
  What do the 1s and 0s mean?
  50 121 109
  ASCII -> 2ym
  Red Green Blue -> dark teal?

Altering files
  Tower bit map (Eclipse/Huffman/Data).
  Alter the first 300 characters of line
  ~00~00~00~00~00 ... (~00 repeated 100 times, 300 characters)
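The point of the Encoding Schemes slide, that the same bytes mean different things under different schemes, can be sketched as follows. The hex color string is just one assumed way of showing the RGB reading; it is not from the slides.

```python
# The same three bytes, read under two different encoding schemes.
data = bytes([50, 121, 109])

# Scheme 1: each byte is an ASCII character.
as_text = data.decode("ascii")

# Scheme 2: the bytes are red, green, and blue intensities (0-255).
red, green, blue = data
as_hex_color = "#{:02x}{:02x}{:02x}".format(red, green, blue)

print(as_text)       # 2ym
print(as_hex_color)  # #32796d
```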
Compression
  Compression: storing the same information but in a form that takes less memory
  lossless and lossy compression
  Recall: [image]

Lossy Artifacts
  [image]

Why Bother?
  Is compression really necessary?
  5 Terabytes: 1250 HD, 2 hour movies or 1,250,000 songs
  Price? About $110.00
Clicker 1
  With storage so cheap, is compression really necessary?
  A. No
  B. Yes
  C. It Depends

Little Pipes and Big Pumps
  Home Internet Access
    400 Mbps, roughly $70 per month
    12 months * 3 years * $70 = $2,520
    400,000,000 bits / second = 5 * 10^7 bytes / sec
  CPU Capability
    $1,500 for a laptop or desktop
    -7900X
    Assume it lasts 3 years.
    Memory bandwidth 40 GB / sec = 4.0 * 10^10 bytes / sec
    on the order of 6.4 * 10^11 instructions / second

Little Pipes and Big Pumps: Mobile Devices?
  Cellular Network
    Megabits per second
    AT&T: 17 Mbps download, 7 Mbps upload
    T-Mobile & Verizon: 12 Mbps download, 7 Mbps upload
    17,000,000 bits per second = 2.125 x 10^6 bytes per second
    http://tinyurl.com/q6o7wan
  iPhone CPU
    Apple A6 System on a Chip
    Coy about IPS (instructions per second)
    2 cores
    Rough estimate: 1 x 10^10 instructions per second
  Data in from network
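One way to read the pipe and pump figures is as a rough budget of CPU instructions available per byte arriving from the network. The sketch below uses only the slide's numbers; the "instructions per byte" framing and the variable names are assumptions made for illustration, not something the slides state.

```python
# Rough "instructions per incoming byte" budget from the slide figures.
home_bytes_per_sec = 400_000_000 / 8     # 400 Mbps home link -> 5.0e7 bytes/sec
desktop_ips = 6.4e11                     # desktop CPU, instructions/sec

mobile_bytes_per_sec = 17_000_000 / 8    # 17 Mbps cellular -> 2.125e6 bytes/sec
mobile_ips = 1e10                        # rough phone estimate, instructions/sec

desktop_budget = desktop_ips / home_bytes_per_sec
mobile_budget = mobile_ips / mobile_bytes_per_sec

print(round(desktop_budget))  # 12800 instructions per byte received
print(round(mobile_budget))   # 4706 instructions per byte received
```

On both devices the CPU could spend thousands of instructions on every byte that arrives, which suggests that decompressing incoming data costs little compared with waiting for the network.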
Compression - Why Bother?
  Apostolos "Toli" Lerios
  Facebook Engineer
  Heads image storage group
  jpeg images already compressed
  look for ways to compress even more
  1% less space = millions of dollars in savings

Purpose of Huffman Coding
  Proposed by Dr. David A. Huffman
  A Method for the Construction of Minimum Redundancy Codes
  Written in 1952
  Applicable to many forms of data transmission
  Our example: text files
  still used in fax machines, mp3 encoding, others

The Basic Algorithm
  Huffman coding is a form of statistical coding
  Not all characters occur with the same frequency!
  Yet in ASCII all characters are allocated the same amount of space
  1 char = 1 byte, be it e or x
The Basic Algorithm
  Any savings in tailoring codes to frequency of character?
  Code word lengths are no longer fixed like ASCII or Unicode
  Code word lengths vary and will be shorter for the more frequently used characters

The Basic Algorithm
  1. Scan file to be compressed and determine frequency of all values.
  2. Sort or prioritize values based on frequency in file.
  3. Build Huffman code tree based on prioritized values.
  4. Perform a traversal of tree to determine new codes for values.
  5. Scan file again to create new file using the new Huffman codes

Building a Tree
Scan the original text
  Consider the following short text
  Eerie eyes seen near lake.
  Determine frequency of all values (in this case characters) in the text

Building a Tree
Scan the original text
  Eerie eyes seen near lake.
  What characters are present?
  E e r i space y s n a l k .
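Step 1 of the algorithm, scanning the text and counting each character, can be sketched with the standard library's Counter. This is a minimal illustration; the slides do not prescribe a particular data structure.

```python
from collections import Counter

# Step 1: scan the text and count how often each character occurs.
text = "Eerie eyes seen near lake."
freq = Counter(text)

# Counter keeps keys in order of first appearance.
for ch, n in freq.items():
    label = "space" if ch == " " else ch
    print(label, n)

print(len(freq), "distinct characters,", sum(freq.values()), "characters total")
```

Upper-case E and lower-case e are counted separately because they are different byte values.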
Building a Tree
Scan the original text
  Eerie eyes seen near lake.
  What is the frequency of each character in the text?

  Char   Freq.   Char   Freq.   Char   Freq.
  E      1       y      1       k      1
  e      8       s      2       .      1
  r      2       n      2
  i      1       a      2
  space  4       l      1

Building a Tree
Prioritize characters
  Create binary tree nodes with a value and the frequency for each value
  Place nodes in a priority queue
  The lower the frequency, the higher the priority in the queue

Building a Tree
  The queue after inserting all nodes
  front                                          back
  E   i   k   l   y   .   a   n   r   s   sp  e
  1   1   1   1   1   1   2   2   2   2   4   8
  Null Pointers are not shown

Building a Tree
  While priority queue contains two or more nodes
    Create new node
    Dequeue node and make it left subtree
    Dequeue next node and make it right subtree
    Frequency of new node equals sum of frequency of left and right children
    Enqueue new node back into queue
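The loop on the slide above can be sketched with Python's heapq module as the priority queue. The nested-tuple node layout, the name build_tree, and the tie-breaking counter are assumptions for illustration; the counter makes nodes of equal frequency leave the queue in the order they entered, which is one way to get the first-in, first-out behavior among ties.

```python
import heapq
from itertools import count

# Initial queue, front to back, as (character, frequency) pairs.
QUEUE = [("E", 1), ("i", 1), ("k", 1), ("l", 1), ("y", 1), (".", 1),
         ("a", 2), ("n", 2), ("r", 2), ("s", 2), (" ", 4), ("e", 8)]

def build_tree(pairs):
    """Return the root of the code tree.

    A leaf is (freq, char); an internal node is (freq, left, right).
    """
    seq = count()  # tie-breaker: among equal frequencies, first in is first out
    heap = [(freq, next(seq), (freq, ch)) for ch, freq in pairs]
    heapq.heapify(heap)
    while len(heap) >= 2:
        f1, _, left = heapq.heappop(heap)    # first dequeue -> left subtree
        f2, _, right = heapq.heappop(heap)   # second dequeue -> right subtree
        merged = (f1 + f2, left, right)      # frequency is the sum of the children
        heapq.heappush(heap, (f1 + f2, next(seq), merged))
    return heap[0][2]

root = build_tree(QUEUE)
print(root[0])                   # 26, the number of characters in the text
print(root[1][0], root[2][0])    # 10 16, frequencies of the root's children
```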
Building a Tree
  [Diagram slides: the queue after each pass of the loop. Queues are shown front to back; [x y]n is a subtree with leaves x, y and frequency n.]
   1. E1 + i1 -> [E i]2                  queue: k1 l1 y1 .1 a2 n2 r2 s2 [E i]2 sp4 e8
   2. k1 + l1 -> [k l]2                  queue: y1 .1 a2 n2 r2 s2 [E i]2 [k l]2 sp4 e8
   3. y1 + .1 -> [y .]2                  queue: a2 n2 r2 s2 [E i]2 [k l]2 [y .]2 sp4 e8
   4. a2 + n2 -> [a n]4                  queue: r2 s2 [E i]2 [k l]2 [y .]2 sp4 [a n]4 e8
   5. r2 + s2 -> [r s]4                  queue: [E i]2 [k l]2 [y .]2 sp4 [a n]4 [r s]4 e8
   6. [E i]2 + [k l]2 -> [E i k l]4      queue: [y .]2 sp4 [a n]4 [r s]4 [E i k l]4 e8
   7. [y .]2 + sp4 -> [y . sp]6          queue: [a n]4 [r s]4 [E i k l]4 [y . sp]6 e8
   8. [a n]4 + [r s]4 -> [a n r s]8      queue: [E i k l]4 [y . sp]6 e8 [a n r s]8
   9. [E i k l]4 + [y . sp]6 -> 10       queue: e8 [a n r s]8 [E i k l y . sp]10
  10. e8 + [a n r s]8 -> 16              queue: [E i k l y . sp]10 [e a n r s]16
  11. 10 + 16 -> 26                      queue: [root]26

Clicker 2
  What is happening to the values with a low frequency compared to values with a high frequency?
  A. Small Depth
  B. Large Depth
  C. Small Height
  D. Large Height
  E. Something else

Building a Tree
  After enqueueing this node there is only one node left in priority queue.

Building a Tree
  Dequeue the single node left in the queue.
  This tree contains the new code words for each character.
  Frequency of root node should equal number of characters in text.
  Eerie eyes seen near lake.
  4 spaces, 26 characters total
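To see where each character ends up in the finished tree, it is enough to track depths. Each time two queue entries are merged, every character beneath them moves one level further from the root. The sketch below uses that shortcut instead of building node objects. The function name, the initial queue order (by frequency, per the priority rule), and the first-in, first-out handling of ties are assumptions for illustration.

```python
import heapq

# Initial queue, front to back, as (character, frequency) pairs.
QUEUE = [("E", 1), ("i", 1), ("k", 1), ("l", 1), ("y", 1), (".", 1),
         ("a", 2), ("n", 2), ("r", 2), ("s", 2), (" ", 4), ("e", 8)]

def leaf_depths(pairs):
    """Return ({char: depth in the final tree}, frequency of the root)."""
    depth = {ch: 0 for ch, _ in pairs}
    # Each heap entry: (frequency, insertion number, characters under this node).
    heap = [(freq, i, [ch]) for i, (ch, freq) in enumerate(pairs)]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) >= 2:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for ch in group1 + group2:
            depth[ch] += 1               # one level deeper after every merge
        heapq.heappush(heap, (f1 + f2, next_id, group1 + group2))
        next_id += 1
    return depth, heap[0][0]

depth, root_freq = leaf_depths(QUEUE)
print(root_freq)                            # 26
print(depth["e"], depth[" "], depth["E"])   # 2 3 4
```

The most frequent character (e) sits closest to the root, while the characters that occur only once sit deepest.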
Encoding the File
Traverse Tree for Codes
  Perform a traversal of the tree to obtain new code words
  left, append a 0 to code word
  right, append a 1 to code word
  code word is only complete when a leaf node is reached

  Char   Code
  E      0000
  i      0001
  k      0010
  l      0011
  y      0100
  .      0101
  space  011
  e      10
  a      1100
  n      1101
  r      1110
  s      1111

Encoding the File
  Rescan text and encode file using new code words
  Eerie eyes seen near lake.
  000010111000011001110
  010010111101111111010
  110101111011011001110
  011001111000010100101

Encoding the File
Results
  Have we made things any better?
  84 bits to encode the text
  ASCII would take 8 * 26 = 208 bits
  If a modified fixed-length code were used, 4 bits per character would be needed (12 distinct characters). Total bits 4 * 26 = 104. Savings not as great.
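The traversal and the bit counts can be checked with a short sketch. The tree is written out by hand as nested pairs (left, right) chosen to match the code table above; that representation and the helper names are assumptions for illustration. Counting the bits produced from the code table gives 84, which is four rows of 21 bits.

```python
# Code tree as nested (left, right) pairs; a leaf is a one-character string.
TREE = (((("E", "i"), ("k", "l")), (("y", "."), " ")),
        ("e", (("a", "n"), ("r", "s"))))

def assign_codes(node, prefix="", table=None):
    """Walk the tree: append 0 going left, 1 going right; record codes at leaves."""
    if table is None:
        table = {}
    if isinstance(node, str):
        table[node] = prefix
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table

def decode(bits, tree):
    """Follow bits from the root; emit a character at each leaf, then restart."""
    out, node = [], tree
    for b in bits:
        node = node[int(b)]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)

text = "Eerie eyes seen near lake."
codes = assign_codes(TREE)
encoded = "".join(codes[ch] for ch in text)

print(codes["e"], codes[" "], codes["E"])   # 10 011 0000
print(len(encoded))                         # 84 bits with the variable-length codes
print(8 * len(text))                        # 208 bits at 8 bits per character
print(4 * len(text))                        # 104 bits at a fixed 4 bits per character
print(decode(encoded, TREE) == text)        # True
```

Decoding needs no separators between code words because no code word is a prefix of another: every code ends at a leaf.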