61A Extra Lecture 4
0100010101101110011000110110111101100100011010010110111001100111 (Encoding)
Thursday, February 19

What’s the point?
• Why do we encode things?
• You don’t speak binary
• Computers don’t speak English

A First Attempt
• Let’s use an encoding

Letter  Binary    Letter  Binary
a       0         n       1
b       1         o       0
c       0         p       1
d       1         q       1
e       1         r       0
f       0         s       1
g       0         t       0
h       1         u       0
i       1         v       1
j       1         w       1
k       0         x       1
l       1         y       0
m       1         z       0

[Image: http://pixshark.com/confused-face-clip-art.htm]

Analysis
Pros
• Encoding was easy
• Took a very small amount of space
Cons
• Decoding it was impossible

Decoding
• Encoding by itself is useless
• Decoding is also necessary
• So… we need more bits
• How many bits do we need?
    • lowercase alphabet
    • 5 bits

A Second Attempt
• Let’s try another encoding

Letter  Binary    Letter  Binary
a       00000     n       01101
b       00001     o       01110
c       00010     p       01111
d       00011     q       10000
e       00100     r       10001
f       00101     s       10010
g       00110     t       10011
h       00111     u       10100
i       01000     v       10101
j       01001     w       10110
k       01010     x       10111
l       01011     y       11000
m       01100     z       11001

Analysis
Pros
• Encoding was easy
• Decoding was possible
Cons
• Takes more space…
• What restriction did we place that’s unnecessary?
    • Fixed length
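
The fixed-length scheme from "A Second Attempt" can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the names `ENCODE`, `DECODE`, `encode`, and `decode` are made up here, and the sketch assumes the table simply numbers the letters a = 0 through z = 25 in 5-bit binary (26 letters fit because 2^5 = 32). The last lines show why the first attempt cannot be decoded: several letters share the same bit.

```python
import string

# Assumption: letters are numbered a=0 ... z=25 and written as 5-bit binary,
# which reproduces the table in "A Second Attempt".
ENCODE = {ch: format(i, "05b") for i, ch in enumerate(string.ascii_lowercase)}
DECODE = {bits: ch for ch, bits in ENCODE.items()}

def encode(text):
    """Concatenate the 5-bit code of every lowercase letter in text."""
    return "".join(ENCODE[ch] for ch in text)

def decode(bits):
    """Cut bits into 5-bit chunks and look each chunk up."""
    return "".join(DECODE[bits[i:i + 5]] for i in range(0, len(bits), 5))

# First four rows of the first attempt: inverting the table loses letters,
# because "a" and "c" (and "b" and "d") collide on the same bit.
FIRST_ATTEMPT = {"a": "0", "b": "1", "c": "0", "d": "1"}
INVERTED = {bits: ch for ch, bits in FIRST_ATTEMPT.items()}

print(encode("cab"))               # 000100000000001
print(decode(encode("encoding")))  # encoding
print(len(INVERTED))               # 2 entries left out of 4 letters
```

Because every code has the same length, the decoder always knows where one letter stops and the next begins. That is the property the variable-length attempt gives up.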
Variable Length Encoding
• Problems?
• When do we start and stop?
• String of As and Bs: ABA
    • A - 00, B - 0
    • Encode ABA: 00000
    • Decode 00000:
        • ABA, AAB, BAA?
• What lengths do we use?

A Second Look at Fixed Length

Letter  Binary    Letter  Binary
a       00000     n       01101
b       00001     o       01110
c       00010     p       01111
d       00011     q       10000
e       00100     r       10001
f       00101     s       10010
g       00110     t       10011
h       00111     u       10100
i       01000     v       10101
j       01001     w       10110
k       01010     x       10111
l       01011     y       11000
m       01100     z       11001

Trees!

Letter  Binary
A       00
B       01
C       1

[Figure: binary tree with branches labeled 0 and 1; C is a leaf one level below the root, A and B are leaves two levels below the root]

What happens when … ?
• Rule 1: Each leaf only has 1 label

Letter  Binary    Letter  Binary
a       0         n       1
b       1         o       0
c       0         p       1
d       1         q       1
e       1         r       0
f       0         s       1
g       0         t       0
h       1         u       0
i       1         v       1
j       1         w       1
k       0         x       1
l       1         y       0
m       1         z       0

What happens when … ?
• Rule 2: Only leaves get labels

Letter  Binary
A       00
B       0

[Figure: binary tree with branches labeled 0 and 1 and node labels A, B, C]

An Optimal Encoding
• Start with a tree
• What kinds of things do we want to encode with this?
• What letter do we want to appear the most?
• How about the least?
• This is called a Huffman Encoding

Huffman Encoding
• Let’s pretend we want to come up with the optimal encoding:
• AAAAAAAAAABBBBBCCCCCCCDDDDDDDDD
    • A appears 10 times
    • B appears 5 times
    • C appears 7 times
    • D appears 9 times

Huffman Encoding
• Start with the two smallest frequencies
• A appears 10 times, B appears 5 times, C appears 7 times, D appears 9 times

[Figure: B and C joined under a new node with branches 0 and 1; A and D not yet merged]
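
Going back to the variable-length slides, the "when do we start and stop?" problem and the two tree rules can be checked with a short Python sketch. The helper names `all_decodings` and `is_prefix_free` are hypothetical, chosen only for this illustration. The first function lists every way a bit string can be cut into codewords; the second tests whether any codeword is the start of another (or equal to it), which is what goes wrong when an internal node or a shared leaf carries a label.

```python
def all_decodings(bits, codes):
    """Return every way to split bits into codewords taken from codes."""
    if not bits:
        return [""]
    results = []
    for letter, word in codes.items():
        if bits.startswith(word):
            rest = all_decodings(bits[len(word):], codes)
            results += [letter + tail for tail in rest]
    return results

def is_prefix_free(codes):
    """True if no codeword equals or begins another letter's codeword."""
    words = list(codes.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

ambiguous = {"A": "00", "B": "0"}              # from "Variable Length Encoding"
prefix_free = {"A": "00", "B": "01", "C": "1"}  # from "Trees!"

print(sorted(all_decodings("00000", ambiguous)))
print(all_decodings("00011", prefix_free))      # ['ABC']
print(is_prefix_free(ambiguous), is_prefix_free(prefix_free))  # False True
```

With A = 00 and B = 0, the string 00000 has eight readings, including ABA, AAB, and BAA. With the tree-based codes, each bit string has at most one reading, because following the bits down the tree always ends at exactly one leaf.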
Huffman Encoding
• Continue…
• A appears 10 times, B & C appear a combined 12 times, D appears 9 times

[Figure: partial trees during merging; B and C share a parent, A and D are the next pair to join]

Huffman Encoding
• And finally…

[Figure: completed tree with branches labeled 0 and 1 and leaves A, B, C, D]

Huffman Encoding
• Another example…
• AAAAAAAAAABCCD
    • A appears 10 times
    • B appears 1 time
    • C appears 2 times
    • D appears 1 time

Huffman Encoding
• Start with the two smallest frequencies
• A appears 10 times, B appears 1 time, C appears 2 times, D appears 1 time

[Figure: B and D joined under a new node with branches 0 and 1; A and C not yet merged]

Huffman Encoding
• Start with the two smallest frequencies
• A appears 10 times, B & D appear a combined 2 times, C appears 2 times

[Figure: C joined with the B & D subtree under a new node; A not yet merged]

Huffman Encoding
• And finally…

[Figure: completed tree with branches labeled 0 and 1 and leaves A, B, C, D]
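
The merging procedure walked through on these slides (repeatedly join the two smallest frequencies) can be sketched in Python with a priority queue. This is a minimal sketch under stated assumptions, not the lecture's own code: the function name `huffman_codes` is hypothetical, ties are broken by an arbitrary counter, and the first subtree popped is placed on the 0 branch. Those choices affect the exact bit patterns, so they may differ from the slide diagrams, but the code lengths come out the same.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code for text; returns {letter: bit string}."""
    counts = Counter(text)
    # Heap entries are (frequency, tie_breaker, tree).
    # A tree is either a single letter or a (left, right) pair.
    heap = [(freq, i, letter)
            for i, (letter, freq) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Join the two smallest frequencies under a new node.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next_id, (t1, t2)))
        next_id += 1

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"  # single-letter input still gets a bit
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

first = huffman_codes("A" * 10 + "B" * 5 + "C" * 7 + "D" * 9)
second = huffman_codes("AAAAAAAAAABCCD")
print({k: len(v) for k, v in sorted(first.items())})
print({k: len(v) for k, v in sorted(second.items())})
```

In the first example the frequencies are close together, so every letter ends up with a 2-bit code and nothing is gained over fixed length. In the second example A dominates: it gets a 1-bit code, C gets 2 bits, and B and D get 3 bits each, for 20 bits in total instead of the 28 a fixed 2-bit code would need.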