

  1. Computing and Communications: 2. Information Theory - Data Compression. Ying Cui, Department of Electronic Engineering, Shanghai Jiao Tong University, China. 2017, Autumn

  2. Outline • Examples of codes • Kraft inequality for instantaneous codes • Kraft inequality for uniquely decodable codes • Optimal codes • Huffman codes

  3. Reference • T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley

  4. EXAMPLES OF CODES

  5. Source Code • A source code C for a random variable X maps each value of X to a finite-length string over a D-ary alphabet {0, 1, …, D-1} • C(x) denotes the codeword for x and l(x) its length; the expected length of the code is L(C) = Σ_x p(x) l(x)

  6. Examples • Example 1: H(X) = 1.75 bits and L(C) = 1.75 bits, so L(C) = H(X) • Example 2: H(X) = 1.58 bits and L(C) = 1.66 bits, so L(C) > H(X)
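The two distributions behind these numbers are not spelled out on the slide; a minimal sketch, assuming the standard textbook examples p = (1/2, 1/4, 1/8, 1/8) with code {0, 10, 110, 111} and p = (1/3, 1/3, 1/3) with code {0, 10, 11}, reproduces them:

```python
import math

def entropy(p):
    """Entropy H(X) in bits of a probability distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def expected_length(p, lengths):
    """Expected codeword length L(C) = sum_x p(x) * l(x)."""
    return sum(pi * li for pi, li in zip(p, lengths))

# assumed example 1: p = (1/2, 1/4, 1/8, 1/8), code {0, 10, 110, 111}
p1, l1 = [1/2, 1/4, 1/8, 1/8], [1, 2, 3, 3]
print(entropy(p1), expected_length(p1, l1))   # 1.75, 1.75  -> L(C) = H(X)

# assumed example 2: p = (1/3, 1/3, 1/3), code {0, 10, 11}
p2, l2 = [1/3, 1/3, 1/3], [1, 2, 2]
print(entropy(p2), expected_length(p2, l2))   # ~1.585, ~1.667 -> L(C) > H(X)
```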

  7. Conditions on Codes • Guarantee decodability of a single value of X? • Describe a sequence of values of X? – example: if C(x1) = 00 and C(x2) = 11, then C(x1 x2) = 0011

  8. Conditions on Codes • Guarantee decodability of a sequence of values of X w/o adding a special symbol between any two codewords? • Guarantee decodability of a sequence of values of X w/o reference to future codewords? – end of a codeword is immediately recognizable – an instantaneous code is a self-punctuating code – example: with codewords C(1) = 0, C(2) = 10, C(3) = 110, C(4) = 111, the binary string 01011111010 is parsed as 0, 10, 111, 110, 10
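Because this code is instantaneous, a decoder can emit a symbol the moment its codeword is complete. A minimal sketch of that self-punctuating decoding in Python (the function name and dict-based code table are illustrative, not from the slides):

```python
def decode_prefix(code, bits):
    """Decode a bit string with a prefix (instantaneous) code.
    code: dict mapping source symbol -> binary codeword string."""
    inv = {cw: sym for sym, cw in code.items()}
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:            # end of a codeword is immediately recognizable
            symbols.append(inv[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols

code = {1: "0", 2: "10", 3: "110", 4: "111"}
print(decode_prefix(code, "01011111010"))   # [1, 2, 4, 3, 2], i.e. 0, 10, 111, 110, 10
```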

  9. Classes of Codes • nonsingular code: decodability of a single value of X • uniquely decodable code: decodability of a sequence of values of X w/o adding a special symbol between any two codewords • instantaneous (prefix) code: decodability of a sequence of values of X w/o reference to future codewords

  10. Example – code 1 (singular): the encoded symbol 0 could come from source symbol 1, 2, 3 or 4 – code 2 (nonsingular but not uniquely decodable): the encoded sequence 010 could come from source symbol 2, the pair 1 4, or the pair 3 1 – code 3 (uniquely decodable but not instantaneous): after seeing 11… the decoder must look ahead; it decodes 3 if the following bit is 1, 4 if the following bits are an odd number of 0's, and 3 if they are an even number of 0's – code 4: prefix-free, hence instantaneous

  11. KRAFT INEQUALITY FOR INSTANTANEOUS CODES

  12. Kraft Inequality • Wish to construct instantaneous codes of minimum expected length to describe a given source – cannot assign short codewords to all source symbols and still be prefix-free • Kraft inequality: the codeword lengths l_1, …, l_m of any D-ary instantaneous code satisfy Σ_i D^{-l_i} ≤ 1; conversely, for any lengths satisfying this inequality there exists an instantaneous code with these lengths

  13. Idea of Proof • Consider a D-ary tree in which each node has D children. Let the branches of the tree represent the symbols of the codewords. Each codeword is represented by a leaf on the tree. The path from the root traces out the symbols of the codeword. The prefix condition on the codewords implies that no codeword is an ancestor of any other codeword on the tree. [figure: binary code tree with branches labeled 0 and 1]
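A small sketch of the tree argument in code: checking the Kraft sum Σ D^(-l_i) for a set of codeword lengths and, when the sum is at most 1, building a binary prefix code by taking the leftmost free node at each depth. The function names and the canonical left-to-right construction are illustrative assumptions, not taken from the slides:

```python
def kraft_sum(lengths, D=2):
    """Kraft sum sum_i D^(-l_i); it is <= 1 exactly when a D-ary
    instantaneous code with these codeword lengths exists."""
    return sum(D ** -l for l in lengths)

def prefix_code_from_lengths(lengths):
    """Build a binary prefix code with the given codeword lengths,
    assuming they satisfy the Kraft inequality: walk the code tree
    from left to right, taking the next free node at each depth."""
    assert kraft_sum(lengths) <= 1 + 1e-12, "lengths violate the Kraft inequality"
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [None] * len(lengths)
    value, prev_len = 0, 0
    for i in order:
        value <<= lengths[i] - prev_len          # descend to the required depth
        codewords[i] = format(value, "b").zfill(lengths[i])
        value += 1                               # move on to the next free subtree
        prev_len = lengths[i]
    return codewords

print(kraft_sum([1, 2, 3, 3]))                   # 1.0
print(prefix_code_from_lengths([1, 2, 3, 3]))    # ['0', '10', '110', '111']
```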

  14. KRAFT INEQUALITY FOR UNIQUELY DECODABLE CODES

  15. McMillan Inequality • One might expect uniquely decodable codes to offer further possibilities for the set of codeword lengths than instantaneous codes, since the class of uniquely decodable codes is larger – NO: the codeword lengths of any uniquely decodable code must also satisfy the Kraft inequality

  16. OPTIMAL CODES

  17. Expected Code Length Minimization • Minimizing the expected length over integer codeword lengths subject to the Kraft inequality is an integer programming problem with a linear objective function • Dropping the integer constraint gives a continuous relaxation, a convex optimization problem whose feasible set is larger and whose minimum value is therefore lower • Rounding the optimal solution of the relaxation up to integers gives a near-optimal solution of the original problem
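A brief numerical sketch of the relaxation-and-rounding idea. The relaxed optimum is l_i* = -log_D p_i, and rounding up gives the integer lengths ceil(-log_D p_i), which still satisfy the Kraft inequality and cost at most one extra bit in expectation; the distribution below is an arbitrary illustrative choice:

```python
import math

def relaxed_and_rounded_lengths(p, D=2):
    """Relaxed optimal lengths l_i* = -log_D p_i (not integers in general)
    and the rounded-up integer lengths ceil(-log_D p_i)."""
    star = [-math.log(pi, D) for pi in p]
    rounded = [math.ceil(li) for li in star]
    return star, rounded

p = [0.4, 0.3, 0.2, 0.1]                            # illustrative distribution
star, rounded = relaxed_and_rounded_lengths(p)
H = -sum(pi * math.log2(pi) for pi in p)            # entropy H(X)
L_star = sum(pi * li for pi, li in zip(p, star))    # relaxed minimum, equals H(X)
L = sum(pi * li for pi, li in zip(p, rounded))      # satisfies H(X) <= L < H(X) + 1
print(H, L_star, L, sum(2.0 ** -l for l in rounded))  # last value: Kraft sum <= 1
```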

  18. Expected Length

  19. Another Proof

  20. Minimum Expected Length – there is an overhead of at most 1 bit, because the optimal codeword lengths -log_D p(x) are not integers in general – the overhead per symbol can be reduced by spreading it out over many symbols

  21. Minimum Expected Length per Symbol • Encode a sequence of n symbols from X as a single super-symbol; the expected number of bits per symbol L_n then satisfies H(X_1, …, X_n)/n ≤ L_n < H(X_1, …, X_n)/n + 1/n • another justification for the entropy rate: it is the expected number of bits per symbol required to describe the process – i.i.d. case: entropy rate = H(X)
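The bound this slide refers to can be written out; a short LaTeX sketch of the standard argument, which spreads the at-most-1-bit overhead over a block of n symbols:

```latex
% An optimal code applied to the block (super-symbol) (X_1,\dots,X_n) satisfies
%   H(X_1,\dots,X_n) \le E\,l(X_1,\dots,X_n) < H(X_1,\dots,X_n) + 1.
% Dividing by n gives, for the expected number of bits per source symbol L_n,
\frac{H(X_1,\dots,X_n)}{n} \;\le\; L_n \;<\; \frac{H(X_1,\dots,X_n)}{n} + \frac{1}{n}
% i.i.d. case: H(X_1,\dots,X_n) = n\,H(X), so L_n \to H(X) as n \to \infty;
% for a stationary process the limit is the entropy rate of the process.
```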

  22. HUFFMAN CODES

  23. Huffman Algorithm • For a binary code: repeatedly merge the two least probable symbols into a single node whose probability is their sum, until one node remains • Read each codeword off the resulting tree along the path from the root to the corresponding leaf
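A minimal Python sketch of the binary Huffman procedure described above; the heap-based bookkeeping and the tie-breaking counter are implementation choices, not part of the slides:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Binary Huffman code: repeatedly merge the two least probable nodes
    until a single tree remains, then read the codewords off the tree.
    probs: dict mapping symbol -> probability; returns symbol -> codeword."""
    tiebreak = count()   # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)    # two least probable subtrees
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# example distribution matching the earlier expected-length example
print(huffman_code({"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}))
# one optimal code (up to relabeling of branches): {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

For this dyadic distribution the Huffman code meets the entropy exactly (L = H = 1.75 bits); in general the optimal expected length satisfies H(X) ≤ L < H(X) + 1.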

  24. Example

  25. Example

  26. Optimality of Huffman Code

  27. History of Huffman Code In 1951, David A. Huffman (then not yet 26) and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove that any existing codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and he quickly proved this method to be the most efficient. In doing so, Huffman outdid Fano, who had worked with the inventor of information theory, Claude Shannon, to develop a similar code. Building the tree from the bottom up guaranteed optimality, unlike top-down Shannon-Fano coding.

  28. Summary

  29. Summary
