Computing and Communications
2. Information Theory - Data Compression
Ying Cui
Department of Electronic Engineering, Shanghai Jiao Tong University, China
2017, Autumn
Outline
• Examples of codes
• Kraft inequality for instantaneous codes
• Kraft inequality for uniquely decodable codes
• Optimal codes
• Huffman codes
Reference
• T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley
EXAMPLES OF CODES
Source Code
• A source code C for a random variable X maps each value of X to a finite-length string of symbols from a D-ary alphabet {0, 1, …, D-1}
• Expected length: L(C) = Σ_x p(x) l(x), where l(x) is the length of the codeword C(x)
Examples
• Example 1: H(X) = 1.75 bits, L(C) = 1.75 bits, so H(X) = L(C)
• Example 2: H(X) = 1.58 bits, L(C) = 1.66 bits, so H(X) < L(C)
(both quantities are computed in the sketch below)
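A minimal Python sketch of the two computations above. The slide does not list the underlying distributions and codewords, so the dyadic distribution (1/2, 1/4, 1/8, 1/8) with code {0, 10, 110, 111} and the uniform three-symbol source with code {0, 10, 11} are assumed here, since they reproduce the quoted values.

import math

def entropy(p):
    # Shannon entropy in bits of a probability vector p
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def expected_length(p, code):
    # expected codeword length L(C) = sum_x p(x) * l(x)
    return sum(pi * len(c) for pi, c in zip(p, code))

# assumed dyadic source: H(X) = L(C) = 1.75 bits
p1, code1 = [1/2, 1/4, 1/8, 1/8], ["0", "10", "110", "111"]
print(entropy(p1), expected_length(p1, code1))    # 1.75 1.75

# assumed uniform three-symbol source: H(X) = 1.58... bits < L(C) = 1.66... bits
p2, code2 = [1/3, 1/3, 1/3], ["0", "10", "11"]
print(entropy(p2), expected_length(p2, code2))    # 1.584... 1.666...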
Conditions on Codes
• Guarantee decodability of a single value of X?
• Describe a sequence of values of X?
  – example: if C(x1) = 00 and C(x2) = 11, then C(x1x2) = 0011
Conditions on Codes
• Guarantee decodability of a sequence of values of X without adding a special symbol between any two codewords?
• Guarantee decodability of a sequence of values of X without reference to future codewords?
  – the end of a codeword is immediately recognizable
  – an instantaneous code is a self-punctuating code
  – example: with codewords C(1) = 0, C(2) = 10, C(3) = 110, C(4) = 111, the binary string 01011111010 is parsed as 0, 10, 111, 110, 10 (see the decoder sketch below)
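The following short Python sketch (not from the lecture) illustrates why a prefix-free code is self-punctuating: the decoder emits a symbol as soon as the accumulated bits match a codeword, with no look-ahead.

def decode_prefix(bits, code):
    # decode a bit string with an instantaneous (prefix-free) code;
    # code maps codeword string -> source symbol
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in code:          # end of a codeword is immediately recognizable
            symbols.append(code[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return symbols

# code from the slide: C(1)=0, C(2)=10, C(3)=110, C(4)=111
code = {"0": 1, "10": 2, "110": 3, "111": 4}
print(decode_prefix("01011111010", code))   # [1, 2, 4, 3, 2], i.e. 0, 10, 111, 110, 10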
Classes of Codes
• Nonsingular: each value of X maps to a distinct codeword, so a single value of X is decodable
• Uniquely decodable: a sequence of values of X is decodable without adding a special symbol between any two codewords
• Instantaneous (prefix-free): a sequence of values of X is decodable without reference to future codewords
• Nesting: instantaneous codes ⊂ uniquely decodable codes ⊂ nonsingular codes ⊂ all codes
Example
• code 1 (singular): the encoded string 0 could be any of the symbols 1, 2, 3, 4
• code 2 (nonsingular but not uniquely decodable): the encoded string 010 could be symbol 2, the sequence 1 4, or the sequence 3 1
• code 3 (uniquely decodable but not instantaneous): after reading 11… the decoder cannot decide between symbol 3 and symbol 4 until it sees the following bits (3 if the next bit is 1 or the following run of 0s has even length, 4 if the run of 0s has odd length)
• code 4 (instantaneous): prefix-free
(a prefix-freeness check is sketched below)
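A small sketch of the prefix-free test implied by the last bullet. The concrete codewords used below (code 3 = {10, 00, 11, 110}, code 4 = {0, 10, 110, 111}) are assumed from the standard textbook table, since the slide itself does not list them.

def is_prefix_free(codewords):
    # a code is instantaneous iff no codeword is a prefix of another
    for c in codewords:
        for d in codewords:
            if c != d and d.startswith(c):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))  # True  (code 4, instantaneous)
print(is_prefix_free(["10", "00", "11", "110"]))  # False (code 3: 11 is a prefix of 110)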
KRAFT INEQUALITY FOR INSTANTANEOUS CODES
Kraft Inequality
• Wish to construct instantaneous codes of minimum expected length to describe a given source
  – cannot assign short codewords to all source symbols and still be prefix-free
• Kraft inequality: the codeword lengths l_1, …, l_m of any instantaneous D-ary code satisfy Σ_i D^(-l_i) ≤ 1; conversely, for any set of lengths satisfying this inequality, there exists an instantaneous code with these codeword lengths
Idea of Proof
• Consider a D-ary tree in which each node has D children; the branches of the tree represent the symbols of the codewords
• Each codeword is represented by a leaf on the tree; the path from the root traces out the symbols of the codeword
• The prefix condition on the codewords implies that no codeword is an ancestor of any other codeword on the tree
(a sketch of the Kraft check and the tree-based construction follows)
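An illustrative sketch, assuming a binary code and codeword lengths given in nondecreasing order: it checks the Kraft inequality and then builds a prefix code from feasible lengths, mirroring the tree argument. Codeword i is the D-ary expansion of the cumulative sum of D^(-l_j) over earlier codewords.

def kraft_sum(lengths, D=2):
    # an instantaneous D-ary code with these lengths exists iff this sum <= 1
    return sum(D ** -l for l in lengths)

def prefix_code_from_lengths(lengths, D=2):
    # lengths assumed sorted ascending and Kraft-feasible;
    # float arithmetic is adequate for short illustrative examples
    assert kraft_sum(lengths, D) <= 1 + 1e-12
    code, F = [], 0.0
    for l in lengths:
        digits, x = [], F            # write F in base D to l digits
        for _ in range(l):
            x *= D
            d = int(x)
            digits.append(str(d))
            x -= d
        code.append("".join(digits))
        F += D ** -l
    return code

lengths = [1, 2, 3, 3]
print(kraft_sum(lengths))                 # 1.0
print(prefix_code_from_lengths(lengths))  # ['0', '10', '110', '111']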
KRAFT INEQUALITY FOR UNIQUELY DECODABLE CODES
McMillan Inequality
• One might expect uniquely decodable codes to offer further possibilities for the set of codeword lengths than instantaneous codes, since the class of uniquely decodable codes is larger
• NO: the codeword lengths of any uniquely decodable D-ary code must also satisfy the Kraft inequality Σ_i D^(-l_i) ≤ 1
OPTIMAL CODES
Expected Code Length Minimization
• Minimize L = Σ_i p_i l_i over integer codeword lengths l_i subject to the Kraft inequality: an integer program
• Relaxation: drop the integer constraint to obtain a convex optimization problem over a larger feasible set, whose minimum value is a lower bound; the relaxed optimum is l_i* = -log_D p_i, giving L = H_D(X)
• Rounding up the relaxed solution, l_i = ⌈-log_D p_i⌉, keeps the Kraft inequality satisfied and yields a near-optimal solution (sketched below)
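A brief numerical sketch of the rounding-up step, using an arbitrary example distribution (not from the slide); the rounded lengths l_i = ceil(-log_D p_i) are the usual Shannon code lengths.

import math

def shannon_lengths(p, D=2):
    # round the relaxed optimum l_i* = -log_D p_i up to integers
    return [math.ceil(-math.log(pi, D)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]                     # example distribution (assumed)
H = -sum(pi * math.log2(pi) for pi in p)
lens = shannon_lengths(p)
L = sum(pi * l for pi, l in zip(p, lens))
print(sum(2 ** -l for l in lens) <= 1)       # Kraft inequality still holds: True
print(H, L)                                  # H <= L < H + 1 (about 1.846 <= 2.4 < 2.846)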
Expected Length
• The expected length L of any instantaneous D-ary code for X satisfies L ≥ H_D(X), with equality iff D^(-l_i) = p_i
Another Proof
Minimum Expected Length
• The minimum expected length L* satisfies H_D(X) ≤ L* < H_D(X) + 1
  – there is an overhead of at most 1 bit due to the non-integer codeword lengths
  – reduce the overhead per symbol by spreading it out over many symbols
Minimum Expected Length per Symbol
• Send a sequence of n symbols from X as a single super-symbol; the expected codeword length per symbol L_n then satisfies H(X_1, …, X_n)/n ≤ L_n < H(X_1, …, X_n)/n + 1/n
• Another justification for entropy rate: the expected number of bits per symbol required to describe the process
  – i.i.d. case: entropy rate = H(X), so L_n → H(X) as n → ∞ (see the numerical sketch below)
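A sketch, assuming an i.i.d. source with an arbitrary example distribution, showing the per-symbol overhead shrinking like 1/n when blocks of n symbols are encoded with Shannon code lengths.

import math
from itertools import product

def per_symbol_length(p, n):
    # expected Shannon-code length per symbol when blocks of n i.i.d.
    # symbols are encoded as one super-symbol
    L = 0.0
    for block in product(range(len(p)), repeat=n):
        q = math.prod(p[i] for i in block)      # probability of the block
        L += q * math.ceil(-math.log2(q))       # Shannon length for the block
    return L / n

p = [0.4, 0.3, 0.2, 0.1]                        # example distribution (assumed)
H = -sum(pi * math.log2(pi) for pi in p)
for n in (1, 2, 4, 8):
    print(n, per_symbol_length(p, n))           # stays within 1/n of H, about 1.846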
HUFFMAN CODES
Huffman Algorithm
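A minimal binary Huffman sketch (not the lecture's own code): it assumes the source is given as a probability list indexed by symbol, and uses a heap for the repeated merge of the two least probable nodes; codewords are read off the resulting tree.

import heapq
from itertools import count

def huffman_code(p):
    # repeatedly merge the two least probable nodes into one node whose
    # probability is their sum, then label tree branches with 0 and 1
    tie = count()                              # tie-breaker so the heap never compares trees
    heap = [(pi, next(tie), sym) for sym, pi in enumerate(p)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tie), (t1, t2)))
    code = {}
    def walk(tree, prefix=""):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"         # single-symbol edge case
        return code
    return walk(heap[0][2])

# usage: the dyadic source from the earlier example
p = [0.5, 0.25, 0.125, 0.125]
code = huffman_code(p)
L = sum(pi * len(code[i]) for i, pi in enumerate(p))
print(code, L)                                 # expected length 1.75 bits = H(X)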
Example
Optimality of Huffman Code
• Huffman coding is optimal: if C* is a Huffman code and C' is any other uniquely decodable code, then L(C*) ≤ L(C')
History of Huffman Code
In 1951, David A. Huffman (then a 25-year-old graduate student) and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree and quickly proved this method to be the most efficient. In doing so, Huffman outdid Fano, who had worked with information theory inventor Claude Shannon to develop a similar code. Building the tree from the bottom up guaranteed optimality, unlike top-down Shannon-Fano coding.
Summary