  1. Chapter 5 Data Compression Peng-Hua Wang Graduate Inst. of Comm. Engineering National Taipei University

  2. Chapter Outline
     Chap. 5 Data Compression
     5.1 Example of Codes
     5.2 Kraft Inequality
     5.3 Optimal Codes
     5.4 Bound on Optimal Code Length
     5.5 Kraft Inequality for Uniquely Decodable Codes
     5.6 Huffman Codes
     5.7 Some Comments on Huffman Codes
     5.8 Optimality of Huffman Codes
     5.9 Shannon-Fano-Elias Coding
     5.10 Competitive Optimality of the Shannon Code
     5.11 Generation of Discrete Distributions from Fair Coins

  3. 5.1 Example of Codes

  4. Source code
     Definition (Source code) A source code C for a random variable X is a mapping from X, the range of X, to D*, the set of finite-length strings of symbols from a D-ary alphabet. Let C(x) denote the codeword corresponding to x and let l(x) denote the length of C(x).
     ■ For example, C(red) = 00, C(blue) = 11 is a source code with mapping from X = {red, blue} to D², with alphabet D = {0, 1}.

  5. Source code
     Definition (Expected length) The expected length L(C) of a source code C(x) for a random variable X with probability mass function p(x) is given by
        $L(C) = \sum_{x \in \mathcal{X}} p(x)\, l(x)$,
     where l(x) is the length of the codeword associated with x.
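     As a concrete reading of this definition, here is a minimal Python sketch; the function name expected_length and the dict-based representation of the pmf and codebook are illustrative assumptions, not from the slides.

```python
def expected_length(pmf, code):
    """L(C) = sum over x of p(x) * l(x), where l(x) = len(code[x])."""
    return sum(pmf[x] * len(code[x]) for x in pmf)
```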

  6. Example
     Example 5.1.1 Let X be a random variable with the following distribution and codeword assignment:
        Pr{X = 1} = 1/2,  codeword C(1) = 0
        Pr{X = 2} = 1/4,  codeword C(2) = 10
        Pr{X = 3} = 1/8,  codeword C(3) = 110
        Pr{X = 4} = 1/8,  codeword C(4) = 111
     ■ H(X) = 1.75 bits.
     ■ E[l(X)] = 1.75 bits.
     ■ uniquely decodable

  7. Example
     Example 5.1.2 Consider the following distribution and codeword assignment:
        Pr{X = 1} = 1/3,  codeword C(1) = 0
        Pr{X = 2} = 1/3,  codeword C(2) = 10
        Pr{X = 3} = 1/3,  codeword C(3) = 11
     ■ H(X) = 1.58 bits.
     ■ E[l(X)] = 1.66 bits.
     ■ uniquely decodable
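     To make the comparison in these two examples concrete, the sketch below (reusing the hypothetical expected_length helper above) computes H(X) and E[l(X)] for both: the dyadic distribution of Example 5.1.1 meets the entropy exactly, while Example 5.1.2 does not.

```python
from math import log2

def entropy(pmf):
    """H(X) = -sum p(x) log2 p(x), in bits."""
    return -sum(p * log2(p) for p in pmf.values())

# Example 5.1.1: dyadic probabilities, expected length equals the entropy
pmf1  = {1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}
code1 = {1: "0", 2: "10", 3: "110", 4: "111"}
print(entropy(pmf1), expected_length(pmf1, code1))   # 1.75  1.75

# Example 5.1.2: non-dyadic probabilities, expected length exceeds the entropy
pmf2  = {1: 1/3, 2: 1/3, 3: 1/3}
code2 = {1: "0", 2: "10", 3: "11"}
print(entropy(pmf2), expected_length(pmf2, code2))   # ~1.585  ~1.667
```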

  8. Source code
     Definition (Nonsingular) A code is said to be nonsingular if every element of the range of X maps into a different string in D*; that is,
        $x \neq x' \Rightarrow C(x) \neq C(x')$.
     Definition (Extension code) The extension C* of a code C is the mapping from finite-length strings of X to finite-length strings of D, defined by
        $C(x_1 x_2 \cdots x_n) = C(x_1) C(x_2) \cdots C(x_n)$,
     where $C(x_1) C(x_2) \cdots C(x_n)$ indicates concatenation of the corresponding codewords.
     Example 5.1.4 If C(x1) = 00 and C(x2) = 11, then C(x1 x2) = 0011.

  9. Source code
     Definition (Uniquely decodable) A code is called uniquely decodable if its extension is nonsingular.
     Definition (Prefix code) A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.
     ■ For an instantaneous code, the symbol x_i can be decoded as soon as we come to the end of the codeword corresponding to it.
     ■ For example, the binary string 01011111010 produced by the code of Example 5.1.1 is parsed as 0, 10, 111, 110, 10.
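     Instantaneous decoding can be sketched directly from the definition: scan the bit string and emit a symbol the moment the buffered bits match a codeword. The sketch below is illustrative (the name decode_prefix and the dict codebook are mine, not from the slides), applied to the parsing example above.

```python
def decode_prefix(bits, codebook):
    """Greedily decode an instantaneous (prefix) code: output a symbol as
    soon as the buffered bits equal one of the codewords."""
    inverse = {cw: sym for sym, cw in codebook.items()}
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:                 # end of a codeword reached
            symbols.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols

code = {1: "0", 2: "10", 3: "110", 4: "111"}   # code of Example 5.1.1
print(decode_prefix("01011111010", code))       # [1, 2, 4, 3, 2], i.e. 0, 10, 111, 110, 10
```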

  10. Source code: classes of codes
      X    Singular    Nonsingular but not UD    UD but not instantaneous    Instantaneous
      1    0           0                         10                          0
      2    0           010                       00                          10
      3    0           01                        11                          110
      4    0           10                        110                         111
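      The "UD but not instantaneous" column can be verified mechanically with the Sardinas-Patterson test, which is not covered on these slides; the sketch below is an illustrative implementation of that test, checking the three nonsingular codes of the table.

```python
def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: a code is UD iff no dangling suffix is itself
    a codeword."""
    C = set(codewords)

    def dangling(A, B):
        # suffixes w such that a + w = b for some a in A, b in B
        return {b[len(a):] for a in A for b in B if b != a and b.startswith(a)}

    S, seen = dangling(C, C), set()
    while S and not (S & C):                # a codeword among the suffixes => not UD
        if frozenset(S) in seen:             # suffix sets repeat with no collision => UD
            return True
        seen.add(frozenset(S))
        S = dangling(C, S) | dangling(S, C)
    return not (S & C)

print(is_uniquely_decodable(["0", "010", "01", "10"]))   # False (nonsingular, not UD)
print(is_uniquely_decodable(["10", "00", "11", "110"]))  # True  (UD but not instantaneous)
print(is_uniquely_decodable(["0", "10", "110", "111"]))  # True  (instantaneous)
```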

  11. Decoding Tree

  12. 5.2 Kraft Inequality

  13. Kraft Inequality
      Theorem 5.2.1 (Kraft inequality) For any instantaneous code (prefix code) over an alphabet of size D, the codeword lengths l_1, l_2, ..., l_m must satisfy the inequality
         $\sum_i D^{-l_i} \le 1$.
      Conversely, given a set of codeword lengths that satisfy this inequality, there exists an instantaneous code with these word lengths.
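      The converse direction is constructive: assign codewords in order of increasing length so that no codeword extends one already assigned. The sketch below (function names are mine, not from the slides) checks the inequality and builds such a code with the standard canonical construction.

```python
def kraft_sum(lengths, D=2):
    """sum_i D^(-l_i); at most 1 for any D-ary instantaneous code."""
    return sum(D ** -l for l in lengths)

def to_base(n, D, width):
    """Write integer n in base D, left-padded with zeros to `width` digits."""
    digits = []
    for _ in range(width):
        digits.append(str(n % D))
        n //= D
    return "".join(reversed(digits))

def prefix_code_from_lengths(lengths, D=2):
    """Canonical construction of a prefix code from admissible lengths."""
    assert kraft_sum(lengths, D) <= 1, "lengths violate the Kraft inequality"
    codewords, val, prev = [], 0, 0
    for l in sorted(lengths):
        val *= D ** (l - prev)            # extend the counter to the new length
        codewords.append(to_base(val, D, l))
        val += 1
        prev = l
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))   # ['0', '10', '110', '111']
```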

  14. Extended Kraft Inequality
      Theorem 5.2.2 (Extended Kraft inequality) For any countably infinite set of codewords that form a prefix code, the codeword lengths satisfy the extended Kraft inequality
         $\sum_{i=1}^{\infty} D^{-l_i} \le 1$.
      Conversely, given any l_1, l_2, ... satisfying the extended Kraft inequality, we can construct a prefix code with these codeword lengths.

  15. 5.3 Optimal Codes

  16. Minimize expected length
      Problem: Given the source pmf p_1, p_2, ..., p_m, find codeword lengths l_1, l_2, ..., l_m that minimize the expected code length
         $L = \sum_i p_i l_i$
      subject to the constraint
         $\sum_i D^{-l_i} \le 1$.
      ■ The lengths l_1, l_2, ..., l_m are integers.
      ■ We first relax this integer programming problem: the restriction to integer lengths is relaxed to real numbers.
      ■ The relaxed problem is solved by Lagrange multipliers.

  17. Solve the relaxed problem
      Form the Lagrangian
         $J = \sum_i p_i l_i + \lambda \left( \sum_i D^{-l_i} \right)$,
      and differentiate:
         $\frac{\partial J}{\partial l_i} = p_i - \lambda D^{-l_i} \ln D$.
      Setting $\frac{\partial J}{\partial l_i} = 0$ gives $D^{-l_i} = \frac{p_i}{\lambda \ln D}$.
      Substituting into $\sum_i D^{-l_i} \le 1$ (met with equality) gives $\lambda = \frac{1}{\ln D}$, hence $p_i = D^{-l_i}$ and the optimal code lengths are
         $l_i^* = -\log_D p_i$.
      The expected code length is
         $L^* = \sum_i p_i l_i^* = -\sum_i p_i \log_D p_i = H_D(X)$.
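      A quick numeric check of this relaxed solution (an illustrative sketch, not from the slides): for a non-dyadic source the lengths $l_i^* = -\log_D p_i$ achieve L* = H_D(X) exactly, but they are not integers, which is why the following slides bound the loss from rounding.

```python
from math import log

pmf = {1: 1/3, 2: 1/3, 3: 1/3}                       # non-dyadic source
lengths = {x: -log(p, 2) for x, p in pmf.items()}     # l*_i = -log_D(p_i), with D = 2
L_star = sum(pmf[x] * lengths[x] for x in pmf)
H = -sum(p * log(p, 2) for p in pmf.values())
print(lengths)            # about 1.585 for every symbol: not integers
print(L_star, H)          # both about 1.585: the relaxed optimum equals H_D(X)
```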

  18. Expected code length
      Theorem 5.3.1 The expected length L of any instantaneous D-ary code for a random variable X is greater than or equal to the entropy H_D(X); that is,
         $L \ge H_D(X)$,
      with equality if and only if $D^{-l_i} = p_i$.
      Proof (first attempt).
         $L - H_D(X) = \sum_i p_i l_i + \sum_i p_i \log_D p_i = -\sum_i p_i \log_D D^{-l_i} + \sum_i p_i \log_D p_i = D(p \| q) \ge 0$,
      where $q_i = D^{-l_i}$. WRONG!! Because $\sum_i D^{-l_i} \le 1$, the $q_i$ may not form a valid distribution.

  19. Expected code length
      Proof (corrected).
         $L - H_D(X) = \sum_i p_i l_i + \sum_i p_i \log_D p_i = -\sum_i p_i \log_D D^{-l_i} + \sum_i p_i \log_D p_i$.
      Let $c = \sum_j D^{-l_j}$ and $r_i = D^{-l_i}/c$. Then
         $L - H_D(X) = \sum_i p_i \log_D \frac{p_i}{r_i} - \log_D c = D(p \| r) + \log_D \frac{1}{c} \ge 0$,
      since $D(p \| r) \ge 0$ and $c \le 1$ by the Kraft inequality. Hence $L \ge H_D(X)$, with equality iff $p_i = D^{-l_i}$; that is, iff $-\log_D p_i$ is an integer for all i. □

  20. D-adic
      Definition (D-adic) A probability distribution is called D-adic if each of the probabilities is equal to $D^{-n}$ for some integer n.
      ■ L = H_D(X) if and only if the distribution of X is D-adic.
      ■ How do we find the optimal code? ⇒ Find the D-adic distribution that is closest (in the relative entropy sense) to the distribution of X.
      ■ What is an upper bound on the length of the optimal code?
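      A distribution can be tested for the D-adic property by checking whether $-\log_D p$ is an integer for every probability; the small sketch below does this with a floating-point tolerance (an illustrative helper, not from the slides).

```python
from math import isclose, log

def is_D_adic(pmf, D=2):
    """True if every probability equals D^(-n) for some integer n."""
    return all(isclose(-log(p, D), round(-log(p, D)), abs_tol=1e-9)
               for p in pmf.values())

print(is_D_adic({1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}))   # True  (Example 5.1.1)
print(is_D_adic({1: 1/3, 2: 1/3, 3: 1/3}))            # False (Example 5.1.2)
```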

  21. 5.4 Bound on Optimal Code Length

  22. Optimal code length
      Theorem 5.4.1 Let $l_1^*, l_2^*, \ldots, l_m^*$ be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let $L^*$ be the associated expected length of an optimal code ($L^* = \sum_i p_i l_i^*$). Then
         $H_D(X) \le L^* < H_D(X) + 1$.
      Proof. Let $l_i = \lceil \log_D \frac{1}{p_i} \rceil$, where $\lceil x \rceil$ is the smallest integer $\ge x$. These lengths satisfy the Kraft inequality, since
         $\sum_i D^{-\lceil \log_D \frac{1}{p_i} \rceil} \le \sum_i D^{-\log_D \frac{1}{p_i}} = \sum_i p_i = 1$.
      This choice of codeword lengths satisfies
         $\log_D \frac{1}{p_i} \le l_i < \log_D \frac{1}{p_i} + 1$.
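      These ceiling lengths are the Shannon code lengths. The sketch below (illustrative names, not from the slides) computes them for the source of Example 5.1.2 and checks both the Kraft sum and the bound H_D(X) ≤ L < H_D(X) + 1.

```python
from math import ceil, log2

def shannon_lengths(pmf):
    """Binary Shannon code lengths l_i = ceil(log2(1/p_i)).
    (Beware floating-point round-off when 1/p_i is an exact power of 2.)"""
    return {x: ceil(log2(1 / p)) for x, p in pmf.items()}

pmf = {1: 1/3, 2: 1/3, 3: 1/3}
l = shannon_lengths(pmf)
L = sum(pmf[x] * l[x] for x in pmf)
H = -sum(p * log2(p) for p in pmf.values())
K = sum(2 ** -li for li in l.values())
print(l)          # every length is 2
print(K)          # Kraft sum 0.75 <= 1
print(H, L)       # about 1.585 <= 2 < 2.585, i.e. H <= L < H + 1
```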

  23. Optimal code length
      Multiplying by $p_i$ and summing over i, we obtain
         $H_D(X) \le L < H_D(X) + 1$.
      Since $L^*$ is the expected length of the optimal code, $L^* \le L < H_D(X) + 1$. On the other hand, from Theorem 5.3.1, $L^* \ge H_D(X)$. Therefore,
         $H_D(X) \le L^* < H_D(X) + 1$. □

  24. Optimal code length
      Consider a system in which we send a sequence of n symbols from X. Define $L_n$ to be the expected codeword length per input symbol,
         $L_n = \frac{1}{n} \sum p(x_1, x_2, \ldots, x_n)\, l(x_1, x_2, \ldots, x_n) = \frac{1}{n} E[l(X_1, X_2, \ldots, X_n)]$.
      We have
         $H(X_1, X_2, \ldots, X_n) \le E[l(X_1, X_2, \ldots, X_n)] < H(X_1, X_2, \ldots, X_n) + 1$.
      If $X_1, X_2, \ldots, X_n$ are i.i.d., we have
         $H(X) \le L_n < H(X) + \frac{1}{n}$.
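      To see the 1/n overhead shrink, the sketch below codes blocks of n i.i.d. symbols with Shannon lengths on the block pmf and reports the expected length per symbol; it is an illustrative brute-force computation (enumerating all blocks), not something from the slides.

```python
from itertools import product
from math import ceil, log2

def per_symbol_length(pmf, n):
    """L_n = (1/n) * sum over blocks of p(x^n) * ceil(log2(1/p(x^n)))
    for n i.i.d. symbols coded together with binary Shannon lengths."""
    total = 0.0
    for block in product(pmf, repeat=n):
        p = 1.0
        for x in block:
            p *= pmf[x]
        total += p * ceil(log2(1 / p))
    return total / n

pmf = {1: 1/3, 2: 1/3, 3: 1/3}                 # H(X) is about 1.585 bits
for n in (1, 2, 4, 8):
    print(n, per_symbol_length(pmf, n))         # 2.0, 2.0, 1.75, 1.625 -> approaches H(X)
```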

