Average Redundancy of the Shannon Code for Markov Sources
Neri Merhav (Technion) and Wojciech Szpankowski (Purdue University)
May 27, 2013
NSF STC Center for Science of Information
AofA, Menorca 2013
Dedicated to PHILIPPE FLAJOLET
Outline
1. Source Coding
2. Redundancy: Known Sources
3. Shannon and Huffman Redundancy for Memoryless Sources
4. Shannon Coding Redundancy for Markov Sources
Source Coding

A source code is a bijective mapping C : A∗ → {0, 1}∗ from sequences over the alphabet A to the set {0, 1}∗ of binary sequences.

The basic problem of source coding (i.e., data compression) is to find codes with the shortest descriptions (lengths), either on average or for individual sequences.

Three Basic Types of Source Coding:
• Fixed-to-Variable (FV) length codes (e.g., Huffman and Shannon codes).
• Variable-to-Fixed (VF) length codes (e.g., Tunstall and Khodak codes).
• Variable-to-Variable (VV) length codes (e.g., the Khodak VV code).
Prefix Codes

A prefix code is one in which no codeword is a prefix of another codeword. We write: P(x) for the probability of x ∈ A∗; L(C, x) for the code length of the source sequence x ∈ A∗; H(P) = −∑_{x∈A∗} P(x) lg P(x) for the entropy.

Kraft's Inequality: A binary prefix code with code lengths ℓ₁, ℓ₂, ..., ℓ_N exists iff
$$\sum_{i=1}^{N} 2^{-\ell_i} \le 1.$$

Shannon's First Theorem: For any prefix code, the average code length E[L(C, X)] cannot be smaller than the entropy of the source H(P), that is, E[L(C_n, X)] ≥ H(P).

Exercise: Show that there exists at least one sequence $\tilde{x}_1^n$ such that $L(\tilde{x}_1^n) \ge -\log_2 P(\tilde{x}_1^n)$.
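Both statements above are easy to check numerically. The sketch below (with an assumed toy distribution, not taken from the talk) verifies Kraft's inequality for the Shannon lengths ⌈−lg P(x)⌉ and the resulting bound H(P) ≤ E[L] < H(P) + 1:

```python
import math

# Sketch (assumed toy distribution): Shannon code lengths ceil(-lg P(x))
# satisfy Kraft's inequality, so a prefix code with these lengths exists,
# and their average length is within one bit of the entropy.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {x: math.ceil(-math.log2(p)) for x, p in P.items()}

kraft = sum(2.0 ** -l for l in lengths.values())
avg_len = sum(P[x] * lengths[x] for x in P)
entropy = -sum(p * math.log2(p) for p in P.values())

print(kraft)                              # at most 1, so the lengths are feasible
print(entropy <= avg_len < entropy + 1)   # Shannon's bound holds
```

For this dyadic distribution the Kraft sum is exactly 1 and the average length meets the entropy, which is the extreme case of the bound.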
Redundancy

Known source P. The pointwise redundancy R(x) and the average redundancy R̄:
$$R(x) = L(C, x) + \lg P(x), \qquad \bar{R} = E[L(C, X)] - H(P) \ge 0.$$

Optimal Code:
$$\min_{L} \sum_{x} L(x) P(x) \quad \text{subject to} \quad \sum_{x} 2^{-L(x)} \le 1.$$

Solution: by Lagrange multipliers (ignoring the integer constraint on the lengths) we find L_opt(x) = −lg P(x).

The smaller the redundancy, the closer the code is to the optimal one.
Redundancy for Huffman's Code

We consider fixed-to-variable length codes: the Shannon and Huffman codes. For a known source P, we consider fixed-length sequences x₁ⁿ = x₁ ... x_n.

Huffman Code: The optimization problem
$$\bar{R}_n = \min_{C_n \in \mathcal{C}} E\left[ L(C_n, x_1^n) + \log_2 P(x_1^n) \right]$$
is solved by Huffman's code.

We first review the average redundancy for a binary memoryless source, with p denoting the probability of generating "0" and q = 1 − p. In 1994 Stubley proposed the following expression for Huffman's average redundancy:
$$\bar{R}_n^H = 2 - \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} \langle \alpha k + \beta n \rangle - 2 \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}\, 2^{-\langle \alpha k + \beta n \rangle} + o(1),$$
where
$$\alpha = \log_2 \frac{1-p}{p}, \qquad \beta = \log_2 \frac{1}{1-p},$$
and ⟨x⟩ = x − ⌊x⌋ is the fractional part of x.
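For small block lengths, Stubley's expression can be compared against the exact Huffman redundancy obtained by brute force over all 2ⁿ blocks; the values p = 0.3 and n = 10 below are illustrative choices, not parameters from the talk:

```python
import math, heapq

# Illustration (assumed parameters p, n): evaluate Stubley's expression for
# the Huffman redundancy of a binary memoryless source, and compare it with
# the exact average redundancy from building the Huffman code on all 2^n blocks.
p, n = 0.3, 10
q = 1.0 - p
alpha = math.log2((1 - p) / p)
beta = math.log2(1 / (1 - p))
frac = lambda x: x - math.floor(x)

pk = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
S1 = sum(pk[k] * frac(alpha * k + beta * n) for k in range(n + 1))
S2 = sum(pk[k] * 2.0 ** (-frac(alpha * k + beta * n)) for k in range(n + 1))
stubley = 2 - S1 - 2 * S2

# Exact Huffman average length via the standard merge procedure:
# E[L] equals the sum of the weights of all internal (merged) nodes.
heap = [p**k * q**(n - k) for k in range(n + 1) for _ in range(math.comb(n, k))]
heapq.heapify(heap)
avg_len = 0.0
while len(heap) > 1:
    a, b = heapq.heappop(heap), heapq.heappop(heap)
    avg_len += a + b
    heapq.heappush(heap, a + b)
entropy = n * (-p * math.log2(p) - q * math.log2(q))
exact = avg_len - entropy
print(exact, stubley)   # both small, and close for this n
```

The merge loop is the textbook Huffman construction; it is feasible here only because n is small (2¹⁰ leaves).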
Main Result

Theorem 1 (W.S., 2000). Consider the Huffman block code of length n over a binary memoryless source with p < 1/2. Then as n → ∞,
$$\bar{R}_n^H = \begin{cases} \dfrac{3}{2} - \dfrac{1}{\ln 2} + o(1) \approx 0.057304, & \alpha \ \text{irrational}, \\[2mm] \dfrac{3}{2} - \dfrac{1}{M}\left( \langle \beta M n \rangle - \dfrac{1}{2} \right) - \dfrac{1}{M(1 - 2^{-1/M})}\, 2^{-\langle n \beta M \rangle / M} + O(\rho^n), & \alpha = \dfrac{N}{M} \ \text{rational}, \end{cases}$$
where N, M are integers such that gcd(N, M) = 1 and ρ < 1.

Figure 1: The average redundancy of Huffman codes versus block size n for: (a) irrational α = log₂(1−p)/p with p = 1/π; (b) rational α = log₂(1−p)/p with p = 1/9.
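The two modes are easy to evaluate numerically. For p = 1/9 (the rational case of Figure 1) the slope is α = log₂ 8 = 3, an integer, so M = 1 and the rational-mode expression specializes to 2 − ⟨βn⟩ − 2^{1−⟨βn⟩}; the sketch below evaluates both modes:

```python
import math

# Irrational mode: the limiting constant 3/2 - 1/ln 2.
const = 1.5 - 1.0 / math.log(2)
print(const)   # approx 0.057305

# Rational mode with p = 1/9, so alpha = log2(8) = 3 and M = 1; the formula
# specializes to 2 - <beta*n> - 2^(1 - <beta*n>), which oscillates with n.
beta = math.log2(9.0 / 8.0)
frac = lambda x: x - math.floor(x)
for n in (10, 20, 30):
    x = frac(beta * n)
    print(n, 2 - x - 2.0 ** (1 - x))   # stays between 0 and about 0.087
```

The oscillation in the rational mode never converges, which is exactly the behavior visible in Figure 1(b).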
Why Two Modes: The Shannon Code

Consider the Shannon code, which assigns the length L(C_n^S, x_1^n) = ⌈−lg P(x_1^n)⌉ to the source sequence x_1^n. Observe that P(x_1^n) = p^k(1−p)^{n−k}, where p is the known probability of generating 0 and k is the number of 0s. The Shannon code redundancy is
$$\bar{R}_n^S = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \left( \lceil -\log_2( p^k (1-p)^{n-k} ) \rceil + \log_2( p^k (1-p)^{n-k} ) \right)$$
$$= 1 - \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \langle \alpha k + \beta n \rangle,$$
where ⟨x⟩ = x − ⌊x⌋ is the fractional part of x, and
$$\alpha = \log_2 \frac{1-p}{p}, \qquad \beta = \log_2 \frac{1}{1-p}.$$
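The two expressions above agree because −log₂(p^k (1−p)^{n−k}) = αk + βn and ⌈u⌉ − u = 1 − ⟨u⟩ for non-integer u. A quick numerical check (p and n are assumed illustrative values):

```python
import math

# Sanity check (assumed p, n): both expressions for the Shannon-code
# redundancy on the slide give the same value, since
# -log2(p^k q^(n-k)) = alpha*k + beta*n.
p, n = 0.3, 12
q = 1 - p
alpha, beta = math.log2(q / p), math.log2(1 / q)
frac = lambda x: x - math.floor(x)

direct = sum(
    math.comb(n, k) * p**k * q**(n - k)
    * (math.ceil(-math.log2(p**k * q**(n - k))) + math.log2(p**k * q**(n - k)))
    for k in range(n + 1)
)
via_frac = 1 - sum(
    math.comb(n, k) * p**k * q**(n - k) * frac(alpha * k + beta * n)
    for k in range(n + 1)
)
print(abs(direct - via_frac))   # agree up to floating-point error
```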
Sketch of Proof

We need to understand the asymptotic behavior of the following sum (cf. Bernoulli-distributed sequences modulo 1)
$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle)$$
for fixed p and some Riemann-integrable function f : [0, 1] → R.

Lemma 1. Let 0 < p < 1 be a fixed real number and α be an irrational number. Then for every Riemann-integrable function f : [0, 1] → R,
$$\lim_{n \to \infty} \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle) = \int_0^1 f(t)\, dt,$$
where the convergence is uniform over all shifts y ∈ R.

Lemma 2. Let α = N/M be a rational number with gcd(N, M) = 1. Then for every bounded function f : [0, 1] → R,
$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle) = \frac{1}{M} \sum_{l=0}^{M-1} f\!\left( \frac{l}{M} + \frac{\langle M y \rangle}{M} \right) + O(\rho^n)$$
uniformly over all y ∈ R and for some ρ < 1.
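Lemma 1 can be illustrated numerically. The choices below (α = √2, f(t) = t, y = 0.3, p = 0.3) are arbitrary; with f(t) = t the limit is ∫₀¹ t dt = 1/2. Binomial weights are computed in log space to keep the computation stable for large n:

```python
import math

# Illustration of Lemma 1 with arbitrary choices: alpha = sqrt(2)
# (irrational), f(t) = t, shift y = 0.3, p = 0.3.  The binomial pmf is
# evaluated via lgamma so that large n causes no overflow.
alpha, y, p, n = math.sqrt(2), 0.3, 0.3, 2000
q = 1 - p
frac = lambda x: x - math.floor(x)

total = 0.0
for k in range(n + 1):
    logw = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(q))
    total += math.exp(logw) * frac(alpha * k + y)
print(total)   # close to the integral 1/2
```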
Shannon Redundancy – Rational Case

Assume α = N/M with gcd(N, M) = 1. Denote $p_{n,k} = \binom{n}{k} p^k q^{n-k}$. Then
$$S_n = \sum_{k=0}^{n} p_{n,k} \left\langle \frac{kN}{M} + \beta n \right\rangle = \sum_{\ell=0}^{M-1} \left\langle \frac{\ell N}{M} + \beta n \right\rangle \sum_{m:\, k = \ell + mM \le n} p_{n,k} = \sum_{\ell=0}^{M-1} \left\langle \frac{\ell}{M} + \beta n \right\rangle \sum_{m:\, k = \ell + mM \le n} p_{n,k},$$
where the last step reindexes ℓ, since ℓN mod M runs over all residues when gcd(N, M) = 1.

Lemma 3. For fixed ℓ ≤ M and M, there exists ρ < 1 such that
$$\sum_{m:\, k = \ell + mM \le n} \binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{M} + O(\rho^n).$$

Proof. Let ω_r = e^{2πir/M} for r = 0, 1, ..., M − 1 be the Mth roots of unity, so that
$$\frac{1}{M} \sum_{r=0}^{M-1} \omega_r^{n} = \begin{cases} 1 & \text{if } M \mid n, \\ 0 & \text{otherwise}, \end{cases}$$
where M | n means that M divides n. Then
$$\sum_{m:\, k = \ell + mM \le n} \binom{n}{k} p^k q^{n-k} = \frac{1}{M} \left[ 1 + \omega_1^{-\ell} (p\omega_1 + q)^n + \cdots + \omega_{M-1}^{-\ell} (p\omega_{M-1} + q)^n \right] = \frac{1}{M} + O(\rho^n),$$
since $|p\omega_r + q| = \sqrt{p^2 + q^2 + 2pq \cos(2\pi r/M)} < 1$ for r ≠ 0.
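Lemma 3 says that the binomial mass concentrates equally on the M residue classes of k, up to an exponentially small error. A direct check with assumed toy values of n, M, and p:

```python
import math

# Check of the roots-of-unity argument (assumed toy n, M, p): the binomial
# mass on each residue class k = l (mod M) tends to 1/M exponentially fast.
n, M, p = 60, 5, 0.3
q = 1 - p
mass = [0.0] * M
for k in range(n + 1):
    mass[k % M] += math.comb(n, k) * p**k * q**(n - k)
print([round(m, 6) for m in mass])   # each entry close to 1/M = 0.2
```

Here ρ = max_{r≠0} |pω_r + q| ≈ 0.84, so ρ⁶⁰ is already of order 10⁻⁵, which matches the size of the deviations printed.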
Finishing the Rational Case

We use the following Fourier series: for real x,
$$\langle x \rangle = \frac{1}{2} - \sum_{m=1}^{\infty} \frac{\sin 2\pi m x}{\pi m} = \frac{1}{2} + \sum_{m \in \mathbb{Z} \setminus \{0\}} c_m e^{2\pi i m x}, \qquad c_m = -\frac{1}{2\pi i m}.$$

Continuing the derivation and using Lemma 3 we obtain
$$S_n = \frac{1}{M} \sum_{\ell=0}^{M-1} \left( \frac{1}{2} + \sum_{m \ne 0} c_m e^{2\pi i m (\ell/M + \beta n)} \right) = \frac{1}{2} + \frac{1}{M} \sum_{m \ne 0} c_m e^{2\pi i m \beta n} \sum_{\ell=0}^{M-1} e^{2\pi i m \ell / M}$$
$$= \frac{1}{2} + \sum_{m = kM \ne 0} c_{kM}\, e^{2\pi i k M \beta n} = \frac{1}{2} - \frac{1}{M} \left( \frac{1}{2} - \langle \beta n M \rangle \right),$$
since the inner sum over ℓ vanishes unless M | m, and $c_{kM} = c_k / M$.
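The closed form for S_n can be checked directly. Below p is chosen (an illustrative value, not one from the talk) so that α = log₂(q/p) = 3/2, i.e. N = 3 and M = 2; the O(ρⁿ) error is negligible at n = 50:

```python
import math

# Verify S_n = 1/2 - (1/M)(1/2 - <beta*n*M>) for alpha = N/M = 3/2 (M = 2).
# p is constructed so that q/p = 2^(3/2); the O(rho^n) term is negligible here.
p = 1.0 / (1.0 + 2.0 ** 1.5)
q = 1.0 - p
n, M = 50, 2
alpha = 1.5                        # = log2(q/p) by construction
beta = math.log2(1.0 / q)
frac = lambda x: x - math.floor(x)

S = sum(math.comb(n, k) * p**k * q**(n - k) * frac(alpha * k + beta * n)
        for k in range(n + 1))
closed = 0.5 - (1.0 / M) * (0.5 - frac(beta * n * M))
print(abs(S - closed))             # of order rho^n, tiny for n = 50
```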
Markov Sources

The source sequence X₁, X₂, ..., over the alphabet A = {1, 2, ..., r} is generated by a first-order Markov chain with a given transition matrix P = {p(j|k)}_{j,k=1}^{r}, initial state probabilities p_k, k = 1, 2, ..., r, and stationary state probabilities π_k, k = 1, 2, ..., r.

The probability of x^n = (x₁, ..., x_n) ∈ A^n under the given Markov source is
$$P(x^n) = p_{x_1} \prod_{t=2}^{n} p(x_t \mid x_{t-1}).$$

The average redundancy of the Shannon code is defined as
$$R_n = E\left[ \lceil -\log P(X^n) \rceil + \log P(X^n) \right] = E\left[ \varrho(-\log P(X^n)) \right], \qquad \varrho(u) = \lceil u \rceil - u.$$
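For a small alphabet and small n, this expectation can be computed exactly by enumerating Aⁿ. The chain below is a toy example with assumed parameters (logs are taken base 2, as appropriate for code lengths in bits):

```python
import math, itertools

# Exact computation of R_n = E[rho(-lg P(X^n))] for an assumed toy two-state
# chain; n is kept small so that A^n can be enumerated.
P = [[0.7, 0.3],
     [0.4, 0.6]]              # P[j][k] = p(k | j), rows sum to 1
p0 = [0.5, 0.5]               # initial state probabilities

def rho(u):                   # rho(u) = ceil(u) - u, the wasted fraction of a bit
    return math.ceil(u) - u

n = 10
R, total = 0.0, 0.0
for x in itertools.product(range(2), repeat=n):
    prob = p0[x[0]]
    for t in range(1, n):
        prob *= P[x[t - 1]][x[t]]
    total += prob
    R += prob * rho(-math.log2(prob))
print(round(total, 6), R)     # probabilities sum to 1; R lies in [0, 1)
```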
Main Result for Markov Sources

Theorem 2 (Merhav & W.S.). Consider the Shannon code of length n for an aperiodic and irreducible Markov source. Define
$$\alpha_{jk} = \log \frac{p(j|1)\, p(j|j)}{p(k|1)\, p(j|k)}, \qquad j, k \in \{1, 2, \ldots, r\}.$$

(a) If not all {α_{jk}} are rational, then R_n = 1/2 + o(1).

(b) If all {α_{jk}} are rational, then let
$$\zeta_{jk}(n) = M \left[ -(n-1) \log p(1|1) + \log p(j|1) - \log p(k|1) - \log p_j \right]$$
and
$$\Omega_n = \frac{1}{2} \left( 1 - \frac{1}{M} \right) + \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, \varrho[\zeta_{jk}(n)],$$
where M is the least common multiple of the denominators of {α_{jk}}. Then there exists a positive sequence ξ_n → 0 such that
$$R_n \le \Omega_n + \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, \mathbb{I}\{ \varrho[\zeta_{jk}(n)] \notin (\xi_n, 1 - \xi_n) \} + o(1),$$
$$R_n \ge \Omega_n - \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, \mathbb{I}\{ \varrho[\zeta_{jk}(n)] \notin (\xi_n, 1 - \xi_n) \} - o(1).$$
Sketch of Proof

1. We note that ̺(u) has the following Fourier series expansion:
$$\varrho(u) = \frac{1}{2} + \sum_{m \ne 0} a_m e^{2\pi i m u}, \qquad a_m = \frac{1}{2\pi i m},$$
where a_{m·k} = a_m / k for integers m, k.

2. Since R_n = E[̺(−log P(X^n))] (for an aperiodic, irreducible Markov chain), we have
$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m\, E\left[ e^{-2\pi i m \log P(X^n)} \right],$$
which we can rewrite as
$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m \sum_{x \in \mathcal{A}^n} \prod_{t=1}^{n} p(x_t \mid x_{t-1}) \exp\left[ -2\pi i m \log p(x_t \mid x_{t-1}) \right]$$
with the convention p(x₁ | x₀) = p_{x₁}, since $P(x^n) = p_{x_1} \prod_{t=2}^{n} p(x_t \mid x_{t-1})$.
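The Fourier expansion in step 1 is easy to verify numerically away from integer u: in real form it reads ̺(u) = 1/2 + Σ_{m≥1} sin(2πmu)/(πm). A truncated check (the values K and u are arbitrary):

```python
import math

# Sanity check of the Fourier expansion used in the proof: the truncated
# series 1/2 + sum_{m=1}^{K} sin(2*pi*m*u)/(pi*m) approximates
# rho(u) = ceil(u) - u away from integer u.
def rho(u):
    return math.ceil(u) - u

K, u = 5000, 0.3
approx = 0.5 + sum(math.sin(2 * math.pi * m * u) / (math.pi * m)
                   for m in range(1, K + 1))
print(abs(approx - rho(u)))   # small truncation error
```

Convergence is only pointwise (the series has a jump at integers), but that suffices since P(X^n) hits the integer case with probability zero for generic transition probabilities.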