Average Redundancy of the Shannon Code for Markov Sources
Neri Merhav (Technion) and Wojciech Szpankowski (Purdue University)
May 27, 2013
NSF STC Center for Science of Information
AofA, Menorca 2013
Dedicated to PHILIPPE FLAJOLET
Outline
1. Source Coding
2. Redundancy: Known Sources
3. Shannon and Huffman Redundancy for Memoryless Sources
4. Shannon Coding Redundancy for Markov Sources
Source Coding

A source code is a bijective mapping C : A∗ → {0, 1}∗ from sequences over the alphabet A to the set {0, 1}∗ of binary sequences.

The basic problem of source coding (i.e., data compression) is to find codes with the shortest descriptions (lengths), either on average or for individual sequences.

Three Basic Types of Source Coding:
• Fixed-to-Variable (FV) length codes (e.g., Huffman and Shannon codes).
• Variable-to-Fixed (VF) length codes (e.g., Tunstall and Khodak codes).
• Variable-to-Variable (VV) length codes (e.g., the Khodak VV code).
Prefix Codes

A prefix code is one in which no codeword is a prefix of another codeword. We write: P(x) for the probability of x ∈ A∗; L(C, x) for the code length of the source sequence x ∈ A∗; H(P) = −∑_{x∈A∗} P(x) lg P(x) for the entropy.

Kraft's Inequality: A binary prefix code with code lengths ℓ₁, ℓ₂, ..., ℓ_N exists iff
$$\sum_{i=1}^{N} 2^{-\ell_i} \le 1.$$

Shannon's First Theorem: For any prefix code, the average code length E[L(C, X)] cannot be smaller than the entropy of the source H(P), that is, E[L(C_n, X)] ≥ H(P).

Exercise: Show that there exists at least one sequence $\tilde{x}_1^n$ such that $L(\tilde{x}_1^n) \ge -\log_2 P(\tilde{x}_1^n)$.
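Both statements above are easy to check numerically. The sketch below (with an assumed toy distribution, not taken from the talk) verifies Kraft's inequality for the Shannon lengths ⌈−lg P(x)⌉ and the resulting bound H(P) ≤ E[L] < H(P) + 1:

```python
import math

# Sketch (assumed toy distribution): Shannon code lengths ceil(-lg P(x))
# satisfy Kraft's inequality, so a prefix code with these lengths exists,
# and their average length is within one bit of the entropy.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = {x: math.ceil(-math.log2(p)) for x, p in P.items()}

kraft = sum(2.0 ** -l for l in lengths.values())
avg_len = sum(P[x] * lengths[x] for x in P)
entropy = -sum(p * math.log2(p) for p in P.values())

print(kraft)                              # at most 1, so the lengths are feasible
print(entropy <= avg_len < entropy + 1)   # Shannon's bound holds
```

For this dyadic distribution the Kraft sum is exactly 1 and the average length meets the entropy, which is the extreme case of the bound.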
Redundancy

Known source P. The pointwise redundancy R(x) and the average redundancy R̄:
$$R(x) = L(C, x) + \lg P(x), \qquad \bar{R} = E[L(C, X)] - H(P) \ge 0.$$

Optimal Code:
$$\min_{L} \sum_{x} L(x) P(x) \quad \text{subject to} \quad \sum_{x} 2^{-L(x)} \le 1.$$

Solution: by Lagrange multipliers (ignoring the integer constraint on the lengths) we find L_opt(x) = −lg P(x).

The smaller the redundancy, the closer the code is to the optimal one.
Redundancy for Huffman's Code

We consider fixed-to-variable length codes: the Shannon and Huffman codes. For a known source P, we consider fixed-length sequences x₁ⁿ = x₁ ... x_n.

Huffman Code: The optimization problem
$$\bar{R}_n = \min_{C_n \in \mathcal{C}} E\left[ L(C_n, x_1^n) + \log_2 P(x_1^n) \right]$$
is solved by Huffman's code.

We first review the average redundancy for a binary memoryless source, with p denoting the probability of generating "0" and q = 1 − p. In 1994 Stubley proposed the following expression for Huffman's average redundancy:
$$\bar{R}_n^H = 2 - \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} \langle \alpha k + \beta n \rangle - 2 \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}\, 2^{-\langle \alpha k + \beta n \rangle} + o(1),$$
where
$$\alpha = \log_2 \frac{1-p}{p}, \qquad \beta = \log_2 \frac{1}{1-p},$$
and ⟨x⟩ = x − ⌊x⌋ is the fractional part of x.
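For small block lengths, Stubley's expression can be compared against the exact Huffman redundancy obtained by brute force over all 2ⁿ blocks; the values p = 0.3 and n = 10 below are illustrative choices, not parameters from the talk:

```python
import math, heapq

# Illustration (assumed parameters p, n): evaluate Stubley's expression for
# the Huffman redundancy of a binary memoryless source, and compare it with
# the exact average redundancy from building the Huffman code on all 2^n blocks.
p, n = 0.3, 10
q = 1.0 - p
alpha = math.log2((1 - p) / p)
beta = math.log2(1 / (1 - p))
frac = lambda x: x - math.floor(x)

pk = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
S1 = sum(pk[k] * frac(alpha * k + beta * n) for k in range(n + 1))
S2 = sum(pk[k] * 2.0 ** (-frac(alpha * k + beta * n)) for k in range(n + 1))
stubley = 2 - S1 - 2 * S2

# Exact Huffman average length via the standard merge procedure:
# E[L] equals the sum of the weights of all internal (merged) nodes.
heap = [p**k * q**(n - k) for k in range(n + 1) for _ in range(math.comb(n, k))]
heapq.heapify(heap)
avg_len = 0.0
while len(heap) > 1:
    a, b = heapq.heappop(heap), heapq.heappop(heap)
    avg_len += a + b
    heapq.heappush(heap, a + b)
entropy = n * (-p * math.log2(p) - q * math.log2(q))
exact = avg_len - entropy
print(exact, stubley)   # both small, and close for this n
```

The merge loop is the textbook Huffman construction; it is feasible here only because n is small (2¹⁰ leaves).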
Main Result

Theorem 1 (W.S., 2000). Consider the Huffman block code of length n over a binary memoryless source with p < 1/2. Then as n → ∞,
$$\bar{R}_n^H = \begin{cases} \dfrac{3}{2} - \dfrac{1}{\ln 2} + o(1) \approx 0.057304, & \alpha \ \text{irrational}, \\[2mm] \dfrac{3}{2} - \dfrac{1}{M}\left( \langle \beta M n \rangle - \dfrac{1}{2} \right) - \dfrac{1}{M(1 - 2^{-1/M})}\, 2^{-\langle n \beta M \rangle / M} + O(\rho^n), & \alpha = \dfrac{N}{M} \ \text{rational}, \end{cases}$$
where N, M are integers such that gcd(N, M) = 1 and ρ < 1.

Figure 1: The average redundancy of Huffman codes versus block size n for: (a) irrational α = log₂(1−p)/p with p = 1/π; (b) rational α = log₂(1−p)/p with p = 1/9.
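The two modes are easy to evaluate numerically. For p = 1/9 (the rational case of Figure 1) the slope is α = log₂ 8 = 3, an integer, so M = 1 and the rational-mode expression specializes to 2 − ⟨βn⟩ − 2^{1−⟨βn⟩}; the sketch below evaluates both modes:

```python
import math

# Irrational mode: the limiting constant 3/2 - 1/ln 2.
const = 1.5 - 1.0 / math.log(2)
print(const)   # approx 0.057305

# Rational mode with p = 1/9, so alpha = log2(8) = 3 and M = 1; the formula
# specializes to 2 - <beta*n> - 2^(1 - <beta*n>), which oscillates with n.
beta = math.log2(9.0 / 8.0)
frac = lambda x: x - math.floor(x)
for n in (10, 20, 30):
    x = frac(beta * n)
    print(n, 2 - x - 2.0 ** (1 - x))   # stays between 0 and about 0.087
```

The oscillation in the rational mode never converges, which is exactly the behavior visible in Figure 1(b).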
Why Two Modes: The Shannon Code

Consider the Shannon code, which assigns the length L(C_n^S, x_1^n) = ⌈−lg P(x_1^n)⌉ to the source sequence x_1^n. Observe that P(x_1^n) = p^k(1−p)^{n−k}, where p is the known probability of generating 0 and k is the number of 0s. The Shannon code redundancy is
$$\bar{R}_n^S = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \left( \lceil -\log_2( p^k (1-p)^{n-k} ) \rceil + \log_2( p^k (1-p)^{n-k} ) \right)$$
$$= 1 - \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \langle \alpha k + \beta n \rangle,$$
where ⟨x⟩ = x − ⌊x⌋ is the fractional part of x, and
$$\alpha = \log_2 \frac{1-p}{p}, \qquad \beta = \log_2 \frac{1}{1-p}.$$
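The two expressions above agree because −log₂(p^k (1−p)^{n−k}) = αk + βn and ⌈u⌉ − u = 1 − ⟨u⟩ for non-integer u. A quick numerical check (p and n are assumed illustrative values):

```python
import math

# Sanity check (assumed p, n): both expressions for the Shannon-code
# redundancy on the slide give the same value, since
# -log2(p^k q^(n-k)) = alpha*k + beta*n.
p, n = 0.3, 12
q = 1 - p
alpha, beta = math.log2(q / p), math.log2(1 / q)
frac = lambda x: x - math.floor(x)

direct = sum(
    math.comb(n, k) * p**k * q**(n - k)
    * (math.ceil(-math.log2(p**k * q**(n - k))) + math.log2(p**k * q**(n - k)))
    for k in range(n + 1)
)
via_frac = 1 - sum(
    math.comb(n, k) * p**k * q**(n - k) * frac(alpha * k + beta * n)
    for k in range(n + 1)
)
print(abs(direct - via_frac))   # agree up to floating-point error
```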
Sketch of Proof

We need to understand the asymptotic behavior of the following sum (cf. Bernoulli-distributed sequences modulo 1)
$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle)$$
for fixed p and some Riemann-integrable function f : [0, 1] → R.

Lemma 1. Let 0 < p < 1 be a fixed real number and α be an irrational number. Then for every Riemann-integrable function f : [0, 1] → R,
$$\lim_{n \to \infty} \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle) = \int_0^1 f(t)\, dt,$$
where the convergence is uniform over all shifts y ∈ R.

Lemma 2. Let α = N/M be a rational number with gcd(N, M) = 1. Then for every bounded function f : [0, 1] → R,
$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} f(\langle \alpha k + y \rangle) = \frac{1}{M} \sum_{l=0}^{M-1} f\!\left( \frac{l}{M} + \frac{\langle M y \rangle}{M} \right) + O(\rho^n)$$
uniformly over all y ∈ R and for some ρ < 1.
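Lemma 1 can be illustrated numerically. The choices below (α = √2, f(t) = t, y = 0.3, p = 0.3) are arbitrary; with f(t) = t the limit is ∫₀¹ t dt = 1/2. Binomial weights are computed in log space to keep the computation stable for large n:

```python
import math

# Illustration of Lemma 1 with arbitrary choices: alpha = sqrt(2)
# (irrational), f(t) = t, shift y = 0.3, p = 0.3.  The binomial pmf is
# evaluated via lgamma so that large n causes no overflow.
alpha, y, p, n = math.sqrt(2), 0.3, 0.3, 2000
q = 1 - p
frac = lambda x: x - math.floor(x)

total = 0.0
for k in range(n + 1):
    logw = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(q))
    total += math.exp(logw) * frac(alpha * k + y)
print(total)   # close to the integral 1/2
```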
Shannon Redundancy – Rational Case

Assume α = N/M with gcd(N, M) = 1. Denote $p_{n,k} = \binom{n}{k} p^k q^{n-k}$. Then
$$S_n = \sum_{k=0}^{n} p_{n,k} \left\langle \frac{kN}{M} + \beta n \right\rangle = \sum_{\ell=0}^{M-1} \left\langle \frac{\ell N}{M} + \beta n \right\rangle \sum_{m:\, k = \ell + mM \le n} p_{n,k} = \sum_{\ell=0}^{M-1} \left\langle \frac{\ell}{M} + \beta n \right\rangle \sum_{m:\, k = \ell + mM \le n} p_{n,k},$$
where the last step reindexes ℓ, since ℓN mod M runs over all residues when gcd(N, M) = 1.

Lemma 3. For fixed ℓ ≤ M and M, there exists ρ < 1 such that
$$\sum_{m:\, k = \ell + mM \le n} \binom{n}{k} p^k (1-p)^{n-k} = \frac{1}{M} + O(\rho^n).$$

Proof. Let ω_r = e^{2πir/M} for r = 0, 1, ..., M − 1 be the Mth roots of unity, so that
$$\frac{1}{M} \sum_{r=0}^{M-1} \omega_r^{n} = \begin{cases} 1 & \text{if } M \mid n, \\ 0 & \text{otherwise}, \end{cases}$$
where M | n means that M divides n. Then
$$\sum_{m:\, k = \ell + mM \le n} \binom{n}{k} p^k q^{n-k} = \frac{1}{M} \left[ 1 + \omega_1^{-\ell} (p\omega_1 + q)^n + \cdots + \omega_{M-1}^{-\ell} (p\omega_{M-1} + q)^n \right] = \frac{1}{M} + O(\rho^n),$$
since $|p\omega_r + q| = \sqrt{p^2 + q^2 + 2pq \cos(2\pi r/M)} < 1$ for r ≠ 0.
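Lemma 3 says that the binomial mass concentrates equally on the M residue classes of k, up to an exponentially small error. A direct check with assumed toy values of n, M, and p:

```python
import math

# Check of the roots-of-unity argument (assumed toy n, M, p): the binomial
# mass on each residue class k = l (mod M) tends to 1/M exponentially fast.
n, M, p = 60, 5, 0.3
q = 1 - p
mass = [0.0] * M
for k in range(n + 1):
    mass[k % M] += math.comb(n, k) * p**k * q**(n - k)
print([round(m, 6) for m in mass])   # each entry close to 1/M = 0.2
```

Here ρ = max_{r≠0} |pω_r + q| ≈ 0.84, so ρ⁶⁰ is already of order 10⁻⁵, which matches the size of the deviations printed.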
Finishing the Rational Case

We use the following Fourier series: for real x,
$$\langle x \rangle = \frac{1}{2} - \sum_{m=1}^{\infty} \frac{\sin 2\pi m x}{\pi m} = \frac{1}{2} + \sum_{m \in \mathbb{Z} \setminus \{0\}} c_m e^{2\pi i m x}, \qquad c_m = -\frac{1}{2\pi i m}.$$

Continuing the derivation and using Lemma 3 we obtain
$$S_n = \frac{1}{M} \sum_{\ell=0}^{M-1} \left( \frac{1}{2} + \sum_{m \ne 0} c_m e^{2\pi i m (\ell/M + \beta n)} \right) = \frac{1}{2} + \frac{1}{M} \sum_{m \ne 0} c_m e^{2\pi i m \beta n} \sum_{\ell=0}^{M-1} e^{2\pi i m \ell / M}$$
$$= \frac{1}{2} + \sum_{m = kM \ne 0} c_{kM}\, e^{2\pi i k M \beta n} = \frac{1}{2} - \frac{1}{M} \left( \frac{1}{2} - \langle \beta n M \rangle \right),$$
since the inner sum over ℓ vanishes unless M | m, and $c_{kM} = c_k / M$.
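The closed form for S_n can be checked directly. Below p is chosen (an illustrative value, not one from the talk) so that α = log₂(q/p) = 3/2, i.e. N = 3 and M = 2; the O(ρⁿ) error is negligible at n = 50:

```python
import math

# Verify S_n = 1/2 - (1/M)(1/2 - <beta*n*M>) for alpha = N/M = 3/2 (M = 2).
# p is constructed so that q/p = 2^(3/2); the O(rho^n) term is negligible here.
p = 1.0 / (1.0 + 2.0 ** 1.5)
q = 1.0 - p
n, M = 50, 2
alpha = 1.5                        # = log2(q/p) by construction
beta = math.log2(1.0 / q)
frac = lambda x: x - math.floor(x)

S = sum(math.comb(n, k) * p**k * q**(n - k) * frac(alpha * k + beta * n)
        for k in range(n + 1))
closed = 0.5 - (1.0 / M) * (0.5 - frac(beta * n * M))
print(abs(S - closed))             # of order rho^n, tiny for n = 50
```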
Markov Sources

The source sequence X₁, X₂, ..., over the alphabet A = {1, 2, ..., r} is generated by a first-order Markov chain with a given transition matrix P = {p(j|k)}_{j,k=1}^{r}, initial state probabilities p_k, k = 1, 2, ..., r, and stationary state probabilities π_k, k = 1, 2, ..., r.

The probability of x^n = (x₁, ..., x_n) ∈ A^n under the given Markov source is
$$P(x^n) = p_{x_1} \prod_{t=2}^{n} p(x_t \mid x_{t-1}).$$

The average redundancy of the Shannon code is defined as
$$R_n = E\left[ \lceil -\log P(X^n) \rceil + \log P(X^n) \right] = E\left[ \varrho(-\log P(X^n)) \right], \qquad \varrho(u) = \lceil u \rceil - u.$$
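For a small alphabet and small n, this expectation can be computed exactly by enumerating Aⁿ. The chain below is a toy example with assumed parameters (logs are taken base 2, as appropriate for code lengths in bits):

```python
import math, itertools

# Exact computation of R_n = E[rho(-lg P(X^n))] for an assumed toy two-state
# chain; n is kept small so that A^n can be enumerated.
P = [[0.7, 0.3],
     [0.4, 0.6]]              # P[j][k] = p(k | j), rows sum to 1
p0 = [0.5, 0.5]               # initial state probabilities

def rho(u):                   # rho(u) = ceil(u) - u, the wasted fraction of a bit
    return math.ceil(u) - u

n = 10
R, total = 0.0, 0.0
for x in itertools.product(range(2), repeat=n):
    prob = p0[x[0]]
    for t in range(1, n):
        prob *= P[x[t - 1]][x[t]]
    total += prob
    R += prob * rho(-math.log2(prob))
print(round(total, 6), R)     # probabilities sum to 1; R lies in [0, 1)
```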
Main Result for Markov Sources

Theorem 2 (Merhav & W.S.). Consider the Shannon code of length n for an aperiodic and irreducible Markov source. Define
$$\alpha_{jk} = \log \frac{p(j|1)\, p(j|j)}{p(k|1)\, p(j|k)}, \qquad j, k \in \{1, 2, \ldots, r\}.$$

(a) If not all {α_{jk}} are rational, then R_n = 1/2 + o(1).

(b) If all {α_{jk}} are rational, then let
$$\zeta_{jk}(n) = M \left[ -(n-1) \log p(1|1) + \log p(j|1) - \log p(k|1) - \log p_j \right]$$
and
$$\Omega_n = \frac{1}{2} \left( 1 - \frac{1}{M} \right) + \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, \varrho[\zeta_{jk}(n)],$$
where M is the least common multiple of the denominators of {α_{jk}}. Then there exists a positive sequence ξ_n → 0 such that
$$R_n \le \Omega_n + \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, \mathbb{I}\{ \varrho[\zeta_{jk}(n)] \notin (\xi_n, 1 - \xi_n) \} + o(1),$$
$$R_n \ge \Omega_n - \frac{1}{M} \sum_{j=1}^{r} \sum_{k=1}^{r} p_j \pi_k\, \mathbb{I}\{ \varrho[\zeta_{jk}(n)] \notin (\xi_n, 1 - \xi_n) \} - o(1).$$
Sketch of Proof

1. We note that ̺(u) has the following Fourier series expansion:
$$\varrho(u) = \frac{1}{2} + \sum_{m \ne 0} a_m e^{2\pi i m u}, \qquad a_m = \frac{1}{2\pi i m},$$
where a_{m·k} = a_m / k for integers m, k.

2. Since R_n = E[̺(−log P(X^n))] (for an aperiodic, irreducible Markov chain), we have
$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m\, E\left[ e^{-2\pi i m \log P(X^n)} \right],$$
which we can rewrite as
$$R_n = \frac{1}{2} + \sum_{m \ne 0} a_m \sum_{x \in \mathcal{A}^n} \prod_{t=1}^{n} p(x_t \mid x_{t-1}) \exp\left[ -2\pi i m \log p(x_t \mid x_{t-1}) \right]$$
with the convention p(x₁ | x₀) = p_{x₁}, since $P(x^n) = p_{x_1} \prod_{t=2}^{n} p(x_t \mid x_{t-1})$.
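The Fourier expansion in step 1 is easy to verify numerically away from integer u: in real form it reads ̺(u) = 1/2 + Σ_{m≥1} sin(2πmu)/(πm). A truncated check (the values K and u are arbitrary):

```python
import math

# Sanity check of the Fourier expansion used in the proof: the truncated
# series 1/2 + sum_{m=1}^{K} sin(2*pi*m*u)/(pi*m) approximates
# rho(u) = ceil(u) - u away from integer u.
def rho(u):
    return math.ceil(u) - u

K, u = 5000, 0.3
approx = 0.5 + sum(math.sin(2 * math.pi * m * u) / (math.pi * m)
                   for m in range(1, K + 1))
print(abs(approx - rho(u)))   # small truncation error
```

Convergence is only pointwise (the series has a jump at integers), but that suffices since P(X^n) hits the integer case with probability zero for generic transition probabilities.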