Chapter 4: Entropy Rates of a Stochastic Process
Peng-Hua Wang
Graduate Inst. of Comm. Engineering, National Taipei University
Chapter Outline
Chap. 4 Entropy Rates of a Stochastic Process
4.1 Markov Chains
4.2 Entropy Rate
4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph
4.4 Second Law of Thermodynamics
4.5 Functions of Markov Chains
4.1 Markov Chains
Stationary

Definition (Stationary). A stochastic process is said to be stationary if
$$\Pr\{X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n\} = \Pr\{X_{1+\ell} = x_1, X_{2+\ell} = x_2, \ldots, X_{n+\ell} = x_n\}$$
for every $n$ and every shift $\ell$.
■ The joint distribution of any subset of the sequence of random variables is invariant with respect to shifts in the time index.
Markov chain

Definition (Markov chain). A discrete stochastic process $X_1, X_2, \ldots$ is said to be a Markov chain or a Markov process if, for $n = 1, 2, \ldots$,
$$\Pr\{X_{n+1} = x_{n+1} \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_1 = x_1\} = \Pr\{X_{n+1} = x_{n+1} \mid X_n = x_n\}.$$
■ The joint pmf can then be written as
$$p(x_1, x_2, \ldots, x_n) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2) \cdots p(x_n \mid x_{n-1}).$$
Definition (Time invariant). The Markov chain is said to be time invariant if the transition probability $p(x_{n+1} \mid x_n)$ does not depend on $n$; that is,
$$\Pr\{X_{n+1} = b \mid X_n = a\} = \Pr\{X_2 = b \mid X_1 = a\} \quad \text{for all } a, b \in \mathcal{X}.$$
Markov chain

■ We will assume that the Markov chain is time invariant.
■ $X_n$ is called the state at time $n$.
■ A time-invariant Markov chain is characterized by its initial state and a probability transition matrix $P = [P_{ij}]$, $i, j \in \{1, 2, \ldots, m\}$, where $P_{ij} = \Pr\{X_{n+1} = j \mid X_n = i\}$.
■ The pmf at time $n+1$ is
$$p(x_{n+1}) = \sum_{x_n} p(x_n)\, P_{x_n x_{n+1}}.$$
■ A distribution on the states such that the distribution at time $n+1$ is the same as the distribution at time $n$ is called a stationary distribution.
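As a numerical illustration that is not part of the original slides, the following minimal Python sketch (transition probabilities are assumed values) iterates $p_{n+1} = p_n P$ and compares the result with the stationary distribution obtained as the left eigenvector of $P$ with eigenvalue 1.

```python
import numpy as np

# Minimal sketch (transition probabilities are assumed, not from the slides):
# evolve the state distribution of a time-invariant Markov chain via
# p_{n+1} = p_n P and compare with the stationary distribution.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # P[i, j] = Pr{X_{n+1} = j | X_n = i}
p = np.array([1.0, 0.0])              # initial distribution on the states

for _ in range(50):
    p = p @ P                         # p(x_{n+1}) = sum_{x_n} p(x_n) P_{x_n x_{n+1}}

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
eigvals, eigvecs = np.linalg.eig(P.T)
mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
mu = mu / mu.sum()

print("p after 50 steps:", p)         # approaches mu
print("stationary mu:   ", mu)
```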
Example 4.1.1

Consider a two-state Markov chain with probability transition matrix
$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.$$
Find its stationary distribution and entropy.

Solution. Let $\mu_1, \mu_2$ be the stationary distribution. Then
$$\mu_1 = \mu_1(1-\alpha) + \mu_2\beta, \qquad \mu_2 = \mu_1\alpha + \mu_2(1-\beta),$$
and $\mu_1 + \mu_2 = 1$. Solving gives
$$\mu_1 = \frac{\beta}{\alpha+\beta}, \qquad \mu_2 = \frac{\alpha}{\alpha+\beta},$$
and the entropy rate is $H(X_2 \mid X_1) = \mu_1 H(\alpha) + \mu_2 H(\beta)$, where $H(\cdot)$ denotes the binary entropy function.
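A short Python check of the closed-form answer above; the particular values of $\alpha$ and $\beta$ are my own choice for illustration and do not come from the slides.

```python
import numpy as np

# Sketch for Example 4.1.1 (alpha and beta values are assumed for illustration):
# closed-form stationary distribution and entropy rate of the two-state chain.
def H2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, beta = 0.2, 0.5
mu1 = beta / (alpha + beta)               # solves mu = mu P with mu1 + mu2 = 1
mu2 = alpha / (alpha + beta)

rate = mu1 * H2(alpha) + mu2 * H2(beta)   # H(X2 | X1), bits per symbol
print("stationary distribution:", (mu1, mu2))
print("entropy rate:", rate, "bits/symbol")
```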
4.2 Entropy Rate
Entropy Rate

Definition (Entropy Rate). The entropy rate of a random process $\{X_i\}$ is defined by
$$H(X) = \lim_{n\to\infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)$$
when the limit exists.

Definition (Conditional Entropy Rate). The conditional entropy rate of a random process $\{X_i\}$ is defined by
$$H'(X) = \lim_{n\to\infty} H(X_n \mid X_1, X_2, \ldots, X_{n-1})$$
when the limit exists.
Entropy Rate

■ If $X_1, X_2, \ldots$ are i.i.d. random variables, then
$$H(X) = \lim_{n\to\infty} \frac{H(X_1, X_2, \ldots, X_n)}{n} = \lim_{n\to\infty} \frac{nH(X_1)}{n} = H(X_1).$$
■ If $X_1, X_2, \ldots$ are independent but not identically distributed, then
$$H(X) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} H(X_i).$$
■ We can choose a sequence of distributions on $X_1, X_2, \ldots$ such that this limit does not exist; one such construction is sketched below.
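The following sketch is my own illustrative construction, not from the slides: let the per-symbol entropies $H(X_i)$ alternate between 1 bit (a fair coin) and 0 bits (a deterministic symbol) over blocks of doubling length, so the running average $\frac{1}{n}\sum_{i=1}^{n} H(X_i)$ keeps oscillating instead of converging.

```python
import numpy as np

# Assumed construction (not from the slides): per-symbol entropies H(X_i)
# alternate between 1 bit and 0 bits over blocks of length 1, 2, 4, ...,
# so the Cesaro average (1/n) * sum_i H(X_i) never settles down.
h = []
bit_value = 1.0
for k in range(20):                   # block lengths 2^0, 2^1, ..., 2^19
    h.extend([bit_value] * 2**k)
    bit_value = 1.0 - bit_value
h = np.array(h)

running_avg = np.cumsum(h) / np.arange(1, len(h) + 1)
# Sample the running average at the ends of successive blocks: it oscillates
# between roughly 1/3 and 2/3 instead of converging.
for k in [10, 11, 18, 19]:
    n = 2**(k + 1) - 1                # index of the last symbol of block k
    print(n, running_avg[n - 1])
```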
Entropy Rate

Theorem 4.2.2. For a stationary stochastic process, $H(X_n \mid X_{n-1}, \ldots, X_1)$ is nonincreasing in $n$ and has a limit $H'(X)$.

Proof.
$$H(X_{n+1} \mid X_1, X_2, \ldots, X_n) \le H(X_{n+1} \mid X_2, \ldots, X_n) \quad (\text{conditioning reduces entropy})$$
$$= H(X_n \mid X_1, \ldots, X_{n-1}) \quad (\text{stationarity}).$$
Since $H(X_n \mid X_{n-1}, \ldots, X_1)$ is nonnegative and nonincreasing, it has a limit $H'(X)$. □
Entropy Rate

Theorem 4.2.1. For a stationary stochastic process, both $H(X)$ and $H'(X)$ exist and are equal: $H(X) = H'(X)$.

Proof. By the chain rule,
$$\frac{1}{n} H(X_1, X_2, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1),$$
that is, the entropy rate is the time average of the conditional entropies. Since the conditional entropies tend to the limit $H'(X)$, the Cesàro mean theorem implies that their time average, and hence the entropy rate, tends to the same limit. □
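As a hedged numerical check (the values of $\alpha$ and $\beta$ are assumed; this is not part of the original slides): for a stationary two-state Markov chain, $H(X_n \mid X_{n-1}, \ldots, X_1) = H(X_2 \mid X_1)$ for $n \ge 2$ and $H(X_1, \ldots, X_n) = H(X_1) + (n-1)H(X_2 \mid X_1)$, so the per-symbol entropy $\frac{1}{n}H(X_1, \ldots, X_n)$ converges to $H'(X) = H(X_2 \mid X_1)$.

```python
import numpy as np

def entropy_bits(p):
    """Entropy of a probability vector, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

alpha, beta = 0.3, 0.6                           # assumed example values
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
mu = np.array([beta, alpha]) / (alpha + beta)    # stationary distribution

H_cond = sum(mu[i] * entropy_bits(P[i]) for i in range(2))   # H'(X) = H(X2|X1)
H_first = entropy_bits(mu)                                    # H(X1) under stationarity

for n in [1, 2, 5, 10, 100, 1000]:
    # Chain rule for a stationary Markov chain started from mu:
    # H(X1,...,Xn) = H(X1) + (n-1) * H(X2|X1)
    per_symbol = (H_first + (n - 1) * H_cond) / n
    print(f"n={n:5d}  H(X^n)/n = {per_symbol:.6f}  H'(X) = {H_cond:.6f}")
```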
Cesàro mean

Theorem (Cesàro mean). If $a_n \to a$ and $b_n = \frac{1}{n}\sum_{i=1}^{n} a_i$, then $b_n \to a$.

Proof. Let $\epsilon > 0$. Since $a_n \to a$, there exists a number $N$ such that $|a_n - a| \le \epsilon$ for all $n > N$. Hence,
$$|b_n - a| = \left|\frac{1}{n}\sum_{i=1}^{n}(a_i - a)\right| \le \frac{1}{n}\sum_{i=1}^{n}|a_i - a| \le \frac{1}{n}\sum_{i=1}^{N}|a_i - a| + \frac{n-N}{n}\epsilon \le \frac{1}{n}\sum_{i=1}^{N}|a_i - a| + \epsilon \le 2\epsilon$$
when $n$ is large enough, since the first term tends to 0 as $n \to \infty$. As $\epsilon$ was arbitrary, $b_n \to a$. □
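A small numeric illustration of the theorem; the particular sequence is an assumption chosen only for demonstration.

```python
import numpy as np

# Cesaro mean sketch: a_n = 0.5 + (-1)^n / n converges to 0.5, and the
# running averages b_n = (1/n) * sum(a_1..a_n) converge to the same limit.
n = np.arange(1, 100001)
a = 0.5 + (-1.0) ** n / n
b = np.cumsum(a) / n
print("a_n ->", a[-1])
print("b_n ->", b[-1])
```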