Asymptotic behaviour of the weighted Shannon differential entropy in a Bayesian problem
Mark Kelbert, Pavel Mozgunov
2nd International Electronic Conference on Entropy and Its Applications, November 2015
Introduction

Let $U \sim U[0,1]$. Given a realization $p$ of this RV, consider a sequence of conditionally independent, identically distributed RVs $(\xi_i,\ i = 1, 2, \ldots)$, where $\xi_i = 1$ with probability $p$ and $\xi_i = 0$ with probability $1 - p$. Let $x_i$, each 0 or 1, be the outcome of trial $i$. Denote $S_n = \xi_1 + \ldots + \xi_n$ and $x = \sum_{i=1}^n x_i$.

Note that the trials are dependent: if $i \neq j$,
$$P(\xi_i = 1, \xi_j = 1) = \int_0^1 p^2 \, dp = 1/3, \quad \text{but} \quad P(\xi_i = 1)\,P(\xi_j = 1) = \Big(\int_0^1 p \, dp\Big)^{2} = 1/4.$$

The probability that after $n$ trials the exact sequence $(x_i,\ i = 1, \ldots, n)$ appears equals
$$P(\xi_1 = x_1, \ldots, \xi_n = x_n) = \int_0^1 p^x (1-p)^{n-x} \, dp = \frac{1}{\binom{n}{x}(n+1)}. \quad (1)$$
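A minimal numerical sketch of (1), assuming SciPy is available (the values $n = 10$, $x = 4$ are arbitrary illustrations, not from the slides):

```python
# Numerical check of Eq. (1): the marginal probability of a fixed 0/1
# sequence containing x ones equals 1 / (C(n, x) * (n + 1)).
from scipy.integrate import quad
from scipy.special import comb

n, x = 10, 4  # arbitrary illustrative values
integral, _ = quad(lambda p: p**x * (1 - p)**(n - x), 0, 1)
closed_form = 1.0 / (comb(n, x, exact=True) * (n + 1))
print(integral, closed_form)  # both ~ 4.329e-04
```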
Introduction

This implies that the distribution of the number of successes $x$ after $n$ trials is uniform:
$$P(S_n = x) = \frac{1}{n+1}, \quad x = 0, \ldots, n.$$
The posterior PDF, given the information that after $n$ trials one observes $x$ successes, takes the form
$$f^{(n)}(p \mid \xi_1 = x_1, \ldots, \xi_n = x_n) = f^{(n)}(p \mid S_n = x) = (n+1)\binom{n}{x} p^x (1-p)^{n-x}. \quad (2)$$
Note that the conditional distribution given in (2) is a Beta distribution, namely $\mathrm{Beta}(x+1,\ n-x+1)$. "It is known that the Beta distribution is asymptotically normal with its mean and variance as $x$ and $(n-x)$ tend to infinity, but this fact is lacking a handy reference."
Introduction

Consider the RV $Z^{(n)}$ on $[0,1]$ with PDF (2). Note that $Z^{(n)}$ has the following expectation:
$$E_x[Z^{(n)}] = \frac{x+1}{n+2}, \quad (3)$$
and the following variance:
$$V_x[Z^{(n)}] = \frac{(x+1)(n-x+1)}{(n+3)(n+2)^2}. \quad (4)$$
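Since (2) is the $\mathrm{Beta}(x+1, n-x+1)$ density, (3) and (4) can be checked against a standard library; a quick sketch with arbitrary illustrative $n$, $x$:

```python
# Check of (3)-(4): the posterior (2) is Beta(x+1, n-x+1), whose mean and
# variance coincide with the displayed formulas.
from scipy.stats import beta

n, x = 20, 7  # arbitrary illustrative values
Z = beta(x + 1, n - x + 1)
print(Z.mean(), (x + 1) / (n + 2))                              # ~0.36364 twice
print(Z.var(), (x + 1) * (n - x + 1) / ((n + 3) * (n + 2)**2))  # ~0.01006 twice
```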
Shannon's differential entropy

The goal of our previous work [13] was to study the asymptotic behaviour of the differential entropy (DE) of the following RVs:
1. $Z^{(n)}_\alpha$ with PDF $f^{(n)}_\alpha$ given in (2) when $x = x(n) = \lfloor \alpha n \rfloor$, where $0 < \alpha < 1$ and $\lfloor a \rfloor$ is the integer part of $a$;
2. $Z^{(n)}_\beta$ with PDF $f^{(n)}_\beta$ given in (2) when $x = x(n) = \lfloor n^\beta \rfloor$, where $0 < \beta < 1$;
3. $Z^{(n)}_{c_1}$ with PDF $f^{(n)}_{c_1}$ given in (2) when $x = c_1$, and $Z^{(n)}_{n-c_2}$ with PDF $f^{(n)}_{n-c_2}$ given in (2) when $n - x(n) = c_2$, where $c_1$ and $c_2$ are some constants.

It is shown that in the first and second cases the limiting distribution is Gaussian and the differential entropy of the standardized RV converges to the differential entropy of a standard Gaussian RV. In the third case the limiting distribution is not Gaussian, but the asymptotics of the differential entropy can still be found explicitly.
Recall

Differential entropy (DE) $h(f)$ of a RV $Z$ with PDF $f$:
$$h(f) = h_{\mathrm{diff}}(f) = -\int_{\mathbb{R}} f(z) \log f(z) \, dz \quad (5)$$
with the convention $0 \log 0 = 0$. Under a linear transformation $X = b_1 Z + b_2$ (with $b_1 > 0$),
$$h(g) = h(f) + \log b_1, \quad (6)$$
where $g$ is the PDF of the RV $X$. Let $\bar{Z}$ be the standard Gaussian RV with PDF $\varphi$; then the differential entropy of $\bar{Z}$ equals $h(\varphi) = \frac{1}{2}\log(2\pi e)$.

Recall the definition of the Kullback–Leibler divergence of $g$ from $f$:
$$D(f \,\|\, g) = \int_{\mathbb{R}} f(x) \log \frac{f(x)}{g(x)} \, dx. \quad (7)$$
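The scaling rule (6) and the Gaussian value $h(\varphi) = \frac{1}{2}\log(2\pi e)$ are easy to confirm numerically in natural logs; a sketch with arbitrary $b_1$, $b_2$:

```python
# Check of (5)-(6) on a Gaussian: for X = b1*Z + b2 with b1 > 0 the entropy
# shifts by log(b1), and h of N(0,1) equals 0.5*log(2*pi*e).
import numpy as np
from scipy.stats import norm

b1, b2 = 3.0, -1.0                 # arbitrary illustrative values
h_Z = norm(0, 1).entropy()         # differential entropy of N(0,1), in nats
h_X = norm(b2, b1).entropy()       # X = b1*Z + b2 ~ N(b2, b1^2)
print(h_X - h_Z, np.log(b1))                 # both ~ 1.0986
print(h_Z, 0.5 * np.log(2 * np.pi * np.e))   # both ~ 1.4189
```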
Shannon's differential entropy. Case I

Theorem. Let $\tilde{Z}^{(n)}_\alpha = n^{1/2} (\alpha(1-\alpha))^{-1/2} (Z^{(n)}_\alpha - \alpha)$ be a RV with PDF $\tilde{f}^{(n)}_\alpha$, and let $\bar{Z} \sim N(0,1)$ be the standard Gaussian RV. Then
(a) $\tilde{Z}^{(n)}_\alpha$ weakly converges to $\bar{Z}$: $\tilde{Z}^{(n)}_\alpha \Rightarrow \bar{Z}$ as $n \to \infty$.
(b) The differential entropy of $\tilde{Z}^{(n)}_\alpha$ converges to the differential entropy of $\bar{Z}$: $\lim_{n \to \infty} h(\tilde{f}^{(n)}_\alpha) = \frac{1}{2}\log(2\pi e)$.
(c) The Kullback–Leibler divergence of $\varphi$ from $\tilde{f}^{(n)}_\alpha$ tends to 0 as $n \to \infty$: $\lim_{n \to \infty} D(\tilde{f}^{(n)}_\alpha \,\|\, \varphi) = 0$.
Shannon's differential entropy. Case I

We obtained the following asymptotics of the differential entropy:
$$\lim_{n \to \infty} \left[ h(f^{(n)}_\alpha) - \frac{1}{2}\log\!\left(\frac{2\pi e\, [x(n-x)]}{n^3}\right) \right] = 0. \quad (8)$$
In particular,
$$\lim_{n \to \infty} \left[ h(f^{(n)}_\alpha) - \frac{1}{2}\log\!\left(\frac{2\pi e\, [\alpha(1-\alpha)]}{n}\right) \right] = 0. \quad (9)$$
Due to (6), the differential entropy of the RV $\tilde{Z}^{(n)}_\alpha$ has the limit
$$\lim_{n \to \infty} h(\tilde{f}^{(n)}_\alpha) = \frac{1}{2}\log(2\pi e). \quad (10)$$
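A numerical illustration of (9), using the closed-form entropy of a Beta density, $h = \ln B(a,b) - (a-1)\psi(a) - (b-1)\psi(b) + (a+b-2)\psi(a+b)$, with an arbitrary illustrative $\alpha$:

```python
# Illustration of (9) in natural logs: the entropy of Beta(x+1, n-x+1) with
# x = floor(alpha*n) approaches 0.5*log(2*pi*e*alpha*(1-alpha)/n).
import numpy as np
from scipy.special import betaln, digamma as psi

def beta_entropy(a, b):
    # Closed-form differential entropy of Beta(a, b), in nats.
    return (betaln(a, b) - (a - 1) * psi(a) - (b - 1) * psi(b)
            + (a + b - 2) * psi(a + b))

alpha = 0.3  # arbitrary illustrative value
for n in (10**2, 10**4, 10**6):
    x = int(alpha * n)
    h = beta_entropy(x + 1, n - x + 1)
    gauss = 0.5 * np.log(2 * np.pi * np.e * alpha * (1 - alpha) / n)
    print(n, h - gauss)  # difference tends to 0
```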
Shannon's differential entropy. Case II

Theorem. Let $\tilde{Z}^{(n)}_\beta = n^{1-\beta/2}(Z^{(n)}_\beta - n^{\beta-1})$ be a RV with PDF $\tilde{f}^{(n)}_\beta$, and let $\bar{Z} \sim N(0,1)$. Then
(a) $\tilde{Z}^{(n)}_\beta$ weakly converges to $\bar{Z}$: $\tilde{Z}^{(n)}_\beta \Rightarrow \bar{Z}$ as $n \to \infty$.
(b) The differential entropy of $\tilde{Z}^{(n)}_\beta$ converges to the differential entropy of $\bar{Z}$: $\lim_{n \to \infty} h(\tilde{f}^{(n)}_\beta) = \frac{1}{2}\log(2\pi e)$.
(c) The Kullback–Leibler divergence of $\varphi$ from $\tilde{f}^{(n)}_\beta$ tends to 0 as $n \to \infty$: $\lim_{n \to \infty} D(\tilde{f}^{(n)}_\beta \,\|\, \varphi) = 0$.
Shannon's differential entropy. Case III

Theorem. Let $\tilde{Z}^{(n)}_{c_1} = n Z^{(n)}_{c_1}$ be a RV with PDF $\tilde{f}^{(n)}_{c_1}$ and $\tilde{Z}^{(n)}_{n-c_2} = n Z^{(n)}_{n-c_2}$ be a RV with PDF $\tilde{f}^{(n)}_{n-c_2}$. Denote by $H_k = 1 + \frac{1}{2} + \ldots + \frac{1}{k}$ the partial sum of the harmonic series and by $\gamma$ the Euler–Mascheroni constant. Then
(a) $\displaystyle \lim_{n \to \infty} h(\tilde{f}^{(n)}_{c_1}) = c_1 + \sum_{i=0}^{c_1 - 1} \log(c_1 - i) - c_1 (H_{c_1} - \gamma) + 1$.
(b) $\displaystyle \lim_{n \to \infty} h(\tilde{f}^{(n)}_{n-c_2}) = c_2 + \sum_{i=0}^{c_2 - 1} \log(c_2 - i) - c_2 (H_{c_2} - \gamma) + 1$.
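A numerical illustration of part (a), using $h(nZ) = h(Z) + \log n$ from (6) and the closed-form Beta entropy; $c_1 = 3$ is an arbitrary illustrative constant:

```python
# Illustration of Case III(a): for fixed x = c1, h(n * Z_{c1}) equals
# h(Beta(c1+1, n-c1+1)) + log(n) by (6) and approaches the stated limit.
import numpy as np
from scipy.special import betaln, digamma as psi

def beta_entropy(a, b):
    # Closed-form differential entropy of Beta(a, b), in nats.
    return (betaln(a, b) - (a - 1) * psi(a) - (b - 1) * psi(b)
            + (a + b - 2) * psi(a + b))

c1 = 3  # arbitrary illustrative constant
H_c1 = sum(1.0 / k for k in range(1, c1 + 1))  # harmonic number H_{c1}
limit = (c1 + sum(np.log(c1 - i) for i in range(c1))
         - c1 * (H_c1 - np.euler_gamma) + 1)
for n in (10**3, 10**5, 10**7):
    h = beta_entropy(c1 + 1, n - c1 + 1) + np.log(n)
    print(n, h, limit)  # h approaches the limit (~2.0234 for c1 = 3)
```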
Motivation of the weighted differential entropy

Consider the following statistical experiment with a twofold goal:
1. At the initial stage the experimenter is mainly concerned with whether the coin is approximately fair, to a high precision.
2. As the sample size grows, he proceeds to estimate the true value of the parameter anyway.

We want to quantify the differential entropy of this experiment taking into account its twofold objective. A quantitative measure of the information gain of this experiment is provided by the concept of the weighted differential entropy.
Introducing the weight function

Let $\phi^{(n)} \equiv \phi^{(n)}(\alpha, \gamma, p)$ be a weight function that underlines the importance of some particular value $\gamma$. In choosing the weight function we adopt the following normalization rule:
$$\int_{\mathbb{R}} \phi^{(n)} f^{(n)}_\alpha \, dp = 1. \quad (11)$$
Weighted differential entropies

The goal of this work is to study the asymptotic behaviour of the weighted Shannon (12) and Rényi (13) differential entropies of the RV $Z^{(n)}$ with PDF $f^{(n)}$ given in (2), and in particular of the RV $Z^{(n)}_\alpha$ with PDF $f^{(n)}_\alpha$ given in (2) with $x = \lfloor \alpha n \rfloor$, $0 < \alpha < 1$:
$$h_\phi(f^{(n)}_\alpha) = -\int_{\mathbb{R}} \phi^{(n)} f^{(n)}_\alpha \log f^{(n)}_\alpha \, dp, \quad (12)$$
$$H^\phi_\nu(f^{(n)}_\alpha) = \frac{1}{1-\nu} \log \int_{\mathbb{R}} \phi^{(n)} \left( f^{(n)}_\alpha \right)^{\nu} dp, \quad (13)$$
where $\nu \geq 0$ and $\nu \neq 1$.
The weight function $\phi^{(n)}$

The following special cases are considered:
1. $\phi^{(n)} \equiv 1$;
2. $\phi^{(n)}$ depends on both $n$ and $p$.

In this paper we consider the weight function of the following form:
$$\phi^{(n)}(p) = \Lambda^{(n)}(\alpha, \gamma)\, p^{\gamma \sqrt{n}} (1-p)^{(1-\gamma)\sqrt{n}}, \quad (14)$$
where $\Lambda^{(n)}(\alpha, \gamma)$ is found from the normalizing condition (11). This is a model example with a twofold goal: to emphasize a particular value $\gamma$ (for moderate $n$), while keeping the estimate asymptotically unbiased:
$$\lim_{n \to \infty} \int_0^1 p\, \phi^{(n)} f^{(n)} \, dp = \alpha.$$
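Multiplying (2) by (14) and normalizing shows that $\phi^{(n)} f^{(n)}_\alpha$ is itself a $\mathrm{Beta}(x + \gamma\sqrt{n} + 1,\ n - x + (1-\gamma)\sqrt{n} + 1)$ density, so the weighted mean of $p$ has a closed form; a minimal sketch of the unbiasedness claim, with arbitrary illustrative $\alpha$ and $\gamma$ (called `g` below):

```python
# Consequence of (11) and (14): phi^(n) * f^(n) is the Beta density with
# parameters (x + g*sqrt(n) + 1, n - x + (1-g)*sqrt(n) + 1), so the weighted
# mean of p is (x + g*sqrt(n) + 1) / (n + sqrt(n) + 2) -> alpha.
import numpy as np

alpha, g = 0.3, 0.5  # arbitrary illustrative values; g plays the role of gamma
for n in (10**2, 10**4, 10**6):
    x, s = int(alpha * n), np.sqrt(n)
    weighted_mean = (x + g * s + 1) / (n + s + 2)  # mean of the tilted Beta
    print(n, weighted_mean)  # tends to alpha = 0.3
```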
The weighted Shannon differential entropy

Theorem. For the weighted Shannon differential entropy of the RV $Z^{(n)}_\alpha$ with PDF $f^{(n)}_\alpha$ and weight function $\phi^{(n)}$ given in (14) the following limit exists:
$$\lim_{n \to \infty} \left[ h_\phi(f^{(n)}_\alpha) - \frac{1}{2}\log\!\left(\frac{2\pi e\, \alpha(1-\alpha)}{n}\right) \right] = \frac{(\alpha - \gamma)^2}{2\alpha(1-\alpha)}. \quad (15)$$
If $\alpha = \gamma$ then
$$\lim_{n \to \infty} \left[ h_\phi(f^{(n)}_\alpha) - h(f^{(n)}_\alpha) \right] = 0, \quad (16)$$
where $h(f^{(n)}_\alpha)$ is the standard ($\phi \equiv 1$) Shannon differential entropy.
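Since $\phi^{(n)} f^{(n)}_\alpha$ is the tilted Beta density noted above, $h_\phi(f^{(n)}_\alpha)$ is a cross-entropy term expressible through the digamma function via $E[\log p] = \psi(a') - \psi(a'+b')$ under $\mathrm{Beta}(a', b')$. A hedged numerical sketch of (15), natural logs, illustrative $\alpha$ and $\gamma$:

```python
# Numerical check of (15): h_phi(f) = -E_g[log f], where g is the tilted
# Beta(a', b') density obtained from phi * f.
import numpy as np
from scipy.special import betaln, digamma as psi

alpha, g = 0.3, 0.5  # illustrative; expected limit (alpha-g)^2/(2*alpha*(1-alpha)) ~ 0.0952
for n in (10**2, 10**4, 10**6):
    x, s = int(alpha * n), np.sqrt(n)
    a, b = x + 1, n - x + 1                 # posterior Beta parameters
    ap, bp = a + g * s, b + (1 - g) * s     # tilted (weighted) Beta parameters
    h_phi = (betaln(a, b)
             - (a - 1) * (psi(ap) - psi(ap + bp))
             - (b - 1) * (psi(bp) - psi(ap + bp)))
    gauss = 0.5 * np.log(2 * np.pi * np.e * alpha * (1 - alpha) / n)
    print(n, h_phi - gauss)  # tends to ~0.0952
```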
The weighted Shannon differential entropy

The normalizing constant in the weight function (14) is found from the condition (11). We obtain that
$$\Lambda^{(n)}(\gamma) = \frac{\Gamma(x+1)\,\Gamma(n-x+1)\,\Gamma(n+2+\sqrt{n})}{\Gamma(x+\gamma\sqrt{n}+1)\,\Gamma(n-x+1+\sqrt{n}-\gamma\sqrt{n})\,\Gamma(n+2)} = \frac{B(x+1,\ n-x+1)}{B(x+\gamma\sqrt{n}+1,\ n-x+\sqrt{n}-\gamma\sqrt{n}+1)}, \quad (17)$$
where $\Gamma(x)$ is the Gamma function and $B(x,y)$ is the Beta function. We denote by $\psi^{(0)}(x) = \psi(x)$ and $\psi^{(1)}(x)$ the digamma function and its first derivative, respectively. Recall the Stirling formula:
$$n! = \sqrt{2\pi n}\, \left(\frac{n}{e}\right)^{n} \left(1 + \frac{1}{12n} + O\!\left(\frac{1}{n^2}\right)\right) \quad \text{as } n \to \infty. \quad (18)$$
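The two forms of $\Lambda^{(n)}(\gamma)$ in (17) are related by $B(x,y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$; a quick log-domain check with arbitrary illustrative values:

```python
# Check that the two expressions for Lambda^(n)(gamma) in (17) agree,
# working in logs (gammaln, betaln) to avoid overflow for large n.
import numpy as np
from scipy.special import gammaln, betaln

n, x, g = 100, 30, 0.5  # arbitrary illustrative values; g plays the role of gamma
s = np.sqrt(n)
via_gamma = (gammaln(x + 1) + gammaln(n - x + 1) + gammaln(n + 2 + s)
             - gammaln(x + g * s + 1) - gammaln(n - x + 1 + s - g * s)
             - gammaln(n + 2))
via_beta = betaln(x + 1, n - x + 1) - betaln(x + g * s + 1, n - x + s - g * s + 1)
print(via_gamma, via_beta)  # identical up to rounding
```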
The weighted Rényi differential entropy

Theorem. Let $Z^{(n)}$ be a RV with PDF $f^{(n)}$ given in (2), let $Z^{(n)}_\alpha$ be a RV with PDF $f^{(n)}_\alpha$ given in (2) with $x = \lfloor \alpha n \rfloor$, $0 < \alpha < 1$, and let $H^\phi_\nu(f^{(n)})$ be the weighted Rényi differential entropy given in (13).
(a) When $\phi^{(n)} \equiv 1$ and both $x$ and $(n-x)$ tend to infinity as $n \to \infty$, the following limit holds:
$$\lim_{n \to \infty} \left[ H_\nu(f^{(n)}) - \frac{1}{2}\log\!\left(\frac{2\pi\, x(n-x)}{n^3}\right) \right] = -\frac{\log \nu}{2(1-\nu)}. \quad (19)$$
(b) When the weight function $\phi^{(n)}$ is given in (14), the following limit for the weighted Rényi entropy of $f^{(n)}_\alpha$ holds:
$$\lim_{n \to \infty} \left[ H^\phi_\nu(f^{(n)}_\alpha) - \frac{1}{2}\log\!\left(\frac{2\pi\, \alpha(1-\alpha)}{n}\right) \right] = -\frac{\log \nu}{2(1-\nu)} + \frac{(\alpha - \gamma)^2}{2\alpha(1-\alpha)\,\nu}. \quad (20)$$
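For part (a) the Rényi integral of a Beta density has a closed form, $\int f^\nu \, dp = B(\nu(a-1)+1,\ \nu(b-1)+1) / B(a,b)^{\nu}$, which makes (19) easy to probe numerically; a sketch with arbitrary illustrative $\alpha$ and $\nu$:

```python
# Numerical illustration of (19) with phi = 1, natural logs: for f = Beta(a, b),
# the integral of f^nu equals B(nu*(a-1)+1, nu*(b-1)+1) / B(a, b)^nu.
import numpy as np
from scipy.special import betaln

alpha, nu = 0.3, 2.0  # arbitrary illustrative values
for n in (10**2, 10**4, 10**6):
    x = int(alpha * n)
    a, b = x + 1, n - x + 1
    log_integral = betaln(nu * (a - 1) + 1, nu * (b - 1) + 1) - nu * betaln(a, b)
    H_nu = log_integral / (1 - nu)
    ref = 0.5 * np.log(2 * np.pi * x * (n - x) / n**3)
    print(n, H_nu - ref, -np.log(nu) / (2 * (1 - nu)))  # difference -> RHS (~0.3466)
```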