Chapter 8: Differential Entropy
Peng-Hua Wang
Graduate Inst. of Comm. Engineering, National Taipei University
Chapter Outline: Chap. 8 Differential Entropy
8.1 Definitions
8.2 AEP for Continuous Random Variables
8.3 Relation of Differential Entropy to Discrete Entropy
8.4 Joint and Conditional Differential Entropy
8.5 Relative Entropy and Mutual Information
8.6 Properties of Differential Entropy and Related Quantities
8.1 Definitions
Definitions

Definition 1 (Differential entropy) The differential entropy h(X) of a continuous random variable X with pdf f(x) is defined as

h(X) = -\int_S f(x) \log f(x)\, dx,

where S is the support set of the random variable.

Example. If X ∼ U(0, a), then

h(X) = -\int_0^a \frac{1}{a} \log\frac{1}{a}\, dx = \log a.
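As a quick sanity check (not part of the slides), the definition can be evaluated numerically. The sketch below, assuming SciPy is available and using the natural logarithm (entropy in nats), integrates -f(x) log f(x) over the support of U(0, a) and compares the result with log a:

```python
# Minimal sketch (assumption: natural log, i.e. entropy in nats):
# numerically check h(X) = log a for X ~ U(0, a).
import numpy as np
from scipy.integrate import quad

def differential_entropy(pdf, support):
    """Numerically integrate -f(x) log f(x) over the support."""
    lo, hi = support
    integrand = lambda x: -pdf(x) * np.log(pdf(x))
    h, _ = quad(integrand, lo, hi)
    return h

a = 3.0
uniform_pdf = lambda x: 1.0 / a          # f(x) = 1/a on (0, a)
print(differential_entropy(uniform_pdf, (0.0, a)))   # ≈ log(3) ≈ 1.0986
print(np.log(a))
```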
Differential Entropy of Gaussian

Example. If X ∼ N(0, σ²) with pdf \phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}, then

h_a(X) = -\int \phi(x) \log_a \phi(x)\, dx
       = -\int \phi(x) \left[ \log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2} \log_a e \right] dx
       = \frac{1}{2} \log_a(2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2}\, E_\phi[X^2]
       = \frac{1}{2} \log_a(2\pi e \sigma^2). □
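The closed form can also be checked by Monte Carlo, since h(X) = E[-log φ(X)]. A minimal sketch (assumption: natural logarithms, a = e, so the entropy is in nats):

```python
# Estimate h(X) = E[-log phi(X)] for X ~ N(0, sigma^2) by Monte Carlo
# and compare with the closed form (1/2) log(2*pi*e*sigma^2).
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=1_000_000)

log_phi = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
h_mc = -log_phi.mean()                                  # Monte Carlo estimate
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)     # closed form
print(h_mc, h_exact)                                    # both ≈ 2.112 nats
```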
Differential Entropy of Gaussian

Remark. If a random variable with pdf f(x) has zero mean and variance σ², then

-\int f(x) \log_a \phi(x)\, dx
  = -\int f(x) \left[ \log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2} \log_a e \right] dx
  = \frac{1}{2} \log_a(2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2}\, E_f[X^2]
  = \frac{1}{2} \log_a(2\pi e \sigma^2).
Gaussian has Maximal Differential Entropy

Suppose that a random variable X with pdf f(x) has zero mean and variance σ². What is its maximal differential entropy? Let φ(x) be the pdf of N(0, σ²). Then

h(X) + \int f(x) \log \phi(x)\, dx
  = \int f(x) \log \frac{\phi(x)}{f(x)}\, dx
  \le \log \int f(x) \frac{\phi(x)}{f(x)}\, dx    (Jensen's inequality; the logarithm is concave)
  = \log \int \phi(x)\, dx = 0.

That is,

h(X) \le -\int f(x) \log \phi(x)\, dx = \frac{1}{2} \log(2\pi e \sigma^2),

and equality holds if f(x) = φ(x). □
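To illustrate the maximality claim, one can compare closed-form entropies of a few zero-mean densities normalized to the same variance. The uniform and Laplace formulas below are standard results, not derived in these slides; the Gaussian value comes out largest:

```python
# Among zero-mean densities with variance sigma^2, the Gaussian has the
# largest differential entropy. Closed forms (nats) for sigma^2 = 1:
import numpy as np

sigma2 = 1.0
h_gaussian = 0.5 * np.log(2 * np.pi * np.e * sigma2)   # ≈ 1.419
h_uniform  = 0.5 * np.log(12 * sigma2)                 # U(-a, a), a = sqrt(3*sigma2): h = log(2a) ≈ 1.242
h_laplace  = 1.0 + 0.5 * np.log(2 * sigma2)            # Laplace(b), 2b^2 = sigma2: h = 1 + log(2b) ≈ 1.347
print(h_gaussian, h_uniform, h_laplace)                # Gaussian is largest
```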
8.2 AEP for Continuous Random Variables
AEP

Theorem 1 (AEP) Let X_1, X_2, ..., X_n be a sequence of i.i.d. random variables with common pdf f(x). Then

-\frac{1}{n} \log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X)   in probability.

Definition 2 (Typical set) For ε > 0, the typical set A_\epsilon^{(n)} with respect to f(x) is defined as

A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n : \left| -\frac{1}{n} \log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\}.
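A small simulation (not part of the slides) makes the theorem concrete. The sketch below, assuming i.i.d. N(0, 1) samples and natural logarithms, shows -(1/n) log f(X_1, ..., X_n) concentrating around h(X) = ½ log(2πe) ≈ 1.419 nats as n grows:

```python
# Illustrate the AEP: the per-symbol negative log-likelihood of an i.i.d.
# N(0,1) sequence concentrates around h(X) = 0.5*log(2*pi*e).
import numpy as np

rng = np.random.default_rng(1)
h_true = 0.5 * np.log(2 * np.pi * np.e)

def neg_loglik_per_symbol(x):
    # -(1/n) * sum_i log phi(x_i) for the standard normal pdf
    return np.mean(0.5 * np.log(2 * np.pi) + 0.5 * x**2)

for n in (10, 100, 1_000, 10_000):
    vals = [neg_loglik_per_symbol(rng.normal(size=n)) for _ in range(5)]
    print(n, np.round(vals, 3), "-> h(X) =", round(h_true, 3))
```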
AEP

Definition 3 (Volume) The volume Vol(A) of a set A ⊂ R^n is defined as

Vol(A) = \int_A dx_1\, dx_2 \cdots dx_n.

Theorem 2 (Properties of the typical set)
1. Pr(A_\epsilon^{(n)}) > 1 - ε for n sufficiently large.
2. Vol(A_\epsilon^{(n)}) ≤ 2^{n(h(X)+\epsilon)} for all n.
3. Vol(A_\epsilon^{(n)}) ≥ (1-\epsilon)\, 2^{n(h(X)-\epsilon)} for n sufficiently large.
8.4 Joint and Conditional Differential Entropy
Definitions

Definition 4 (Joint differential entropy) The differential entropy of jointly distributed random variables X_1, X_2, ..., X_n is defined as

h(X_1, X_2, \ldots, X_n) = -\int f(x^n) \log f(x^n)\, dx^n,

where f(x^n) = f(x_1, x_2, ..., x_n) is the joint pdf.

Definition 5 (Conditional differential entropy) The conditional differential entropy of jointly distributed random variables X, Y with joint pdf f(x, y) is defined (when the integral exists) as

h(X|Y) = -\int f(x, y) \log f(x|y)\, dx\, dy = h(X, Y) - h(Y).
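The identity h(X|Y) = h(X,Y) - h(Y) can be checked by Monte Carlo for a standard bivariate normal with correlation ρ, where h(X|Y) also has a closed form because X | Y = y ∼ N(ρy, 1 - ρ²). A minimal sketch, assuming SciPy (entropies in nats):

```python
# Check h(X|Y) = h(X,Y) - h(Y) for a bivariate normal with correlation rho.
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(2)
rho = 0.8
K = np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], K, size=500_000)

h_xy = -multivariate_normal(cov=K).logpdf(xy).mean()   # Monte Carlo estimate of h(X,Y)
h_y  = -norm.logpdf(xy[:, 1]).mean()                   # Monte Carlo estimate of h(Y)
# X | Y = y is N(rho*y, 1 - rho^2), so h(X|Y) has a closed form:
h_x_given_y = 0.5 * np.log(2 * np.pi * np.e * (1 - rho**2))
print(h_xy - h_y, h_x_given_y)                         # ≈ equal
```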
Multivariate Normal Distribution

Theorem 3 (Entropy of a multivariate normal) Let X_1, X_2, ..., X_n have a multivariate normal distribution with mean vector μ and covariance matrix K. Then

h(X_1, X_2, \ldots, X_n) = \frac{1}{2} \log\!\left[(2\pi e)^n |K|\right].

Proof. The joint pdf of the multivariate normal distribution is

\phi(x) = \frac{1}{(2\pi)^{n/2} |K|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^t K^{-1} (x-\mu)}.
Multivariate Normal Distribution

Therefore,

h(X_1, X_2, \ldots, X_n)
  = -\int \phi(x) \log_a \phi(x)\, dx
  = \int \phi(x) \left[ \frac{1}{2} \log_a\!\left((2\pi)^n |K|\right) + \frac{1}{2} (x-\mu)^t K^{-1} (x-\mu) \log_a e \right] dx
  = \frac{1}{2} \log_a\!\left((2\pi)^n |K|\right) + \frac{1}{2} (\log_a e)\, \underbrace{E\!\left[(x-\mu)^t K^{-1} (x-\mu)\right]}_{=\,n}
  = \frac{1}{2} \log_a\!\left((2\pi)^n |K|\right) + \frac{1}{2} n \log_a e
  = \frac{1}{2} \log_a\!\left[(2\pi e)^n |K|\right]. □
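A numerical spot check of Theorem 3 (not part of the slides): estimate -E[log φ(X)] by Monte Carlo for a randomly generated covariance matrix and compare with ½ log((2πe)^n |K|), in nats. The sketch assumes SciPy:

```python
# Verify the multivariate normal entropy formula for a random 3x3 covariance.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n = 3
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)                       # a positive-definite covariance matrix

mvn = multivariate_normal(mean=np.zeros(n), cov=K)
x = rng.multivariate_normal(np.zeros(n), K, size=200_000)

h_mc = -mvn.logpdf(x).mean()                      # Monte Carlo estimate of -E[log phi(X)]
h_exact = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))
print(h_mc, h_exact)                              # both ≈ the same value
```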
Multivariate Normal Distribution

Lemma. Let Y = (Y_1, Y_2, ..., Y_n)^t be a random vector. If K = E[YY^t], then E[Y^t K^{-1} Y] = n.

Proof. Write K in terms of its columns and K^{-1} in terms of its rows:

K = E[YY^t] = \begin{pmatrix} | & | & & | \\ k_1 & k_2 & \cdots & k_n \\ | & | & & | \end{pmatrix},
\qquad
K^{-1} = \begin{pmatrix} a_1^t \\ a_2^t \\ \vdots \\ a_n^t \end{pmatrix}.

We have k_i = E[Y_i Y] and a_j^t k_i = δ_{ij} (since K^{-1} K = I).
Multivariate Normal Distribution

Now,

Y^t K^{-1} Y = Y^t \begin{pmatrix} a_1^t Y \\ a_2^t Y \\ \vdots \\ a_n^t Y \end{pmatrix}
             = Y_1 a_1^t Y + Y_2 a_2^t Y + \cdots + Y_n a_n^t Y,

and

E[Y^t K^{-1} Y] = a_1^t E[Y_1 Y] + a_2^t E[Y_2 Y] + \cdots + a_n^t E[Y_n Y]
                = a_1^t k_1 + a_2^t k_2 + \cdots + a_n^t k_n = n. □
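The lemma is easy to verify empirically. The sketch below (not from the slides) draws zero-mean Gaussian vectors with covariance K and averages the quadratic form Y^t K^{-1} Y; Gaussianity is only a convenient sampling choice, since the lemma uses nothing beyond the second moments:

```python
# Empirically check E[Y^t K^{-1} Y] = n for vectors with covariance K.
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n))
K = A @ A.T + np.eye(n)                            # positive-definite covariance

Y = rng.multivariate_normal(np.zeros(n), K, size=200_000)
Kinv = np.linalg.inv(K)
quad_form = np.einsum('ij,jk,ik->i', Y, Kinv, Y)   # Y^t K^{-1} Y for each sample
print(quad_form.mean(), n)                         # ≈ n
```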
8.5 Relative Entropy and Mutual Information
Definitions

Definition 6 (Relative entropy) The relative entropy (or Kullback-Leibler distance) D(f‖g) between two densities f(x) and g(x) is defined as

D(f\|g) = \int f(x) \log \frac{f(x)}{g(x)}\, dx.

Definition 7 (Mutual information) The mutual information I(X;Y) between two random variables with joint density f(x, y) is defined as

I(X;Y) = \int f(x, y) \log \frac{f(x, y)}{f(x) f(y)}\, dx\, dy.
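As an illustration (not from the slides), D(f‖g) for two one-dimensional Gaussians can be computed by numerical integration of f log(f/g) and compared with the well-known closed form log(σ₂/σ₁) + (σ₁² + (μ₁-μ₂)²)/(2σ₂²) - ½ (in nats). A minimal sketch assuming SciPy:

```python
# D(f||g) for f = N(m1, s1^2) and g = N(m2, s2^2): numerical integration
# versus the Gaussian-to-Gaussian closed form.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

m1, s1 = 0.0, 1.0
m2, s2 = 1.0, 2.0
f = norm(m1, s1).pdf
g = norm(m2, s2).pdf

integrand = lambda x: f(x) * np.log(f(x) / g(x))
d_numeric, _ = quad(integrand, -20, 20)
d_closed = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5
print(d_numeric, d_closed)   # both ≈ 0.443 nats
```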
Example

Let (X, Y) ∼ N(0, K) where

K = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}.

Then h(X) = h(Y) = \frac{1}{2}\log(2\pi e \sigma^2) and

h(X, Y) = \frac{1}{2}\log\!\left[(2\pi e)^2 |K|\right] = \frac{1}{2}\log\!\left[(2\pi e)^2 \sigma^4 (1-\rho^2)\right].

Therefore,

I(X;Y) = h(X) + h(Y) - h(X, Y) = -\frac{1}{2}\log(1-\rho^2).
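The same result can be verified by Monte Carlo (not part of the slides), writing I(X;Y) as an expectation of the log density ratio. A minimal sketch assuming SciPy:

```python
# Monte Carlo estimate of I(X;Y) = E[log f(X,Y) / (f(X) f(Y))] for the
# bivariate normal above, versus the closed form -0.5*log(1 - rho^2) (nats).
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(5)
sigma, rho = 1.5, 0.7
K = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], K, size=500_000)

log_joint = multivariate_normal(cov=K).logpdf(xy)
log_marg = norm.logpdf(xy[:, 0], scale=sigma) + norm.logpdf(xy[:, 1], scale=sigma)
i_mc = (log_joint - log_marg).mean()
i_exact = -0.5 * np.log(1 - rho**2)
print(i_mc, i_exact)         # both ≈ 0.337 nats
```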
8.6 Properties of Differential Entropy and Related Quantities
Properties

Theorem 4 (Relative entropy) D(f‖g) ≥ 0, with equality iff f = g almost everywhere.

Corollary 1
1. I(X;Y) ≥ 0, with equality iff X and Y are independent.
2. h(X|Y) ≤ h(X), with equality iff X and Y are independent.
Properties

Theorem 5 (Chain rule for differential entropy)

h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n h(X_i \mid X_1, X_2, \ldots, X_{i-1}).

Corollary 2

h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^n h(X_i).

Corollary 3 (Hadamard's inequality) If K is the covariance matrix of a multivariate normal distribution, then

|K| \le \prod_{i=1}^n K_{ii}.
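Hadamard's inequality is easy to spot-check numerically for random positive-semidefinite matrices; a minimal sketch (not part of the slides):

```python
# Check |K| <= prod_i K_ii for random positive-semidefinite matrices.
import numpy as np

rng = np.random.default_rng(6)
for _ in range(5):
    A = rng.normal(size=(5, 5))
    K = A @ A.T                              # positive semidefinite
    det_K = np.linalg.det(K)
    prod_diag = np.prod(np.diag(K))
    print(det_K <= prod_diag + 1e-9, det_K, prod_diag)   # always True
```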
Properties

Theorem 6
1. h(X + c) = h(X).
2. h(aX) = h(X) + log |a|.
3. h(AX) = h(X) + log |det(A)|.
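Property 3 can be checked with closed forms (not part of the slides): if X ∼ N(0, K), then AX ∼ N(0, AKA^t), and the Gaussian entropy formula from Theorem 3 gives both sides directly. A sketch in nats:

```python
# For X ~ N(0, K), AX ~ N(0, A K A^t), so closed-form entropies let us
# check h(AX) = h(X) + log|det(A)|.
import numpy as np

rng = np.random.default_rng(7)
n = 3
B = rng.normal(size=(n, n))
K = B @ B.T + np.eye(n)                  # covariance of X
A = rng.normal(size=(n, n))              # an invertible linear map (almost surely)

def gaussian_entropy(cov):
    # (1/2) log((2*pi*e)^n |cov|), in nats
    return 0.5 * np.log((2 * np.pi * np.e) ** cov.shape[0] * np.linalg.det(cov))

h_x = gaussian_entropy(K)
h_ax = gaussian_entropy(A @ K @ A.T)
print(h_ax, h_x + np.log(abs(np.linalg.det(A))))   # equal up to rounding
```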
Gaussian has Maximal Entropy

Theorem 7 Let the random vector X ∈ R^n have zero mean and covariance K = E[XX^t]. Then

h(X) \le \frac{1}{2}\log\!\left[(2\pi e)^n |K|\right],

with equality iff X ∼ N(0, K).

Proof. Let g(x) be any density satisfying \int x_i x_j\, g(x)\, dx = K_{ij}, and let φ(x) be the density of N(0, K). Then

0 \le D(g\|\phi) = \int g \log(g/\phi) = -h(g) - \int g \log\phi = -h(g) - \int \phi \log\phi = -h(g) + h(\phi),

where the second-to-last equality holds because log φ(x) is a quadratic form in x and g and φ have the same second moments. That is, h(g) ≤ h(φ), and equality holds iff g = φ. □