Examples

1. $X \sim (\frac{1}{2}, \frac{1}{4}, \frac{1}{4})$ (i.e., for some distinct $x_1, x_2, x_3$: $P_X(x_1) = \frac{1}{2}$, $P_X(x_2) = \frac{1}{4}$, $P_X(x_3) = \frac{1}{4}$):
   $H(X) = -\frac{1}{2}\log\frac{1}{2} - \frac{1}{4}\log\frac{1}{4} - \frac{1}{4}\log\frac{1}{4} = \frac{1}{2} + \frac{1}{4}\cdot 2 + \frac{1}{4}\cdot 2 = \frac{3}{2}$.
2. $H(X) = H(\frac{1}{2}, \frac{1}{4}, \frac{1}{4})$.
3. $X$ is uniformly distributed over $\{0,1\}^n$: $H(X) = -\sum_{i=1}^{2^n} \frac{1}{2^n}\log\frac{1}{2^n} = -\log\frac{1}{2^n} = n$.
   - $n$ bits are needed to describe $X$
   - $n$ bits are needed to create $X$
4. $X = X_1, \ldots, X_n$, where the $X_i$'s are iid over $\{0,1\}$ with $P_{X_i}(1) = \frac{1}{3}$. $H(X) = {}$? (See the sketch below.)
5. $X \sim (p, q)$, $p + q = 1$:
   - $H(X) = H(p, q) = -p\log p - q\log q$
   - $H(1, 0) = H(0, 1) = 0$
   - $H(\frac{1}{2}, \frac{1}{2}) = 1$
   - $h(p) := H(p, 1-p)$ is continuous
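A quick numeric check of these examples (our sketch, not part of the original slides; the helper `entropy` is a hypothetical direct implementation of Shannon's formula, in bits):

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy (base 2) of a probability vector; 0*log 0 is treated as 0."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Examples 1 and 2: H(1/2, 1/4, 1/4) = 3/2
print(entropy([1/2, 1/4, 1/4]))                 # 1.5

# Example 3: uniform over {0,1}^n has entropy n
n = 5
print(entropy([1 / 2**n] * 2**n))               # 5.0

# Example 4: an outcome x in {0,1}^n with k ones has probability (1/3)^k (2/3)^(n-k);
# the sum works out to n * h(1/3), roughly 0.918 n.
dist = [(1/3)**sum(x) * (2/3)**(n - sum(x)) for x in product([0, 1], repeat=n)]
print(entropy(dist), n * entropy([1/3, 2/3]))   # both ~ 4.5914
```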
Axiomatic derivation of the entropy function

Any other choices for defining entropy?

Shannon's function is the only symmetric function (over probability distributions) satisfying the following three axioms:

A1 Continuity: $H(p, 1-p)$ is a continuous function of $p$.
A2 Normalization: $H(\frac{1}{2}, \frac{1}{2}) = 1$.
A3 Grouping axiom: $H(p_1, p_2, \ldots, p_m) = H(p_1 + p_2, p_3, \ldots, p_m) + (p_1 + p_2) \cdot H(\frac{p_1}{p_1 + p_2}, \frac{p_2}{p_1 + p_2})$.

Why A3?

It is not hard to prove that Shannon's entropy function satisfies the above axioms; proving that it is the only such function is more challenging.

Let $H$ be a symmetric function satisfying the above axioms. We prove (assuming an additional axiom) that $H$ is the Shannon function.
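As a sanity check (again our addition, reusing the `entropy` helper from the first sketch), A2 and A3 can be verified numerically on a sample distribution:

```python
# continues the session from the first sketch (entropy)
p = [0.1, 0.2, 0.3, 0.4]

# A2: H(1/2, 1/2) = 1
assert abs(entropy([1/2, 1/2]) - 1) < 1e-12

# A3: merge p1 and p2 into one outcome, then pay (p1 + p2) times
# the entropy of the conditional split between them
s = p[0] + p[1]
lhs = entropy(p)
rhs = entropy([s] + p[2:]) + s * entropy([p[0] / s, p[1] / s])
assert abs(lhs - rhs) < 1e-12
```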
Generalization of the grouping axiom

Fix $p = (p_1, \ldots, p_m)$ and let $S_k = \sum_{i=1}^{k} p_i$.

Grouping axiom: $H(p_1, p_2, \ldots, p_m) = H(S_2, p_3, \ldots, p_m) + S_2 \cdot H(\frac{p_1}{S_2}, \frac{p_2}{S_2})$.

Claim 1 (Generalized grouping axiom)
$H(p_1, p_2, \ldots, p_m) = H(S_k, p_{k+1}, \ldots, p_m) + S_k \cdot H(\frac{p_1}{S_k}, \ldots, \frac{p_k}{S_k})$

Proof: Let $h(q) = H(q, 1-q)$. Then

$H(p_1, p_2, \ldots, p_m) = H(S_2, p_3, \ldots, p_m) + S_2\, h(\frac{p_2}{S_2})$   (1)
$\quad = H(S_3, p_4, \ldots, p_m) + S_3\, h(\frac{p_3}{S_3}) + S_2\, h(\frac{p_2}{S_2})$
$\quad \vdots$
$\quad = H(S_k, p_{k+1}, \ldots, p_m) + \sum_{i=2}^{k} S_i\, h(\frac{p_i}{S_i})$

Hence,

$H(\frac{p_1}{S_k}, \ldots, \frac{p_k}{S_k}) = H(\frac{S_{k-1}}{S_k}, \frac{p_k}{S_k}) + \sum_{i=2}^{k-1} \frac{S_i}{S_k}\, h(\frac{p_i / S_k}{S_i / S_k}) = \frac{1}{S_k} \sum_{i=2}^{k} S_i\, h(\frac{p_i}{S_i})$   (2)

The claim follows by combining the two equations.
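A numeric check of Claim 1 (our addition):

```python
# continues the session from the first sketch (entropy)
# Claim 1 with m = 5, k = 3: group the first three outcomes
p = [0.1, 0.15, 0.25, 0.2, 0.3]
k = 3
S_k = sum(p[:k])
lhs = entropy(p)
rhs = entropy([S_k] + p[k:]) + S_k * entropy([pi / S_k for pi in p[:k]])
assert abs(lhs - rhs) < 1e-12
```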
Further generalization of the grouping axiom

Let $1 = k_1 < k_2 < \ldots < k_q < m$ and let $C_t = \sum_{i=k_t}^{k_{t+1}-1} p_i$ (letting $k_{q+1} = m + 1$).

Claim 2 (Generalized++ grouping axiom)
$H(p_1, p_2, \ldots, p_m) = H(C_1, \ldots, C_q) + C_1 \cdot H(\frac{p_{k_1}}{C_1}, \ldots, \frac{p_{k_2 - 1}}{C_1}) + \ldots + C_q \cdot H(\frac{p_{k_q}}{C_q}, \ldots, \frac{p_m}{C_q})$

Proof: Follows from the generalized grouping axiom (Claim 1) and the symmetry of $H$.

Implication: Let $f(m) = H(\underbrace{\frac{1}{m}, \ldots, \frac{1}{m}}_{m})$. Then (see the check below):
- $f(3^2) = 2f(3) = 2H(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ $\implies$ $f(3^n) = n f(3)$.
- $f(mn) = f(m) + f(n)$ $\implies$ $f(m^k) = k f(m)$.
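A quick numeric confirmation of these identities (our addition):

```python
# continues the session from the first sketch (entropy)
# f(m) = H(1/m, ..., 1/m): multiplicativity via the grouping axiom
f = lambda m: entropy([1 / m] * m)
assert abs(f(9) - 2 * f(3)) < 1e-12          # f(3^2) = 2 f(3)
assert abs(f(6) - (f(2) + f(3))) < 1e-12     # f(mn) = f(m) + f(n)
```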
f(m) = log m

We give a proof under the additional axiom

A4 $f(m) < f(m+1)$

(you can Google for a proof using only A1–A3).

- For $n \in \mathbb{N}$ let $k = \lfloor n \log 3 \rfloor$.
- By A4, $f(2^k) < f(3^n) < f(2^{k+1})$.
- By the grouping axiom (and A2, since $f(2^k) = k \cdot f(2) = k$), $k < n f(3) < k + 1$.
  $\implies \frac{\lfloor n \log 3 \rfloor}{n} < f(3) < \frac{\lfloor n \log 3 \rfloor + 1}{n}$ for every $n \in \mathbb{N}$
  $\implies f(3) = \log 3$.
- The proof extends to any integer (not only 3).
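The sandwich can be watched converging numerically (our addition):

```python
import math

# floor(n log 3)/n < f(3) < (floor(n log 3) + 1)/n squeezes f(3) to log2(3)
for n in (1, 10, 100, 10000):
    k = math.floor(n * math.log2(3))
    print(f"{k / n:.5f} < f(3) < {(k + 1) / n:.5f}")
print("log2(3) =", math.log2(3))   # ~ 1.58496
```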
H(p, q) = −p log p − q log q

- For rational $p, q$, write $p = \frac{k}{m}$ and $q = \frac{m-k}{m}$, where $m$ is the smallest common denominator.
- By the grouping axiom, $f(m) = H(p, q) + p \cdot f(k) + q \cdot f(m-k)$.
- Hence,
  $H(p, q) = \log m - p \log k - q \log(m-k)$
  $= p(\log m - \log k) + q(\log m - \log(m-k))$
  $= -p \log\frac{k}{m} - q \log\frac{m-k}{m}$
  $= -p \log p - q \log q$.
- By the continuity axiom, this holds for every $p, q$.
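The grouping identity behind this derivation checks out numerically (our addition):

```python
# continues the session from the earlier sketches (entropy, f, math)
# f(m) = H(p, q) + p*f(k) + q*f(m - k) for p = k/m; here p = 3/8
k, m = 3, 8
p, q = k / m, (m - k) / m
assert abs(f(m) - (entropy([p, q]) + p * f(k) + q * f(m - k))) < 1e-12
assert abs(entropy([p, q])
           - (math.log2(m) - p * math.log2(k) - q * math.log2(m - k))) < 1e-12
```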
$H(p_1, p_2, \ldots, p_m) = -\sum_{i=1}^{m} p_i \log p_i$

We prove the case of three outcomes; the proof for an arbitrary number of outcomes follows the same lines.

- For rational $p_1, p_2, p_3$, write $p_1 = \frac{k_1}{m}$, $p_2 = \frac{k_2}{m}$ and $p_3 = \frac{k_3}{m}$, where $m = k_1 + k_2 + k_3$ is the smallest common denominator.
- $f(m) = H(p_1, p_2, p_3) + p_1 f(k_1) + p_2 f(k_2) + p_3 f(k_3)$.
- Hence,
  $H(p_1, p_2, p_3) = \log m - p_1 \log k_1 - p_2 \log k_2 - p_3 \log k_3$
  $= -p_1 \log\frac{k_1}{m} - p_2 \log\frac{k_2}{m} - p_3 \log\frac{k_3}{m}$
  $= -p_1 \log p_1 - p_2 \log p_2 - p_3 \log p_3$.
- By the continuity axiom, this holds for every $p_1, p_2, p_3$.
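For completeness, here is our rendering of the "same lines" for a general number of outcomes $r$ (writing $r$ rather than $m$, since $m$ already denotes the common denominator): for rational $p_i = k_i / m$ with $m = \sum_i k_i$,

```latex
\begin{align*}
H(p_1,\ldots,p_r) &= f(m) - \sum_{i=1}^{r} p_i\, f(k_i)
                   = \log m - \sum_{i=1}^{r} p_i \log k_i \\
                  &= \sum_{i=1}^{r} p_i\,(\log m - \log k_i)
                   = -\sum_{i=1}^{r} p_i \log \frac{k_i}{m}
                   = -\sum_{i=1}^{r} p_i \log p_i ,
\end{align*}
```

using $\sum_i p_i = 1$ to write $\log m = \sum_i p_i \log m$ in the second line.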
0 ≤ H(p_1, ..., p_m) ≤ log m

- The bounds are tight:
  - $H(p_1, \ldots, p_m) = 0$ for $(p_1, \ldots, p_m) = (1, 0, \ldots, 0)$.
  - $H(p_1, \ldots, p_m) = \log m$ for $(p_1, \ldots, p_m) = (\frac{1}{m}, \ldots, \frac{1}{m})$.
- Non-negativity is clear.
- A function $f$ is concave if $\lambda f(t_1) + (1-\lambda) f(t_2) \le f(\lambda t_1 + (1-\lambda) t_2)$ for all $t_1, t_2$ and $\lambda \in [0, 1]$.
  $\implies$ (by induction) $\sum_i \lambda_i f(t_i) \le f(\sum_i \lambda_i t_i)$ for all $t_1, \ldots, t_k$ and $\lambda_1, \ldots, \lambda_k \in [0, 1]$ with $\sum_i \lambda_i = 1$
  $\implies$ (Jensen's inequality) $\mathrm{E}\, f(X) \le f(\mathrm{E}\, X)$ for any random variable $X$.
- $\log(x)$ is (strictly) concave for $x > 0$, since its second derivative $(-\frac{1}{x^2})$ is always negative.
- Hence, $H(p_1, \ldots, p_m) = \sum_i p_i \log\frac{1}{p_i} \le \log\big(\sum_i p_i \cdot \frac{1}{p_i}\big) = \log m$.
- Alternatively, for $X$ over $\{1, \ldots, m\}$:
  $H(X) = \mathrm{E}_X \log\frac{1}{P_X(X)} \le \log \mathrm{E}_X \frac{1}{P_X(X)} = \log m$.
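A randomized check of the upper bound (our addition):

```python
import random
# continues the session from the earlier sketches (entropy, math)

# H(p) <= log2(m), with equality exactly at the uniform distribution
m = 6
for _ in range(1000):
    w = [random.random() for _ in range(m)]
    p = [x / sum(w) for x in w]
    assert entropy(p) <= math.log2(m) + 1e-12
assert abs(entropy([1 / m] * m) - math.log2(m)) < 1e-12
```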
H(g(X)) ≤ H(X)

Let $X$ be a random variable, and let $g$ be a function over $\mathrm{Supp}(X) := \{x : P_X(x) > 0\}$.

- For $Y = g(X)$: $H(Y) \le H(X)$.

Proof: Group the values of $X$ by their image under $g$. By Claim 2 (and the symmetry of $H$),
$H(X) = H(g(X)) + \sum_{y} \Pr[g(X) = y] \cdot H\big(\big(\frac{P_X(x)}{\Pr[g(X) = y]}\big)_{x \in g^{-1}(y)}\big) \ge H(g(X))$,
since every entropy term is non-negative.
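A small demonstration (our addition; the distribution and `g` are illustrative):

```python
from collections import defaultdict
# continues the session from the first sketch (entropy)

# Applying a (possibly non-injective) function can only lose entropy
p_X = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}
g = lambda x: x % 2                       # merges {2, 4} and {1, 3}

p_Y = defaultdict(float)
for x, px in p_X.items():
    p_Y[g(x)] += px                       # P[Y=0] = 0.375, P[Y=1] = 0.625

print(entropy(p_X.values()))              # H(X) = 1.75
print(entropy(p_Y.values()))              # H(g(X)) = h(0.375) ~ 0.954
```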