A Conditional Information Inequality and Its Combinatorial Applications


1. A Conditional Information Inequality and Its Combinatorial Applications. Nikolay Vereshchagin (Moscow State University, NRU Higher School of Economics, and Yandex), based on the joint paper with Tarik Kaced and Andrey Romashchenko. MIPT, 2019.

2. Shannon entropy

$H(A) = -\sum_a \Pr[A=a] \cdot \log_2 \Pr[A=a]$

$H(A \mid B) = -\sum_{a,b} \Pr[A=a, B=b] \cdot \log_2 \Pr[A=a \mid B=b]$

Theorem. $H(A) \le \log_2(\text{the number of outcomes of } A)$, and $H(A) = \log_2(\text{the number of outcomes of } A)$ iff $A$ has the uniform distribution.
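
A minimal sketch of these definitions in Python (the joint distribution p below is an illustrative example of my own, not one from the talk); H(A|B) is computed via the chain rule of the next slide:

from math import log2

# Joint pmf of (A, B); keys are (a, b) pairs.  An illustrative example.
p = {('a0', 'b0'): 0.5, ('a0', 'b1'): 0.25, ('a1', 'b1'): 0.25}

def H(pmf):
    # Shannon entropy of a pmf given as {outcome: probability}.
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def marginal(p, i):
    # Marginal pmf of the i-th coordinate.
    m = {}
    for key, q in p.items():
        m[key[i]] = m.get(key[i], 0.0) + q
    return m

H_A = H(marginal(p, 0))
# H(A|B) = -sum_{a,b} P[a,b] * log2 P[a|b] = H(A,B) - H(B).
H_A_given_B = H(p) - H(marginal(p, 1))
print(H_A, H_A_given_B)
# The theorem on this slide: H(A) <= log2(number of outcomes of A).
assert H_A <= log2(len(marginal(p, 0))) + 1e-9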

3. Information inequalities

Definition (basic inequalities).
The chain rule: $H(A,B) = H(A) + H(B \mid A)$, $H(A,B \mid C) = H(A \mid C) + H(B \mid A,C)$.
Subadditivity: $H(A,B) \le H(A) + H(B)$, $H(A,B \mid C) \le H(A \mid C) + H(B \mid C)$.

Definition. Nonnegative linear combinations of basic inequalities are called Shannon-type inequalities.

Example. $H(B \mid A) \le H(B)$, $H(B \mid A,C) \le H(B \mid C)$.

4. Combinatorial applications of information inequalities (an example)

Theorem (Shearer's inequality). $2 \cdot H(A,B,C) \le H(A,B) + H(A,C) + H(B,C)$.

5. Theorem (Shearer's inequality). $2 \cdot H(A,B,C) \le H(A,B) + H(A,C) + H(B,C)$.

Proof. Add the following inequalities:
$H(A,B,C) = H(A,B) + H(C \mid A,B)$
$H(A,B,C) \le H(A) + H(B,C)$
$H(C \mid A,B) \le H(C \mid A)$
$H(A) + H(C \mid A) = H(A,C)$
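
The inequality can be sanity-checked numerically; the following sketch (my own, not from the talk) verifies Shearer's inequality on a randomly generated joint distribution of three binary variables:

from itertools import product
from math import log2
from random import random

# Random joint pmf of three binary variables (A, B, C).
w = {k: random() for k in product((0, 1), repeat=3)}
s = sum(w.values())
p = {k: v / s for k, v in w.items()}

def H(proj):
    # Entropy of the marginal on the coordinates listed in proj.
    m = {}
    for k, q in p.items():
        key = tuple(k[i] for i in proj)
        m[key] = m.get(key, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

lhs = 2 * H((0, 1, 2))                      # 2 * H(A,B,C)
rhs = H((0, 1)) + H((0, 2)) + H((1, 2))     # H(A,B) + H(A,C) + H(B,C)
assert lhs <= rhs + 1e-9, (lhs, rhs)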

6. Theorem (Loomis–Whitney inequality). The volume of a 3-dimensional body is at most the square root of the product of the areas of its three 2-dimensional projections: $V^2 \le S_1 S_2 S_3$.

7. Proof of the discrete version of the Loomis–Whitney inequality.

Let $(A,B,C)$ be a uniformly random pixel in the body. The distribution is uniform over $V$ pixels, and each 2-dimensional projection has at most $S_i$ outcomes, so
$H(A,B,C) = \log_2 V$
$H(A,B) \le \log_2 S_1$
$H(A,C) \le \log_2 S_2$
$H(B,C) \le \log_2 S_3$
Plug these values into Shearer's inequality $2 \cdot H(A,B,C) \le H(A,B) + H(A,C) + H(B,C)$.
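
A numerical illustration of the discrete bound (the voxel body below is an arbitrary test set of my own, not from the talk):

# Any finite set of voxels will do; this test body is arbitrary.
body = {(x, y, z) for x in range(3) for y in range(4) for z in range(5)
        if x + y + z <= 5}

V = len(body)
S1 = len({(x, y) for (x, y, z) in body})    # area of the xy-projection
S2 = len({(x, z) for (x, y, z) in body})    # area of the xz-projection
S3 = len({(y, z) for (x, y, z) in body})    # area of the yz-projection
assert V * V <= S1 * S2 * S3, (V, S1, S2, S3)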

8. Mutual information

Definition (mutual information).
$I(A:B) = H(A) + H(B) - H(A,B)$
$I(A:B \mid C) = H(A \mid C) + H(B \mid C) - H(A,B \mid C)$

Theorem. $I(A:B) = 0$ iff $A, B$ are independent; $I(A:B \mid C) = 0$ iff $A, B$ are conditionally independent given $C$.
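
A small Python sketch of the definition (the two joint distributions are illustrative examples of my own):

from math import log2

def H(pmf):
    return -sum(q * log2(q) for q in pmf.values() if q > 0)

def marginal(p, i):
    m = {}
    for k, q in p.items():
        m[k[i]] = m.get(k[i], 0.0) + q
    return m

def I(p):
    # I(A:B) = H(A) + H(B) - H(A,B) for a joint pmf over pairs (a, b).
    return H(marginal(p, 0)) + H(marginal(p, 1)) - H(p)

indep = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}   # independent fair bits
corr = {(0, 0): 0.5, (1, 1): 0.5}                        # fully correlated bits
assert abs(I(indep)) < 1e-9          # independence <=> zero mutual information
assert abs(I(corr) - 1.0) < 1e-9     # one full bit of shared information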

9. Conditional inequalities (an example)

Proposition. The inequality $I(A:B \mid C) \le I(A:B)$ is false for some $A, B, C$: take $A, B$ independent uniform bits and $C = A \oplus B$; then $I(A:B) = 0$ but $I(A:B \mid C) = 1$.

Proposition. However, $I(B:C \mid A) = 0 \Rightarrow I(A:B \mid C) \le I(A:B)$. Moreover, $I(A:B \mid C) \le I(A:B) + I(B:C \mid A)$ for all $A, B, C$.
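
A numerical check of the counterexample, with A, B independent fair bits and C = A xor B as above:

from math import log2

# Joint pmf of (A, B, C) with A, B independent fair bits and C = A xor B.
p = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

def H(proj):
    m = {}
    for k, q in p.items():
        key = tuple(k[i] for i in proj)
        m[key] = m.get(key, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

I_AB = H((0,)) + H((1,)) - H((0, 1))                           # I(A:B)
I_AB_given_C = H((0, 2)) + H((1, 2)) - H((0, 1, 2)) - H((2,))  # I(A:B|C)
print(I_AB, I_AB_given_C)   # prints 0.0 and 1.0: conditioning raised it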

10. Proof of the inequality $I(A:B \mid C) \le I(A:B) + I(B:C \mid A)$

Add the inequalities:
$H(A,B) = H(A) + H(B \mid A)$
$H(B,C \mid A) = H(C \mid A) + H(B \mid A,C)$
$H(A \mid C) + H(B \mid A,C) = H(A,B \mid C)$
$H(B \mid C) \le H(B)$

11. More evidence that mutual information is not "material"

Theorem (folklore). $H(C) \le H(C \mid X) + H(C \mid Y) + I(X:Y)$ for all $C, X, Y$.

Theorem (Matúš, Romashchenko). The analogous inequality for mutual information, $I(A:B) \le I(A:B \mid X) + I(A:B \mid Y) + I(X:Y)$ (the Ingleton inequality), is false for some $A, B, X, Y$.
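
The folklore inequality is Shannon-type, so it can be spot-checked on random distributions; a sketch of my own with three random binary variables:

from itertools import product
from math import log2
from random import random

# Random joint pmf of three binary variables (C, X, Y).
w = {k: random() for k in product((0, 1), repeat=3)}
s = sum(w.values())
p = {k: v / s for k, v in w.items()}

def H(proj):
    m = {}
    for k, q in p.items():
        key = tuple(k[i] for i in proj)
        m[key] = m.get(key, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

H_C = H((0,))
H_C_given_X = H((0, 1)) - H((1,))          # H(C|X)
H_C_given_Y = H((0, 2)) - H((2,))          # H(C|Y)
I_XY = H((1,)) + H((2,)) - H((1, 2))       # I(X:Y)
assert H_C <= H_C_given_X + H_C_given_Y + I_XY + 1e-9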

12. A non-Shannon-type conditional inequality

Example (Zhang and Yeung, 1997). $I(X:Y \mid A) = I(X:Y) = 0 \Rightarrow I(A:B) \le I(A:B \mid X) + I(A:B \mid Y) + I(X:Y)$.

Remark. This inequality is essentially conditional: for every constant $c$, the unconditional inequality $I(A:B) \le I(A:B \mid X) + I(A:B \mid Y) + I(X:Y) + c \cdot (I(X:Y) + I(X:Y \mid A))$ is false in general.

13. A non-Shannon-type unconditional inequality

Theorem (Makarychev, Makarychev, Romashchenko, V., 2002).
$I(A:B) \le I(A:B \mid X) + I(A:B \mid Y) + I(X:Y) + I(A:B \mid C) + I(B:C \mid A) + I(C:A \mid B)$
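
A spot check of this inequality on a random joint distribution of five binary variables (a sketch of my own; random tests cannot prove the inequality, only fail to refute it, and being non-Shannon-type it does not decompose into basic inequalities):

from itertools import product
from math import log2
from random import random

# Random joint pmf of five binary variables (A, B, C, X, Y).
w = {k: random() for k in product((0, 1), repeat=5)}
s = sum(w.values())
p = {k: v / s for k, v in w.items()}
A, B, C, X, Y = range(5)

def H(*proj):
    m = {}
    for k, q in p.items():
        key = tuple(k[i] for i in proj)
        m[key] = m.get(key, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

def I(u, v, *z):
    # I(U:V|Z) = H(U,Z) + H(V,Z) - H(U,V,Z) - H(Z); with empty Z this is I(U:V).
    return H(u, *z) + H(v, *z) - H(u, v, *z) - H(*z)

lhs = I(A, B)
rhs = (I(A, B, X) + I(A, B, Y) + I(X, Y)
       + I(A, B, C) + I(B, C, A) + I(C, A, B))
assert lhs <= rhs + 1e-9, (lhs, rhs)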

14. Another non-Shannon-type conditional inequality

Example (Kaced and Romashchenko, 2013). $I(X:Y \mid A) = H(A \mid X,Y) = 0 \Rightarrow I(A:B) \le I(A:B \mid X) + I(A:B \mid Y) + I(X:Y)$.

A reformulation: $I(X:Y \mid A) = H(A \mid X,Y) = 0 \Rightarrow H(A \mid X,B) + H(A \mid Y,B) \le H(A \mid B)$.

This talk: we "demystify" Kaced and Romashchenko's inequality and present a combinatorial application of it.

15. Theorem. The inequality $H(A \mid X,B) + H(A \mid Y,B) \le H(A \mid B)$ holds provided the supports of the distributions of the pairs $(A,X)$ and $(A,Y)$ have the following property:
$\Pr[A=a, X=x] > 0$, $\Pr[A=a, Y=y] > 0$, $\Pr[A=a', X=x] > 0$, $\Pr[A=a', Y=y] > 0 \Rightarrow a = a'$.

[Diagram: a value $x$ of $X$ and a value $y$ of $Y$ both compatible with $a$ and with $a'$ force $a = a'$.]

Remark. 1. The condition here is weaker than that of Kaced and Romashchenko. 2. The condition here relativizes.
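
A small checker for this support condition (an illustration of my own; supp_AX and supp_AY are hypothetical names for the two supports, given as sets of pairs):

from collections import defaultdict

def condition_holds(supp_AX, supp_AY):
    # supp_AX and supp_AY are the supports of (A,X) and (A,Y),
    # given as sets of pairs (a, x) and (a, y).
    # The condition: for every x and y, at most one value a is
    # compatible with both.
    by_x = defaultdict(set)
    by_y = defaultdict(set)
    for a, x in supp_AX:
        by_x[x].add(a)
    for a, y in supp_AY:
        by_y[y].add(a)
    return all(len(by_x[x] & by_y[y]) <= 1 for x in by_x for y in by_y)

print(condition_holds({(0, 'x0'), (1, 'x1')}, {(0, 'y0'), (1, 'y1')}))  # True
print(condition_holds({(0, 'x'), (1, 'x')}, {(0, 'y'), (1, 'y')}))      # False: a = 0 and a' = 1 share both x and y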

16. Proof

Step 1. The general case reduces to the case of trivial $B$: $H(A \mid X) + H(A \mid Y) \le H(A)$.

Step 2. The general case reduces further to the case when $X, Y$ are conditionally independent given $A$. Proof: define new random variables $A', X', Y'$ so that the marginal distributions of $(A', X')$ and $(A', Y')$ are the same as those of $(A, X)$ and $(A, Y)$, but $X', Y'$ are conditionally independent given $A'$.

Step 3. We prove the Shannon-type inequality $H(A \mid X) + H(A \mid Y) \le H(A) + H(A \mid X,Y) + I(X:Y \mid A)$. The support condition makes $a$ a function of the pair $(x,y)$, so $H(A \mid X,Y) = 0$; after Step 2 also $I(X:Y \mid A) = 0$, and the right-hand side collapses to $H(A)$.
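
The Step 3 inequality is Shannon-type and can be spot-checked numerically; a sketch of my own on a random joint distribution of three binary variables:

from itertools import product
from math import log2
from random import random

# Random joint pmf of three binary variables (A, X, Y).
w = {k: random() for k in product((0, 1), repeat=3)}
s = sum(w.values())
p = {k: v / s for k, v in w.items()}
A, X, Y = range(3)

def H(*proj):
    m = {}
    for k, q in p.items():
        key = tuple(k[i] for i in proj)
        m[key] = m.get(key, 0.0) + q
    return -sum(q * log2(q) for q in m.values() if q > 0)

lhs = (H(A, X) - H(X)) + (H(A, Y) - H(Y))          # H(A|X) + H(A|Y)
rhs = (H(A) + (H(A, X, Y) - H(X, Y))               # H(A) + H(A|X,Y)
       + H(X, A) + H(Y, A) - H(X, Y, A) - H(A))    # + I(X:Y|A)
assert lhs <= rhs + 1e-9, (lhs, rhs)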

17. A combinatorial application of the inequality $H(A \mid X) + H(A \mid Y) \le H(A)$

Theorem. Assume that a finite family $F$ of pairwise disjoint squares is given, each square being a subset of $[0,1] \times [0,1]$. Assume that each vertical line inside $[0,1] \times [0,1]$ intersects at least $L$ squares in $F$ and, similarly, each horizontal line intersects at least $R$ squares in $F$. Then $|F| \ge LR$.

18. The proof of a discrete version of the theorem (each square consists of pixels)

Let $A = S \times T$ be a randomly chosen square from $F$, where the probability of each square is proportional to its side length $|S| = |T|$ (not its area!). Let $(X, Y)$ be a point chosen uniformly at random from $A$. The support condition of the inequality is fulfilled: a square that meets both the vertical line through $x$ and the horizontal line through $y$ must contain the point $(x, y)$, and since the squares are pairwise disjoint there is at most one such square. Hence
$H(A \mid X) + H(A \mid Y) \le H(A)$.
One can show that, conditioned on $X = x$, the variable $A$ is uniformly distributed over the squares meeting the vertical line through $x$, of which there are at least $L$; similarly for $Y$. Hence $H(A \mid X) \ge \log_2 L$ and $H(A \mid Y) \ge \log_2 R$. It follows that $\log_2 L + \log_2 R \le H(A)$. As $H(A) \le \log_2 |F|$, the theorem follows.
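
A sanity check of the discrete statement on the simplest tight configuration, a k-by-k tiling of an n-by-n pixel grid (an illustration of my own, not the general proof):

# k x k equal squares tiling an n x n pixel grid.
k, side = 4, 5
n = k * side
F = [(r * side, c * side, side) for r in range(k) for c in range(k)]

def meets_vertical(sq, x):
    # Does square sq = (top, left, s) contain pixel column x?
    top, left, s = sq
    return left <= x < left + s

def meets_horizontal(sq, y):
    # Does square sq = (top, left, s) contain pixel row y?
    top, left, s = sq
    return top <= y < top + s

L = min(sum(meets_vertical(sq, x) for sq in F) for x in range(n))
R = min(sum(meets_horizontal(sq, y) for sq in F) for y in range(n))
assert len(F) >= L * R   # here |F| = k*k and L = R = k: the bound is tight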

19. Thank you for your attention!
