
Kantorovich's optimal transport problem and Shannon's optimal channel problem. Roman V. Belavkin, School of Science and Technology, Middlesex University, London NW4 4BT, UK. 13 June 2016. In honor of Shun-ichi Amari.


Information and entropy

Theorem (Shannon-Pythagorean; Belavkin, 2013a). For $w \in \mathcal{P}(X \otimes Y)$ with marginals $\pi_X w = q$ and $\pi_Y w = p$:

$D_{KL}(w, q \otimes q) = D_{KL}(w, q \otimes p) + D_{KL}(p, q)$

Proof. $D(w, q \otimes q) = \underbrace{D(w, q \otimes p)}_{I_w\{x,y\}} + \underbrace{D(q \otimes p, q \otimes q)}_{D(p,q)} - \underbrace{\langle \ln(q \otimes p) - \ln(q \otimes q),\, q \otimes p - w \rangle}_{0}$, where the last term vanishes because $\ln(q \otimes p) - \ln(q \otimes q) = 1_X \otimes (\ln p - \ln q)$.

Cross-information (Belavkin, 2013a):

$D_{KL}(w, q \otimes q) = \underbrace{-\langle \ln q, p \rangle}_{\text{cross-entropy}} - \underbrace{[\, H(p) - D_{KL}(w, q \otimes p) \,]}_{H(p(y \mid x))}$

Roman Belavkin (Middlesex University), "Kantorovich's and Shannon's optimization problems", 13 June 2016, slide 9/30.
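The Pythagorean decomposition can be checked numerically; a minimal sketch, assuming a finite space with $X = Y$ so that $q \otimes q$ lives on the same space as $w$ (the random joint distribution is only an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(a, b):
    """Kullback-Leibler divergence D_KL(a, b) between distributions on the same finite set."""
    a, b = a.ravel(), b.ravel()
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

n = 4
w = rng.random((n, n)); w /= w.sum()   # joint distribution on X x Y (here X = Y)
q = w.sum(axis=1)                      # marginal pi_X w
p = w.sum(axis=0)                      # marginal pi_Y w

# D_KL(w, q⊗q) = D_KL(w, q⊗p) + D_KL(p, q)
lhs = kl(w, np.outer(q, q))
rhs = kl(w, np.outer(q, p)) + kl(p, q)
```

The identity is exact: splitting $\ln(q_i q_j)$ into $\ln(q_i p_j) + \ln(p_j / q_j)$ and summing out $x$ leaves exactly $D_{KL}(p, q)$.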

Outline: Optimal transportation problems (OTPs); Information and entropy; Optimal channel problem (OCP); Geometry of information divergence and optimization; Dynamical OTP: Optimization of evolution.

Section: Optimal channel problem (OCP)

Optimal channel problem (OCP): Shannon's OCP

Optimal Channel Problem (Shannon, 1948):

$S_c(q, \lambda) := \inf \left\{ \int_{X \times Y} c(x, y)\, dw \;:\; \pi_X w = q,\; I_w\{x,y\} \le \lambda \right\}$

Exponential family solutions. The optimal channel $T : \mathcal{P}(X) \to \mathcal{P}(Y)$ is defined by

$w = e^{-\beta c - \ln Z}\, q \otimes p, \qquad \beta^{-1} = -\, dS_c(q, \lambda)/d\lambda$

Observe that $w \notin \partial \mathcal{P}(X \otimes Y)$, unless $q \otimes p \in \partial \mathcal{P}(X \otimes Y)$ or $\beta \to \infty$.

Value of Information (Stratonovich, 1965):

$V(\lambda) := S_c(q, 0) - S_c(q, \lambda) = \sup \{ \mathbb{E}_w\{u\} : I_w\{x,y\} \le \lambda \}$

(Figure: probability simplex over $X \otimes Y$ with vertices $\delta_{11}, \delta_{12}, \delta_{21}, \delta_{22}$.)
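A sketch of the exponential-family channel on a finite alphabet. The cost matrix and marginals below are illustrative assumptions, and $p$ is held fixed for simplicity; in the full OCP the output marginal $p = \pi_Y w$ must be found self-consistently (Blahut-Arimoto-style iteration):

```python
import numpy as np

def channel_solution(c, q, p, beta):
    """Exponential-family joint w = exp(-beta*c - ln Z) * (q ⊗ p), as on the slide.
    Here p is a fixed reference marginal, not solved self-consistently."""
    w = np.exp(-beta * c) * np.outer(q, p)
    return w / w.sum()

def mutual_information(w):
    q, p = w.sum(axis=1), w.sum(axis=0)
    mask = w > 0
    return float(np.sum(w[mask] * np.log((w / np.outer(q, p))[mask])))

c = np.array([[0.0, 1.0], [1.0, 0.0]])   # illustrative cost matrix
q = p = np.array([0.5, 0.5])             # illustrative marginals

cost = lambda w: float(np.sum(w * c))
w0 = channel_solution(c, q, p, 0.0)      # beta = 0: w = q ⊗ p, zero information
w5 = channel_solution(c, q, p, 5.0)      # larger beta: more information, lower cost
```

Raising $\beta$ spends more information $I_w\{x,y\}$ to buy a lower expected cost, tracing out the curve $S_c(q, \lambda)$.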

Optimal channel problem (OCP): Relation to Kantorovich OTP

Optimal Channel Problem:

$S_c(q, \lambda) := \inf \left\{ \int_{X \times Y} c(x, y)\, dw \;:\; \pi_X w = q,\; I_w\{x,y\} \le \lambda \right\}$

Optimal Transportation Problem:

$K_c(q, p) := \inf \left\{ \int_{X \times Y} c(x, y)\, dw \;:\; \pi_X w = q,\; \pi_Y w = p \right\}$

The marginals $q$ and $p$ have entropies $H(q)$ and $H(p)$, and $0 \le I_w\{x,y\} \le \min[H(q), H(p)]$. Hence $K_c(q, p)$ has the implicit constraint $I_w\{x,y\} \le \lambda = \min[H(q), H(p)]$, and

$S_c(q, \lambda) \le K_c(q, p)$
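The inequality follows from an inclusion of feasible sets: any coupling feasible for the OTP automatically satisfies the OCP's constraints with $\lambda = \min[H(q), H(p)]$. A discrete sketch (the 2x2 cost and marginals are illustrative; on a 2x2 transport polytope a coupling is determined by the single entry $t = w_{11}$, so the OTP reduces to a line search):

```python
import numpy as np

def entropy(d):
    d = d[d > 0]
    return float(-np.sum(d * np.log(d)))

def mutual_information(w):
    q, p = w.sum(axis=1), w.sum(axis=0)
    mask = w > 0
    return float(np.sum(w[mask] * np.log((w / np.outer(q, p))[mask])))

c = np.array([[0.0, 2.0], [1.0, 0.0]])   # illustrative cost
q = np.array([0.6, 0.4])
p = np.array([0.3, 0.7])

def coupling(t):
    """The unique coupling with marginals (q, p) and w_11 = t."""
    return np.array([[t, q[0] - t], [p[0] - t, q[1] - (p[0] - t)]])

lo, hi = max(0.0, q[0] + p[0] - 1.0), min(q[0], p[0])
ts = np.linspace(lo, hi, 1001)
costs = [float(np.sum(coupling(t) * c)) for t in ts]
w_star = coupling(ts[int(np.argmin(costs))])
K = min(costs)                            # K_c(q, p)

# w_star is feasible for the OCP with lambda = min(H(q), H(p)),
# so S_c(q, lambda) <= K_c(q, p).
lam = min(entropy(q), entropy(p))
```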

Optimal channel problem (OCP): Inverse optimal values

Inverse of the OCP value:

$S_c^{-1}(q, \upsilon) := \inf \left\{ I_w\{x,y\} \;:\; \pi_X w = q,\; \int c\, dw \le \upsilon \right\}$

Inverse of the OTP value:

$K_c^{-1}(q, p, \upsilon) := \inf \left\{ I_w\{x,y\} \;:\; \pi_X w = q,\; \pi_Y w = p,\; \int c\, dw \le \upsilon \right\}$

These inverse values represent the smallest amount of Shannon information required to achieve expected cost $\int c\, dw = \upsilon$. If $\upsilon = K_c(q, p)$, then

$S_c^{-1}(q, \upsilon) \le K_c^{-1}(q, p, \upsilon)$

Section: Geometry of information divergence and optimization

Geometry of information divergence and optimization: Problems on conditional extremum

Expected utility: $\mathbb{E}_p\{u\} = \langle u, p \rangle$.

Utility of information $\lambda$: $\upsilon_u(\lambda) := \sup \{ \langle u, p \rangle : F(p, q) \le \lambda \} = -S_{-u}(q, \lambda)$.

Information of utility $\upsilon$: $\lambda_u(\upsilon) := \inf \{ F(p, q) : \langle u, p \rangle \ge \upsilon \} = \upsilon_u^{-1}(\upsilon)$.

Optimal solutions: $p(\beta) \in \partial F^*(\beta u, q)$, with $F(p(\beta), q) = \lambda$.

(Figure: probability simplex with vertices $\omega_2, \omega_3$; the divergence ball $\mathbb{E}_p\{\ln(p/q)\} \le \lambda$ around $q$, the norm ball $\|p - q\|_1 \le \lambda$, the half-space $\mathbb{E}_p\{u\} \ge \upsilon$, and the optimal point $p(\beta)$.)

Geometry of information divergence and optimization: General solution

Lagrange function for $\upsilon_u(\lambda) := \sup \{ \langle u, p \rangle : F(p, q) \le \lambda \}$ (resp. $\lambda_u(\upsilon) := \inf \{ F(p, q) : \langle u, p \rangle \ge \upsilon \}$):

$L(p, \beta^{-1}) = \langle u, p \rangle + \beta^{-1} [\lambda - F(p, q)] \qquad \big( L(p, \beta) = F(p, q) + \beta [\upsilon - \langle u, p \rangle] \big)$

Necessary and sufficient conditions $\partial L \ni 0$:

$\partial_p L(p, \beta^{-1}) = \{\beta u\} - \partial_p F(p, q) \ni 0$
$\partial_{\beta^{-1}} L(p, \beta^{-1}) = \lambda - F(p, q) = 0$

Optimal solutions are subgradients of $F^*(u, q) = \sup \{ \langle u, p \rangle - F(p, q) \}$:

$p(\beta) \in \partial F^*(\beta u), \qquad \langle u, p(\beta) \rangle = \upsilon \quad \big( F(p, q) = \lambda \big)$

Geometry of information divergence and optimization: Example, exponential solution

For $D_{KL}(p, q) = \langle \ln(p/q), p \rangle - \langle 1, p - q \rangle$:

$L(p, \beta^{-1}) = \langle u, p \rangle + \beta^{-1} [\lambda - \langle \ln(p/q), p \rangle + \langle 1, p - q \rangle]$

Necessary and sufficient conditions $\nabla L(p, \beta^{-1}) = 0$:

$\nabla_p L(p, \beta^{-1}) = u - \beta^{-1} \ln(p/q) = 0$
$\partial_{\beta^{-1}} L(p, \beta^{-1}) = \lambda - D_{KL}(p, q) = 0$

Optimal solutions are gradients of $D_{KL}^*(u, q) = \ln \langle e^u, q \rangle$:

$p(\beta) = e^{\beta u - \ln Z(\beta u)}\, q, \qquad D_{KL}(p(\beta), q) = \lambda$
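A numerical sketch of the exponential solution (the utility $u$ and reference $q$ below are illustrative choices): $p(\beta)$ is the exponential tilt of $q$, $D_{KL}(p(\beta), q)$ sweeps out $\lambda$ as $\beta$ grows, and $\langle u, p(\beta) \rangle$ equals the derivative of the dual $\ln \langle e^{\beta u}, q \rangle$:

```python
import numpy as np

def p_beta(u, q, beta):
    """Exponential-family solution p(beta) = exp(beta*u - ln Z(beta*u)) * q."""
    g = np.exp(beta * u) * q
    return g / g.sum()

def kl(a, b):
    m = a > 0
    return float(np.sum(a[m] * np.log(a[m] / b[m])))

u = np.array([1.0, 0.2, -0.5])     # illustrative utility
q = np.array([0.2, 0.5, 0.3])      # illustrative prior

# lambda = D_KL(p(beta), q) grows monotonically with beta >= 0
lams = [kl(p_beta(u, q, b), q) for b in (0.0, 0.5, 1.0, 2.0)]

# <u, p(beta)> is the gradient of the dual ln<e^{beta u}, q> (finite-difference check)
dual = lambda b: float(np.log(np.sum(np.exp(b * u) * q)))
b, h = 1.0, 1e-6
deriv_fd = (dual(b + h) - dual(b - h)) / (2 * h)
deriv_exact = float(u @ p_beta(u, q, b))
```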

Geometry of information divergence and optimization: Solution to Shannon's OCP

The solution for $I_w\{x,y\} = D_{KL}(w, q \otimes p) \le \lambda$:

$w(\beta) = e^{\beta u - \ln Z(\beta u)}\, q \otimes p$

Here $w \notin \partial \mathcal{P}(X \otimes Y)$, so the channel $T : \mathcal{P}(X) \to \mathcal{P}(Y)$ cannot have a deterministic kernel $\delta_{f(x)}(\cdot)$. The dual is strictly convex:

$D_{KL}^*(u, q \otimes p) = \ln \int e^u\, q \otimes p$

(Figure: probability simplex over $X \otimes Y$ with vertices $\delta_{11}, \delta_{12}, \delta_{21}, \delta_{22}$.)

Geometry of information divergence and optimization: Solution to Kantorovich's OTP

The set $\Gamma(q, p)$ of couplings is convex:

$\pi_X((1 - t) w_1 + t w_2) = (1 - t) q + t q = q$

There exists a closed convex functional $F$ such that $\Gamma(q, p) = \{ w : F(w, q \otimes p) \le 1 \}$. Then the solution to the OTP is:

$w(\beta) \in \partial F^*(-\beta c, q \otimes p)$

Monge-Ampère equation: $q = p \circ \nabla\varphi\, |\nabla^2 \varphi|$, where $\varphi : X \to \mathbb{R} \cup \{\infty\}$ is convex, and $\nabla\varphi : X \to Y$ is such that $p = q \circ (\nabla\varphi)^{-1}$ (McCann, 1995; Villani, 2009).

Geometry of information divergence and optimization: Strict inequalities

Theorem (Belavkin, 2013b). Let $\{w(\beta)\}_u$ be a family of $w(\beta) \in \mathcal{P}(X \otimes Y)$ maximizing $\mathbb{E}_w\{u\}$ on the sets $\{w : F(w) \le \lambda\}$, for all $\lambda = F(w)$, where $F : \mathcal{P}(X \otimes Y) \to \mathbb{R} \cup \{\infty\}$ is closed, convex and minimized at $q \otimes p \in \partial F^*(0) \subset \mathrm{Int}(\mathcal{P}(X \otimes Y))$. If $F^*$ is strictly convex, then:

1. $w(\beta) \in \partial \mathcal{P}(X \otimes Y)$ iff $\lambda \ge \sup F$ (i.e. $\beta \to \infty$).
2. For any $v \in \partial \mathcal{P}(X \otimes Y)$ with $F(v) = F(w(\beta)) = \lambda$: $\mathbb{E}_v\{u\} < \mathbb{E}_{w(\beta)}\{u\}$.
3. For any $v \in \partial \mathcal{P}(X \otimes Y)$ with $\mathbb{E}_v\{u\} = \mathbb{E}_{w(\beta)}\{u\} = \upsilon$: $F(v) > F(w(\beta))$.

Geometry of information divergence and optimization: Strict bounds for Monge OTP

Corollary. Let $w_f \in \Gamma(q, p)$ be a solution to the Monge OTP $K_c(q, p)$, and let $w(\beta)$ be a solution to Shannon's OCP $S_c(q, \lambda)$.

1. If $w_f$ and $w(\beta)$ have equal information $I_{w_f}\{x,y\} = I_{w(\beta)}\{x,y\} = \lambda < \sup I_w\{x,y\}$, then $K_c(q, p) > S_c(q, \lambda) > 0$.
2. If $w_f$ and $w(\beta)$ achieve equal values $K_c(q, p) = S_c(q, \lambda) = \upsilon > 0$, then $K_c^{-1}(q, p, \upsilon) > S_c^{-1}(q, \upsilon)$.

Geometry of information divergence and optimization: Optimal transport and the expected utility principle

Let $(X \times Y, \lesssim)$ be a set with a preference relation (total pre-order). The von Neumann and Morgenstern (1944) expected utility principle states that for any $v, w \in \mathcal{P}(X \otimes Y)$:

$\exists\, u : X \times Y \to \mathbb{R} \quad v \lesssim w \iff \mathbb{E}_v\{u\} \le \mathbb{E}_w\{u\}$

It is based on linear and Archimedean axioms:

(1) $v \lesssim w \iff \lambda v \lesssim \lambda w$, for all $\lambda > 0$;
(2) $v \lesssim w \iff v + r \lesssim w + r$, for all $r \in L$;
(3) $n v \lesssim w$, for all $n \in \mathbb{N}$, implies $v \lesssim 0$.

Theorem. A pre-ordered linear space $(L, \lesssim)$ satisfies (1), (2) and (3) if and only if $(L, \lesssim)$ has a utility representation by a closed linear functional $u : L \to \mathbb{R}$.

Section: Dynamical OTP: Optimization of evolution

Dynamical OTP: Optimization of evolution: Dynamical problems

One-step evolution: $q(t) \mapsto T q(t) = q(t + 1)$; $s$-step evolution: $q(t) \mapsto T^s q(t) = q(t + s)$, with $q(t) \to \delta_\top$.

Expected utility: $\mathbb{E}_{q(t)}\{u\} = \int_X u\, dq(t)$.

Why not directly $q \mapsto Tq = \delta_\top$?
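A minimal sketch of the one-step picture $q(t) \mapsto Tq(t)$. The transition matrix and utility below are illustrative assumptions, not from the talk: iterating a Markov operator drives $q(t)$ toward $\delta_\top$ while the expected utility grows.

```python
import numpy as np

# Columns of T are conditional distributions; the "top" state is absorbing.
T = np.array([[0.9, 0.0, 0.0],
              [0.1, 0.8, 0.0],
              [0.0, 0.2, 1.0]])
u = np.array([0.0, 0.5, 1.0])            # utility, maximal at the top state

q = np.array([1.0, 0.0, 0.0])            # start at the worst state
utilities = []
for t in range(200):
    utilities.append(float(u @ q))       # E_{q(t)}{u}
    q = T @ q                            # q(t) -> T q(t) = q(t+1)

# Mass only flows toward higher-utility states, so E_{q(t)}{u} is
# nondecreasing and q(t) approaches delta_top = (0, 0, 1).
```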

Dynamical OTP: Optimization of evolution: Example, search in Hamming space $\{1, \ldots, \alpha\}^l$

Find $\top \in \{1, \ldots, \alpha\}^l$. Hamming metric: $d_H(x, y) = \|y - x\|_H = l - \sum_{i=1}^{l} \delta_{x_i y_i}$. Since $|\{1, \ldots, \alpha\}^l| = \alpha^l$, to find $\top$ in one step we need $l \log_2 \alpha$ bits, whereas observing $d_H(\top, x)$ communicates no more than $\log_2(l + 1)$ bits.
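The counting argument can be made concrete ($\alpha$ and $l$ below are arbitrary choices):

```python
import math

def hamming(x, y):
    """d_H(x, y) = l minus the number of agreeing coordinates."""
    assert len(x) == len(y)
    return sum(xi != yi for xi, yi in zip(x, y))

alpha, l = 4, 10
bits_to_locate = l * math.log2(alpha)    # log2 of the alpha**l candidate points
bits_per_query = math.log2(l + 1)        # d_H takes only the values 0..l
# A single distance observation cannot identify the target in one step.
```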
