Information and entropy

Theorem (Shannon-Pythagorean; Belavkin, 2013a)
Let w ∈ P(X ⊗ Y) with marginals π_X w = q and π_Y w = p. Then

    D_KL(w, q ⊗ q) = D_KL(w, q ⊗ p) + D_KL(p, q)

Proof.
    D(w, q ⊗ q) = D(w, q ⊗ p) + D(q ⊗ p, q ⊗ q) − ⟨ln(q ⊗ p) − ln(q ⊗ q), q ⊗ p − w⟩
The first term is the mutual information I_w{x, y}, the second equals D(p, q), and the last term is 0 because ln(q ⊗ p) − ln(q ⊗ q) = 1_X ⊗ (ln p − ln q). ∎

Cross-Information (Belavkin, 2013a)
    D_KL(w, q ⊗ q) = −⟨ln q, p⟩ − [H(p) − D_KL(w, q ⊗ p)]
where −⟨ln q, p⟩ is the cross-entropy and the bracketed term is the conditional entropy H(p(y | x)).

Roman Belavkin (Middlesex University), "Kantorovich's and Shannon's optimization problems", 13 June 2016
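The Pythagorean decomposition above can be checked numerically on a small discrete example (a hedged sketch; the joint w below is an arbitrary random matrix, not data from the talk, and X = Y so that q ⊗ q is well defined):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(a, b):
    # Kullback-Leibler divergence D_KL(a, b) between discrete distributions.
    return float(np.sum(a * np.log(a / b)))

# Arbitrary strictly positive joint distribution w on X x Y with |X| = |Y|.
w = rng.random((4, 4))
w /= w.sum()
q = w.sum(axis=1)   # marginal on X
p = w.sum(axis=0)   # marginal on Y

lhs = kl(w.ravel(), np.outer(q, q).ravel())
# D_KL(w, q ⊗ p) equals the mutual information I_w{x, y} when p is the Y-marginal.
rhs = kl(w.ravel(), np.outer(q, p).ravel()) + kl(p, q)
assert np.isclose(lhs, rhs)   # Shannon-Pythagorean identity
```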
Optimal channel problem (OCP)

Outline:
- Optimal transportation problems (OTPs)
- Information and entropy
- Optimal channel problem (OCP)
- Geometry of information divergence and optimization
- Dynamical OTP: Optimization of evolution
Shannon's OCP

Optimal Channel Problem (Shannon, 1948)
    S_c(q, λ) := inf { ∫_{X×Y} c(x, y) dw : π_X w = q, I_w{x, y} ≤ λ }

Exponential family solutions
The optimal channel T : P(X) → P(Y) is defined by
    w = e^{−βc − ln Z} q ⊗ p,   β^{−1} = −dS_c(q, λ)/dλ

Observe that w ∉ ∂P(X ⊗ Y), unless q ⊗ p ∈ ∂P(X ⊗ Y) or β → ∞.

Value of Information (Stratonovich, 1965)
    V(λ) := S_c(q, 0) − S_c(q, λ) = sup { E_w{u} : I_w{x, y} ≤ λ }

[Figure: simplex P(X ⊗ Y) with vertices δ_11, δ_12, δ_21, δ_22]
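A minimal numerical sketch of the exponential-family channel. The reference output distribution p and the uniform input marginal q below are illustrative placeholders; in the full solution p and the normalizer are determined self-consistently (as in Blahut-Arimoto-style iterations), while here a per-row normalizer Z(x) simply enforces the marginal constraint:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
c = rng.random((n, n))        # cost c(x, y)
q = np.full(n, 1.0 / n)       # fixed input marginal, pi_X w = q
p = np.full(n, 1.0 / n)       # reference output distribution (assumed, not optimized)
beta = 2.0                    # inverse "temperature"

# Tilted joint w(x, y) = q(x) p(y) exp(-beta c(x, y)) / Z(x);
# the per-row normalizer Z(x) enforces pi_X w = q exactly.
K = p * np.exp(-beta * c)
w = q[:, None] * K / K.sum(axis=1, keepdims=True)

assert np.allclose(w.sum(axis=1), q)   # marginal constraint holds
py = w.sum(axis=0)
I = float(np.sum(w * np.log(w / np.outer(q, py))))   # mutual information I_w{x, y}
cost = float(np.sum(w * c))                          # expected cost
assert I >= 0.0 and w.min() > 0.0      # w is an interior point, not on the boundary
```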
Relation to Kantorovich OTP

Optimal Channel Problem
    S_c(q, λ) := inf { ∫_{X×Y} c(x, y) dw : π_X w = q, I_w{x, y} ≤ λ }

Optimal Transportation Problem
    K_c(q, p) := inf { ∫_{X×Y} c(x, y) dw : π_X w = q, π_Y w = p }

The marginals q and p have entropies H(q) and H(p), and
    0 ≤ I_w{x, y} ≤ min[H(q), H(p)]

Thus K_c(q, p) has the implicit constraint I_w{x, y} ≤ λ = min[H(q), H(p)], and
    S_c(q, λ) ≤ K_c(q, p)
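The entropy bound on mutual information is easy to verify numerically (a sketch with an arbitrary random joint, not specific to the talk's example):

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.random((5, 3))
w /= w.sum()
q = w.sum(axis=1)   # marginal on X
p = w.sum(axis=0)   # marginal on Y

H = lambda d: float(-np.sum(d * np.log(d)))          # Shannon entropy
I = float(np.sum(w * np.log(w / np.outer(q, p))))    # mutual information I_w{x, y}

# 0 <= I <= min[H(q), H(p)] (small tolerance for floating point)
assert -1e-12 <= I <= min(H(q), H(p)) + 1e-12
```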
Inverse Optimal Values

Inverse of the OCP value
    S_c^{−1}(q, υ) := inf { I_w{x, y} : π_X w = q, ∫ c dw ≤ υ }

Inverse of the OTP value
    K_c^{−1}(q, p, υ) := inf { I_w{x, y} : π_X w = q, π_Y w = p, ∫ c dw ≤ υ }

These inverse values represent the smallest amount of Shannon information required to achieve the expected cost ∫ c dw = υ.

If υ = K_c(q, p), then
    S_c^{−1}(q, υ) ≤ K_c^{−1}(q, p, υ)
Geometry of information divergence and optimization

Outline:
- Optimal transportation problems (OTPs)
- Information and entropy
- Optimal channel problem (OCP)
- Geometry of information divergence and optimization
- Dynamical OTP: Optimization of evolution
Problems on Conditional Extremum

Expected utility: E_p{u} = ⟨u, p⟩

Utility of information λ:
    υ_u(λ) := sup { ⟨u, p⟩ : F(p, q) ≤ λ },   υ_u(λ) = −S_{−u}(q, λ)

Information of utility υ:
    λ_u(υ) := inf { F(p, q) : ⟨u, p⟩ ≥ υ },   λ_u(υ) = υ_u^{−1}(υ)

Optimal solutions:
    p(β) ∈ ∂F*(βu, q),   F(p(β), q) = λ

[Figure: simplex with vertices ω_2, ω_3, prior q, solution p(β), and constraint sets E_p{u} ≥ υ, E_p{ln(p/q)} ≤ λ, ‖p − q‖_1 ≤ λ]
General Solution

Lagrange function for υ_u(λ) := sup { ⟨u, p⟩ : F(p, q) ≤ λ } (respectively, for λ_u(υ) := inf { F(p, q) : ⟨u, p⟩ ≥ υ }):
    L(p, β^{−1}) = ⟨u, p⟩ + β^{−1}[λ − F(p, q)]
    ( L(p, β) = F(p, q) + β[υ − ⟨u, p⟩] )

Necessary and sufficient conditions ∂L ∋ 0:
    ∂_p L(p, β^{−1}) = {βu} − ∂_p F(p, q) ∋ 0
    ∂_{β^{−1}} L(p, β^{−1}) = λ − F(p, q) = 0

Optimal solutions are subgradients of F*(u, q) = sup { ⟨u, p⟩ − F(p, q) }:
    p(β) ∈ ∂F*(βu),   ⟨u, p(β)⟩ = υ   ( F(p, q) = λ )
Example: Exponential Solution

For D_KL(p, q) = ⟨ln(p/q), p⟩ − ⟨1, p − q⟩:
    L(p, β^{−1}) = ⟨u, p⟩ + β^{−1}[λ − ⟨ln(p/q), p⟩ + ⟨1, p − q⟩]

Necessary and sufficient conditions ∇L(p, β^{−1}) = 0:
    ∇_p L(p, β^{−1}) = u − β^{−1} ln(p/q) = 0
    ∂_{β^{−1}} L(p, β^{−1}) = λ − D_KL(p, q) = 0

Optimal solutions are gradients of D*_KL(u, q) = ln⟨e^u, q⟩:
    p(β) = e^{βu − ln Z(βu)} q,   D_KL(p(β), q) = λ
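A small sketch of the exponential solution p(β) = e^{βu − ln Z(βu)} q on a finite space (the utility u and the uniform prior q below are arbitrary placeholders, not the talk's data):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.random(6)                # utility function on a 6-point space
q = np.full(6, 1.0 / 6)          # prior distribution
beta = 1.5                       # inverse temperature (Lagrange multiplier)

p = q * np.exp(beta * u)
p /= p.sum()                     # p(beta) = exp(beta*u - ln Z(beta*u)) q

lam = float(np.sum(p * np.log(p / q)))   # information used: D_KL(p(beta), q) = lambda
value = float(p @ u)                     # achieved expected utility <u, p(beta)>

# Exponential tilting with beta > 0 can only raise expected utility above the prior's,
# at the price of lam > 0 nats of divergence from q.
assert value >= float(q @ u)
assert lam >= 0.0
```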
Solution to Shannon's OCP

The solution for I_w{x, y} = D_KL(w, q ⊗ p) ≤ λ:
    w(β) = e^{βu − ln Z(βu)} q ⊗ p

- w ∉ ∂P(X ⊗ Y).
- T : P(X) → P(Y) cannot have a deterministic kernel δ_{f(x)}(·).
- The dual is strictly convex: D*_KL(u, q ⊗ p) = ln ∫ e^u d(q ⊗ p)
Solution to Kantorovich's OTP

Γ(q, p) is convex:
    π_X((1 − t) w_1 + t w_2) = (1 − t) q + t q = q

There exists a closed convex functional F such that:
    Γ(q, p) = { w : F(w, q ⊗ p) ≤ 1 }

Then the solution to the OTP is:
    w(β) ∈ ∂F*(−βc, q ⊗ p)

Monge-Ampère equation
    q = p ∘ ∇φ |∇²φ|
where φ : X → ℝ ∪ {∞} is convex, and ∇φ : X → Y is such that p = q ∘ (∇φ)^{−1} (McCann, 1995; Villani, 2009).
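In one dimension with squared-distance cost, the gradient-of-a-convex-function characterization reduces to the monotone rearrangement T = F_p^{−1} ∘ F_q, i.e. pairing sorted samples. A hedged empirical sketch (the Gaussian samples are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.normal(0.0, 1.0, 1000))   # samples from q
y = np.sort(rng.normal(2.0, 0.5, 1000))   # samples from p

# Pairing sorted samples realizes the monotone map T(x_i) = y_i on the
# empirical measures; for squared distance this pairing is optimal
# (rearrangement inequality), so it beats any other permutation.
monotone_cost = float(np.mean((y - x) ** 2))
random_cost = float(np.mean((rng.permutation(y) - x) ** 2))
assert monotone_cost <= random_cost
```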
Strict Inequalities

Theorem (Belavkin, 2013b)
Let {w(β)}_u be a family of w(β) ∈ P(X ⊗ Y) maximizing E_w{u} on the sets {w : F(w) ≤ λ}, for all λ = F(w), where F : P(X ⊗ Y) → ℝ ∪ {∞} is closed convex and minimized at q ⊗ p ∈ ∂F*(0) ⊂ Int(P(X ⊗ Y)). If F* is strictly convex, then:
1. w(β) ∈ ∂P(X ⊗ Y) if and only if λ ≥ sup F (i.e. β → ∞).
2. For any v ∈ ∂P(X ⊗ Y) with F(v) = F(w(β)) = λ:
       E_v{u} < E_{w(β)}{u}
3. For any v ∈ ∂P(X ⊗ Y) with E_v{u} = E_{w(β)}{u} = υ:
       F(v) > F(w(β))
Strict Bounds for Monge OTP

Corollary
Let w_f ∈ Γ(q, p) be a solution to the Monge OTP K_c(q, p), and let w(β) be a solution to Shannon's OCP S_c(q, λ).
1. If w_f and w(β) carry equal information I_{w_f}{x, y} = I_{w(β)}{x, y} = λ < sup I_w{x, y}, then
       K_c(q, p) > S_c(q, λ) > 0
2. If w_f and w(β) achieve equal values K_c(q, p) = S_c(q, λ) = υ > 0, then
       K_c^{−1}(q, p, υ) > S_c^{−1}(q, υ)
Optimal Transport and the Expected Utility Principle

Let (X × Y, ≲) be a set with a preference relation (total pre-order).

The von Neumann and Morgenstern (1944) expected utility (EU) principle states that there exists u : X × Y → ℝ such that for any v, w ∈ P(X ⊗ Y):
    v ≲ w ⟺ E_v{u} ≤ E_w{u}

It is based on linearity and Archimedean axioms:
(1) v ≲ w ⟺ λv ≲ λw, ∀λ > 0
(2) v ≲ w ⟺ v + r ≲ w + r, ∀r ∈ L
(3) nv ≲ w, ∀n ∈ ℕ ⇒ v ≲ 0

Theorem
A pre-ordered linear space (L, ≲) satisfies (1), (2) and (3) if and only if (L, ≲) has a utility representation by a closed linear functional u : L → ℝ.
Dynamical OTP: Optimization of evolution

Outline:
- Optimal transportation problems (OTPs)
- Information and entropy
- Optimal channel problem (OCP)
- Geometry of information divergence and optimization
- Dynamical OTP: Optimization of evolution
Dynamical Problems

One-step evolution: q(t) ↦ T q(t) = q(t + 1)
s-step evolution: q(t) ↦ T^s q(t) = q(t + s)
Convergence to the target: q(t) → δ_⊤
Expected utility: E_{q(t)}{u} = ∫_X u dq(t)

Why not directly q ↦ Tq = δ_⊤?
Example: Search in the Hamming space {1, …, α}^l

Find ⊤ ∈ {1, …, α}^l.

Hamming metric: d_H(x, y) = ‖y − x‖_H = l − Σ_{i=1}^l δ_{x_i y_i}

|{1, …, α}^l| = α^l, so to find ⊤ in one step we need l log₂ α bits.

Observing d_H(⊤, x) communicates no more than log₂(l + 1) bits.
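The information budget in this example can be sketched directly (the values of α and l below are illustrative, not from the talk):

```python
import numpy as np
from math import log2

alpha, l = 4, 8                        # alphabet size and string length
rng = np.random.default_rng(5)
top = rng.integers(0, alpha, size=l)   # hidden target string
x = rng.integers(0, alpha, size=l)     # one query string

d_H = int(np.sum(top != x))            # Hamming distance: l minus #matches
assert 0 <= d_H <= l

bits_to_find = l * log2(alpha)         # bits needed to pinpoint the target at once
bits_per_obs = log2(l + 1)             # max bits from one distance observation
# With alpha = 4, l = 8: 16 bits needed, but each observation gives < 3.2 bits.
assert bits_to_find == 16.0
assert bits_per_obs < bits_to_find
```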