Information and entropy

Theorem (Shannon-Pythagorean; Belavkin, 2013a)
Let w ∈ P(X ⊗ Y) with marginals π_X w = q and π_Y w = p. Then

    D_KL(w, q ⊗ q) = D_KL(w, q ⊗ p) + D_KL(p, q)

Proof.
    D(w, q ⊗ q) = D(w, q ⊗ p) + D(q ⊗ p, q ⊗ q) − ⟨ln(q ⊗ p) − ln(q ⊗ q), q ⊗ p − w⟩
The first term is the mutual information I_w{x, y}, the second equals D(p, q), and the last term is 0 because ln(q ⊗ p) − ln(q ⊗ q) = 1_X ⊗ (ln p − ln q). ∎

Cross-Information (Belavkin, 2013a)
    D_KL(w, q ⊗ q) = −⟨ln q, p⟩ − [H(p) − D_KL(w, q ⊗ p)]
where −⟨ln q, p⟩ is the cross-entropy and the bracketed term is the conditional entropy H(p(y | x)).

Roman Belavkin (Middlesex University), "Kantorovich's and Shannon's optimization problems", 13 June 2016
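The Pythagorean decomposition above can be checked numerically on a small discrete example (a hedged sketch; the joint w below is an arbitrary random matrix, not data from the talk, and X = Y so that q ⊗ q is well defined):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(a, b):
    # Kullback-Leibler divergence D_KL(a, b) between discrete distributions.
    return float(np.sum(a * np.log(a / b)))

# Arbitrary strictly positive joint distribution w on X x Y with |X| = |Y|.
w = rng.random((4, 4))
w /= w.sum()
q = w.sum(axis=1)   # marginal on X
p = w.sum(axis=0)   # marginal on Y

lhs = kl(w.ravel(), np.outer(q, q).ravel())
# D_KL(w, q ⊗ p) equals the mutual information I_w{x, y} when p is the Y-marginal.
rhs = kl(w.ravel(), np.outer(q, p).ravel()) + kl(p, q)
assert np.isclose(lhs, rhs)   # Shannon-Pythagorean identity
```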
Optimal channel problem (OCP)

Outline:
- Optimal transportation problems (OTPs)
- Information and entropy
- Optimal channel problem (OCP)
- Geometry of information divergence and optimization
- Dynamical OTP: Optimization of evolution
Shannon's OCP

Optimal Channel Problem (Shannon, 1948)
    S_c(q, λ) := inf { ∫_{X×Y} c(x, y) dw : π_X w = q, I_w{x, y} ≤ λ }

Exponential family solutions
The optimal channel T : P(X) → P(Y) is defined by
    w = e^{−βc − ln Z} q ⊗ p,   β^{−1} = −dS_c(q, λ)/dλ

Observe that w ∉ ∂P(X ⊗ Y), unless q ⊗ p ∈ ∂P(X ⊗ Y) or β → ∞.

Value of Information (Stratonovich, 1965)
    V(λ) := S_c(q, 0) − S_c(q, λ) = sup { E_w{u} : I_w{x, y} ≤ λ }

[Figure: simplex P(X ⊗ Y) with vertices δ_11, δ_12, δ_21, δ_22]
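A minimal numerical sketch of the exponential-family channel. The reference output distribution p and the uniform input marginal q below are illustrative placeholders; in the full solution p and the normalizer are determined self-consistently (as in Blahut-Arimoto-style iterations), while here a per-row normalizer Z(x) simply enforces the marginal constraint:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
c = rng.random((n, n))        # cost c(x, y)
q = np.full(n, 1.0 / n)       # fixed input marginal, pi_X w = q
p = np.full(n, 1.0 / n)       # reference output distribution (assumed, not optimized)
beta = 2.0                    # inverse "temperature"

# Tilted joint w(x, y) = q(x) p(y) exp(-beta c(x, y)) / Z(x);
# the per-row normalizer Z(x) enforces pi_X w = q exactly.
K = p * np.exp(-beta * c)
w = q[:, None] * K / K.sum(axis=1, keepdims=True)

assert np.allclose(w.sum(axis=1), q)   # marginal constraint holds
py = w.sum(axis=0)
I = float(np.sum(w * np.log(w / np.outer(q, py))))   # mutual information I_w{x, y}
cost = float(np.sum(w * c))                          # expected cost
assert I >= 0.0 and w.min() > 0.0      # w is an interior point, not on the boundary
```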
Relation to Kantorovich OTP

Optimal Channel Problem
    S_c(q, λ) := inf { ∫_{X×Y} c(x, y) dw : π_X w = q, I_w{x, y} ≤ λ }

Optimal Transportation Problem
    K_c(q, p) := inf { ∫_{X×Y} c(x, y) dw : π_X w = q, π_Y w = p }

The marginals q and p have entropies H(q) and H(p), and
    0 ≤ I_w{x, y} ≤ min[H(q), H(p)]

Thus K_c(q, p) has the implicit constraint I_w{x, y} ≤ λ = min[H(q), H(p)], and
    S_c(q, λ) ≤ K_c(q, p)
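The entropy bound on mutual information is easy to verify numerically (a sketch with an arbitrary random joint, not specific to the talk's example):

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.random((5, 3))
w /= w.sum()
q = w.sum(axis=1)   # marginal on X
p = w.sum(axis=0)   # marginal on Y

H = lambda d: float(-np.sum(d * np.log(d)))          # Shannon entropy
I = float(np.sum(w * np.log(w / np.outer(q, p))))    # mutual information I_w{x, y}

# 0 <= I <= min[H(q), H(p)] (small tolerance for floating point)
assert -1e-12 <= I <= min(H(q), H(p)) + 1e-12
```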
Inverse Optimal Values

Inverse of the OCP value
    S_c^{−1}(q, υ) := inf { I_w{x, y} : π_X w = q, ∫ c dw ≤ υ }

Inverse of the OTP value
    K_c^{−1}(q, p, υ) := inf { I_w{x, y} : π_X w = q, π_Y w = p, ∫ c dw ≤ υ }

These inverse values represent the smallest amount of Shannon information required to achieve the expected cost ∫ c dw = υ.

If υ = K_c(q, p), then
    S_c^{−1}(q, υ) ≤ K_c^{−1}(q, p, υ)
Geometry of information divergence and optimization

Outline:
- Optimal transportation problems (OTPs)
- Information and entropy
- Optimal channel problem (OCP)
- Geometry of information divergence and optimization
- Dynamical OTP: Optimization of evolution
Problems on Conditional Extremum

Expected utility: E_p{u} = ⟨u, p⟩

Utility of information λ:
    υ_u(λ) := sup { ⟨u, p⟩ : F(p, q) ≤ λ },   υ_u(λ) = −S_{−u}(q, λ)

Information of utility υ:
    λ_u(υ) := inf { F(p, q) : ⟨u, p⟩ ≥ υ },   λ_u(υ) = υ_u^{−1}(υ)

Optimal solutions:
    p(β) ∈ ∂F*(βu, q),   F(p(β), q) = λ

[Figure: simplex with vertices ω_2, ω_3, prior q, solution p(β), and constraint sets E_p{u} ≥ υ, E_p{ln(p/q)} ≤ λ, ‖p − q‖_1 ≤ λ]
General Solution

Lagrange function for υ_u(λ) := sup { ⟨u, p⟩ : F(p, q) ≤ λ } (respectively, for λ_u(υ) := inf { F(p, q) : ⟨u, p⟩ ≥ υ }):
    L(p, β^{−1}) = ⟨u, p⟩ + β^{−1}[λ − F(p, q)]
    ( L(p, β) = F(p, q) + β[υ − ⟨u, p⟩] )

Necessary and sufficient conditions ∂L ∋ 0:
    ∂_p L(p, β^{−1}) = {βu} − ∂_p F(p, q) ∋ 0
    ∂_{β^{−1}} L(p, β^{−1}) = λ − F(p, q) = 0

Optimal solutions are subgradients of F*(u, q) = sup { ⟨u, p⟩ − F(p, q) }:
    p(β) ∈ ∂F*(βu),   ⟨u, p(β)⟩ = υ   ( F(p, q) = λ )
Example: Exponential Solution

For D_KL(p, q) = ⟨ln(p/q), p⟩ − ⟨1, p − q⟩:
    L(p, β^{−1}) = ⟨u, p⟩ + β^{−1}[λ − ⟨ln(p/q), p⟩ + ⟨1, p − q⟩]

Necessary and sufficient conditions ∇L(p, β^{−1}) = 0:
    ∇_p L(p, β^{−1}) = u − β^{−1} ln(p/q) = 0
    ∂_{β^{−1}} L(p, β^{−1}) = λ − D_KL(p, q) = 0

Optimal solutions are gradients of D*_KL(u, q) = ln⟨e^u, q⟩:
    p(β) = e^{βu − ln Z(βu)} q,   D_KL(p(β), q) = λ
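A small sketch of the exponential solution p(β) = e^{βu − ln Z(βu)} q on a finite space (the utility u and the uniform prior q below are arbitrary placeholders, not the talk's data):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.random(6)                # utility function on a 6-point space
q = np.full(6, 1.0 / 6)          # prior distribution
beta = 1.5                       # inverse temperature (Lagrange multiplier)

p = q * np.exp(beta * u)
p /= p.sum()                     # p(beta) = exp(beta*u - ln Z(beta*u)) q

lam = float(np.sum(p * np.log(p / q)))   # information used: D_KL(p(beta), q) = lambda
value = float(p @ u)                     # achieved expected utility <u, p(beta)>

# Exponential tilting with beta > 0 can only raise expected utility above the prior's,
# at the price of lam > 0 nats of divergence from q.
assert value >= float(q @ u)
assert lam >= 0.0
```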
Solution to Shannon's OCP

The solution for I_w{x, y} = D_KL(w, q ⊗ p) ≤ λ:
    w(β) = e^{βu − ln Z(βu)} q ⊗ p

- w ∉ ∂P(X ⊗ Y).
- T : P(X) → P(Y) cannot have a deterministic kernel δ_{f(x)}(·).
- The dual is strictly convex: D*_KL(u, q ⊗ p) = ln ∫ e^u d(q ⊗ p)
Solution to Kantorovich's OTP

Γ(q, p) is convex:
    π_X((1 − t) w_1 + t w_2) = (1 − t) q + t q = q

There exists a closed convex functional F such that:
    Γ(q, p) = { w : F(w, q ⊗ p) ≤ 1 }

Then the solution to the OTP is:
    w(β) ∈ ∂F*(−βc, q ⊗ p)

Monge-Ampère equation
    q = p ∘ ∇φ |∇²φ|
where φ : X → ℝ ∪ {∞} is convex, and ∇φ : X → Y is such that p = q ∘ (∇φ)^{−1} (McCann, 1995; Villani, 2009).
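In one dimension with squared-distance cost, the gradient-of-a-convex-function characterization reduces to the monotone rearrangement T = F_p^{−1} ∘ F_q, i.e. pairing sorted samples. A hedged empirical sketch (the Gaussian samples are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.normal(0.0, 1.0, 1000))   # samples from q
y = np.sort(rng.normal(2.0, 0.5, 1000))   # samples from p

# Pairing sorted samples realizes the monotone map T(x_i) = y_i on the
# empirical measures; for squared distance this pairing is optimal
# (rearrangement inequality), so it beats any other permutation.
monotone_cost = float(np.mean((y - x) ** 2))
random_cost = float(np.mean((rng.permutation(y) - x) ** 2))
assert monotone_cost <= random_cost
```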
Strict Inequalities

Theorem (Belavkin, 2013b)
Let {w(β)}_u be a family of w(β) ∈ P(X ⊗ Y) maximizing E_w{u} on the sets {w : F(w) ≤ λ}, for all λ = F(w), where F : P(X ⊗ Y) → ℝ ∪ {∞} is closed convex and minimized at q ⊗ p ∈ ∂F*(0) ⊂ Int(P(X ⊗ Y)). If F* is strictly convex, then:
1. w(β) ∈ ∂P(X ⊗ Y) if and only if λ ≥ sup F (i.e. β → ∞).
2. For any v ∈ ∂P(X ⊗ Y) with F(v) = F(w(β)) = λ:
       E_v{u} < E_{w(β)}{u}
3. For any v ∈ ∂P(X ⊗ Y) with E_v{u} = E_{w(β)}{u} = υ:
       F(v) > F(w(β))
Strict Bounds for Monge OTP

Corollary
Let w_f ∈ Γ(q, p) be a solution to the Monge OTP K_c(q, p), and let w(β) be a solution to Shannon's OCP S_c(q, λ).
1. If w_f and w(β) carry equal information I_{w_f}{x, y} = I_{w(β)}{x, y} = λ < sup I_w{x, y}, then
       K_c(q, p) > S_c(q, λ) > 0
2. If w_f and w(β) achieve equal values K_c(q, p) = S_c(q, λ) = υ > 0, then
       K_c^{−1}(q, p, υ) > S_c^{−1}(q, υ)
Optimal Transport and the Expected Utility Principle

Let (X × Y, ≲) be a set with a preference relation (total pre-order).

The von Neumann and Morgenstern (1944) expected utility (EU) principle states that there exists u : X × Y → ℝ such that for any v, w ∈ P(X ⊗ Y):
    v ≲ w ⟺ E_v{u} ≤ E_w{u}

It is based on linearity and Archimedean axioms:
(1) v ≲ w ⟺ λv ≲ λw, ∀λ > 0
(2) v ≲ w ⟺ v + r ≲ w + r, ∀r ∈ L
(3) nv ≲ w, ∀n ∈ ℕ ⇒ v ≲ 0

Theorem
A pre-ordered linear space (L, ≲) satisfies (1), (2) and (3) if and only if (L, ≲) has a utility representation by a closed linear functional u : L → ℝ.
Dynamical OTP: Optimization of evolution

Outline:
- Optimal transportation problems (OTPs)
- Information and entropy
- Optimal channel problem (OCP)
- Geometry of information divergence and optimization
- Dynamical OTP: Optimization of evolution
Dynamical Problems

One-step evolution: q(t) ↦ T q(t) = q(t + 1)
s-step evolution: q(t) ↦ T^s q(t) = q(t + s)
Convergence to the target: q(t) → δ_⊤
Expected utility: E_{q(t)}{u} = ∫_X u dq(t)

Why not directly q ↦ Tq = δ_⊤?
Example: Search in the Hamming space {1, …, α}^l

Find ⊤ ∈ {1, …, α}^l.

Hamming metric: d_H(x, y) = ‖y − x‖_H = l − Σ_{i=1}^l δ_{x_i y_i}

|{1, …, α}^l| = α^l, so to find ⊤ in one step we need l log₂ α bits.

Observing d_H(⊤, x) communicates no more than log₂(l + 1) bits.
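The information budget in this example can be sketched directly (the values of α and l below are illustrative, not from the talk):

```python
import numpy as np
from math import log2

alpha, l = 4, 8                        # alphabet size and string length
rng = np.random.default_rng(5)
top = rng.integers(0, alpha, size=l)   # hidden target string
x = rng.integers(0, alpha, size=l)     # one query string

d_H = int(np.sum(top != x))            # Hamming distance: l minus #matches
assert 0 <= d_H <= l

bits_to_find = l * log2(alpha)         # bits needed to pinpoint the target at once
bits_per_obs = log2(l + 1)             # max bits from one distance observation
# With alpha = 4, l = 8: 16 bits needed, but each observation gives < 3.2 bits.
assert bits_to_find == 16.0
assert bits_per_obs < bits_to_find
```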