Stability of Talagrand’s Gaussian Transport-Entropy Inequality Dan Mikulincer Geometric and Functional Inequalities in Convexity and Probability Weizmann Institute of Science Based on joint work with Ronen Eldan and Alex Zhai
Geometry and Information
Throughout, $G \sim \gamma$ will denote the standard Gaussian in $\mathbb{R}^d$.
Definition (Wasserstein distance between $\mu$ and $\gamma$)
$$W_2(\mu, \gamma) := \inf_{\pi} \left( \mathbb{E}_{\pi}\left[ \|x - y\|^2 \right] \right)^{1/2},$$
where $\pi$ ranges over all possible couplings of $\mu$ and $\gamma$.
Definition (Relative entropy between $\mu$ and $\gamma$)
$$\mathrm{Ent}(\mu \,\|\, \gamma) := \mathbb{E}_{\mu}\left[ \ln\left( \frac{d\mu}{d\gamma}(x) \right) \right].$$
Remark: if $X \sim \mu$ we will also write $\mathrm{Ent}(X \,\|\, G)$ and $W_2(X, G)$.
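The Wasserstein definition above can be made concrete in one dimension, where the monotone (sorted) coupling of two equal-size samples is optimal for the quadratic cost. The following is a minimal sketch under that assumption; the function name `empirical_w2_squared_1d`, the sample size and the shift value are illustrative choices, not part of the talk.

```python
import numpy as np


def empirical_w2_squared_1d(xs, ys):
    """Squared W_2 distance between two equal-size 1D samples.

    In one dimension the optimal coupling for the quadratic cost pairs the
    i-th smallest point of xs with the i-th smallest point of ys.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean((xs - ys) ** 2))


rng = np.random.default_rng(0)
n = 50_000
shift = 1.5  # hypothetical translation a, chosen for illustration

x = rng.standard_normal(n) + shift  # sample from N(a, 1)
g = rng.standard_normal(n)          # sample from the standard Gaussian

estimate = empirical_w2_squared_1d(x, g)
print(estimate, shift ** 2)  # the estimate should be close to a^2
```

For a translated standard Gaussian the exact value is $W_2^2 = a^2$, so the printed estimate should be close to `shift ** 2` up to sampling error.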
Talagrand’s Inequality
In 1996, Talagrand proved the following inequality, which connects geometry and information.
Theorem (Talagrand’s Gaussian transport-entropy inequality)
Let $\mu$ be a probability measure on $\mathbb{R}^d$. Then
$$W_2^2(\mu, \gamma) \leq 2\,\mathrm{Ent}(\mu \,\|\, \gamma).$$
It is enough to consider measures such that $\mu \ll \gamma$, since otherwise the relative entropy is infinite.
Talagrand’s Inequality - Applications
• By considering measures of the form $\mathbb{1}_A \, d\gamma$, the inequality implies a (non-sharp) Gaussian isoperimetric inequality.
• The inequality tensorizes and may be used to show dimension-free Gaussian concentration bounds.
• If $f$ is convex, then applying the inequality to $e^{-\lambda f} \, d\gamma$ yields one-sided Gaussian concentration for concave functions.
Gaussians
If $\gamma_{a,\Sigma} = N(a, \Sigma)$ in $\mathbb{R}^d$:
• $\mathrm{Ent}(\gamma_{a,\Sigma} \,\|\, \gamma) = \frac{1}{2}\left( \mathrm{Tr}(\Sigma) + \|a\|_2^2 - \ln(\det(\Sigma)) - d \right)$
• $W_2^2(\gamma_{a,\Sigma}, \gamma) = \|a\|_2^2 + \left\| \sqrt{\Sigma} - I_d \right\|_{HS}^2$
In particular, for any $a \in \mathbb{R}^d$,
$$W_2^2(\gamma_{a,I_d}, \gamma) = 2\,\mathrm{Ent}(\gamma_{a,I_d} \,\|\, \gamma).$$
These are the only equality cases.
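The two closed-form expressions for Gaussians can be checked numerically. The sketch below evaluates both formulas with NumPy and confirms that the gap $2\,\mathrm{Ent} - W_2^2$ vanishes when $\Sigma = I_d$ and is positive otherwise. The function names and the test matrices are illustrative assumptions.

```python
import numpy as np


def gaussian_entropy(a, sigma):
    """Ent(N(a, Sigma) || gamma) = (Tr(Sigma) + |a|^2 - ln det(Sigma) - d) / 2."""
    a = np.asarray(a, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = a.shape[0]
    _, logdet = np.linalg.slogdet(sigma)
    return 0.5 * (np.trace(sigma) + a @ a - logdet - d)


def gaussian_w2_squared(a, sigma):
    """W_2^2(N(a, Sigma), gamma) = |a|^2 + ||sqrt(Sigma) - I||_HS^2."""
    a = np.asarray(a, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = a.shape[0]
    # Symmetric square root of Sigma through its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    sqrt_sigma = (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    diff = sqrt_sigma - np.eye(d)
    return a @ a + np.sum(diff ** 2)


def gaussian_gap(a, sigma):
    """The gap 2 Ent - W_2^2 for N(a, Sigma); Talagrand's inequality says it is >= 0."""
    return 2.0 * gaussian_entropy(a, sigma) - gaussian_w2_squared(a, sigma)


a = np.array([1.0, -2.0])
gap_translate = gaussian_gap(a, np.eye(2))              # equality case
gap_stretched = gaussian_gap(a, np.diag([4.0, 1.0]))    # strict inequality
print(gap_translate, gap_stretched)
```

With $\Sigma = \mathrm{diag}(4, 1)$ the gap equals $2(2 - 1 - \ln 2)$, independently of the mean $a$.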
Stability
Define the deficit
$$\delta_{\mathrm{Tal}}(\mu) = 2\,\mathrm{Ent}(\mu \,\|\, \gamma) - W_2^2(\mu, \gamma).$$
The question of stability deals with approximate equality cases.
Question
Suppose that $\delta_{\mathrm{Tal}}(\mu)$ is small. Must $\mu$ be close to a translate of the standard Gaussian?
Note that the deficit is invariant under translations, so it will be enough to consider centered measures.
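The translation invariance of the deficit can be verified directly. The following derivation is a sketch, assuming $\mu \ll \gamma$ with finite second moment; the notation $m = \mathbb{E}_\mu[x]$ and $\mu_a$ for the law of $X + a$ with $X \sim \mu$ is introduced here for illustration.

```latex
% Couplings of (\mu_a, \gamma) correspond to couplings of (\mu, \gamma)
% through (x, y) \mapsto (x + a, y), and \mathbb{E}_\gamma[y] = 0, so
\begin{align*}
W_2^2(\mu_a, \gamma)
  &= \inf_{\pi} \mathbb{E}_{\pi}\left[ \|x + a - y\|^2 \right]
   = W_2^2(\mu, \gamma) + 2\langle a, m \rangle + \|a\|^2 .
\end{align*}
% Writing p for the Lebesgue density of \mu and changing variables z = x - a,
\begin{align*}
\mathrm{Ent}(\mu_a \,\|\, \gamma)
  &= \int p(z) \ln\frac{p(z)}{\gamma(z + a)} \, dz
   = \mathrm{Ent}(\mu \,\|\, \gamma)
     + \int p(z)\, \frac{\|z + a\|^2 - \|z\|^2}{2} \, dz \\
  &= \mathrm{Ent}(\mu \,\|\, \gamma) + \langle a, m \rangle + \frac{\|a\|^2}{2} .
\end{align*}
% Subtracting, the extra terms cancel:
\begin{align*}
2\,\mathrm{Ent}(\mu_a \,\|\, \gamma) - W_2^2(\mu_a, \gamma)
  &= 2\,\mathrm{Ent}(\mu \,\|\, \gamma) - W_2^2(\mu, \gamma) .
\end{align*}
```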
Instability
Theorem (Fathi, Indrei, Ledoux '14)
Let $\mu$ be a centered measure on $\mathbb{R}^d$. Then
$$\delta_{\mathrm{Tal}}(\mu) \gtrsim \min\left( \frac{W_{1,1}(\mu, \gamma)^2}{d}, \frac{W_{1,1}(\mu, \gamma)}{\sqrt{d}} \right).$$
The 1-dimensional case was proven earlier by Barthe and Kolesnikov.
However:
Theorem
There exists a sequence of centered Gaussian mixtures $\{\mu_n\}$ on $\mathbb{R}$ such that $\delta_{\mathrm{Tal}}(\mu_n) \to 0$ but $W_2^2(\mu_n, \gamma) > 1$.
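Bounding the Deficit
In the 1-dimensional case, Talagrand actually showed
$$\delta_{\mathrm{Tal}}(\mu) = 2 \int_{\mathbb{R}} \left( \varphi_\mu' - 1 - \ln(\varphi_\mu') \right) d\gamma \geq 0,$$
where $\varphi_\mu$ is the transport map $\varphi_\mu = F_\mu^{-1} \circ F_\gamma$.
For translated Gaussians, $\varphi_{\gamma_{a,1}}(x) = x + a$, so $\varphi_{\gamma_{a,1}}' \equiv 1$ and the integrand vanishes, which shows the equality cases.
We will take a different route.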
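As a sanity check on the one-dimensional picture, take $\mu = N(0, \sigma^2)$, whose transport map from $\gamma$ is $\varphi(x) = \sigma x$. The sketch below compares the deficit $2\,\mathrm{Ent} - W_2^2$ with the $\gamma$-integral of $\varphi' - 1 - \ln(\varphi')$, using Gauss-Hermite quadrature for the Gaussian expectations. The helper names and the number of quadrature nodes are illustrative assumptions; for this family the deficit comes out as exactly twice that integral.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_expectation(func, n_nodes=40):
    """E[func(G)] for G ~ N(0, 1), via probabilists' Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)  # weight function exp(-x^2 / 2)
    return float(np.sum(weights * func(nodes)) / np.sqrt(2.0 * np.pi))


def deficit_scaled_gaussian(sigma):
    """2 Ent - W_2^2 for mu = N(0, sigma^2), with transport map phi(x) = sigma * x."""
    ent = 0.5 * (sigma ** 2 - np.log(sigma ** 2) - 1.0)
    w2_squared = gaussian_expectation(lambda x: (sigma * x - x) ** 2)
    return 2.0 * ent - w2_squared


def transport_integral(sigma):
    """Integral of (phi' - 1 - ln(phi')) against gamma, where phi' = sigma."""
    def integrand(x):
        phi_prime = np.full_like(x, sigma)
        return phi_prime - 1.0 - np.log(phi_prime)
    return gaussian_expectation(integrand)


for sigma in (0.5, 2.0, 3.0):
    print(sigma, deficit_scaled_gaussian(sigma), 2.0 * transport_integral(sigma))
```

Both printed columns agree and are strictly positive for $\sigma \neq 1$, consistent with translates of $\gamma$ being the only equality cases.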
Bounding the Deficit - the Föllmer Drift
Our central construct will be the Föllmer drift, which is the solution to the following variational problem:
$$v_t := \arg\min_{u_t} \frac{1}{2} \int_0^1 \mathbb{E}\left[ \|u_t\|^2 \right] dt,$$
where $u_t$ ranges over all adapted drifts for which $B_1 + \int_0^1 u_t \, dt$ has the same law as $\mu$.
We denote
$$X_t := B_t + \int_0^t v_s \, ds.$$
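The simplest instance of this construction is the translated Gaussian $\mu = N(a, I_d)$. There the constant drift $u_t \equiv a$ is admissible, since $B_1 + a \sim N(a, I_d)$, and its energy $\frac{1}{2}\|a\|^2$ equals $\mathrm{Ent}(\mu \,\|\, \gamma)$. By the standard fact (not stated on the slide, and assumed here) that the minimal energy coincides with the relative entropy, this constant drift is the Föllmer drift for that target. The sketch below simulates $X_t$ with an Euler scheme; the function name, step count and path count are illustrative choices.

```python
import numpy as np


def simulate_constant_drift(a, n_paths=20_000, n_steps=100, seed=0):
    """Euler scheme for X_t = B_t + integral_0^t v_s ds with v_s = a, run up to t = 1.

    Assumption: the target is mu = N(a, I_d), for which the drift is constant.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    dt = 1.0 / n_steps
    x = np.zeros((n_paths, a.shape[0]))
    for _ in range(n_steps):
        # Drift increment plus an independent Brownian increment.
        x += a * dt + np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


a = np.array([1.0, -0.5])  # hypothetical mean of the target measure
x1 = simulate_constant_drift(a)

energy = 0.5 * float(a @ a)  # (1/2) * integral_0^1 E|v_t|^2 dt for v_t = a
print(x1.mean(axis=0), x1.var(axis=0), energy)
```

The empirical mean and variance of `x1` should be close to `a` and 1 in each coordinate, and `energy` matches the closed-form entropy $\frac{1}{2}\|a\|^2$ of $N(a, I_d)$ relative to $\gamma$.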