Geometric ergodicity in Wasserstein distance of a Metropolis algorithm based on a first-order Euler exponential integrator
Alain Durmus, joint work with Éric Moulines
Département TSI, Telecom ParisTech
Séminaire d’analyse numérique, Université de Genève, 28 October 2014

Outline
1. Introduction to Markov Chain Monte Carlo methods
   - Motivations
   - Some Markov chain theory
   - The Metropolis-Hastings algorithm
   - Uniform ergodicity of the independent sampler
   - Symmetric Random Walk Metropolis
2. Geometric ergodicity in Wasserstein distance and application
   - Geometric ergodicity in Wasserstein distance
   - Application to the EI-MALA

A. Durmus, Geometric ergodicity in Wasserstein distance

Introduction to Markov Chain Monte Carlo methods ◮ Motivations

Bayesian setting (I)
- Let $(E, d)$ be a Polish space endowed with its σ-field $\mathcal{B}(E)$.
- In a Bayesian setting, a parameter $x \in E$ is endowed with a prior distribution $\pi$ and the observations are given by a probabilistic model: $Y \sim \ell(\cdot \mid x)$.

The inference is then based on the posterior distribution:
$$\pi(dx \mid Y) = \frac{\pi(dx)\,\ell(Y \mid x)}{\int \ell(Y \mid u)\,\pi(du)}.$$
In most cases the normalizing constant is not tractable:
$$\pi(dx \mid Y) \propto \pi(dx)\,\ell(Y \mid x).$$

Bayesian setting (II)
Bayesian decision theory relies on minimization problems involving expectations:
$$\int_E L(x, \theta)\,\ell(Y \mid x)\,\pi(dx).$$
Generic problem: estimation of an expectation $\mathbb{E}_\pi[f]$, where
- $\pi$ is known up to a multiplicative factor;
- we do not know how to sample from $\pi$ (no basic Monte Carlo estimator);
- $\pi$ is a high-dimensional density (usual importance sampling and accept/reject methods are inefficient).

Key tool: rejection sampling
Consider the case $E = \mathbb{R}^d$, where $\pi$ has a density with respect to the Lebesgue measure $\mathrm{Leb}_d$, also denoted $\pi$. Assume we know a constant $M$ and a density $\nu$ such that $\pi(x) \le M\,\nu(x)$ for all $x$, and that we know how to sample from $\nu$.
1. Sample $X \sim \nu$ and $U \sim \mathcal{U}([0, 1])$.
2. If $U \le \dfrac{\pi(X)}{M\,\nu(X)}$, accept $X$.
3. Else go to 1.

Figure: Illustration of the Accept-Reject method [Cappé, Moulines, Rydén 2005].
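The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the target is the Beta(2, 2) density $\pi(x) = 6x(1-x)$ on $[0, 1]$, the proposal $\nu$ is uniform on $[0, 1]$, and $M = 1.5$ (the maximum of $\pi$). The function and variable names are hypothetical.

```python
import random


def rejection_sample(target_pdf, proposal_sampler, proposal_pdf, M, rng):
    """Return one draw from target_pdf using the accept-reject loop."""
    while True:
        x = proposal_sampler(rng)   # step 1: X ~ nu
        u = rng.random()            # step 1: U ~ U([0, 1])
        # step 2: accept X when U <= pi(X) / (M * nu(X))
        if u <= target_pdf(x) / (M * proposal_pdf(x)):
            return x
        # step 3: otherwise go back to step 1


def beta22_pdf(x):
    """Assumed toy target: Beta(2, 2) density on [0, 1], bounded by 1.5."""
    return 6.0 * x * (1.0 - x)


rng = random.Random(0)
samples = [
    rejection_sample(beta22_pdf, lambda r: r.random(), lambda x: 1.0, 1.5, rng)
    for _ in range(20000)
]
sample_mean = sum(samples) / len(samples)  # the Beta(2, 2) mean is 0.5
```

In this toy case the expected acceptance rate is $1/M = 2/3$, so roughly 30000 proposals are needed for 20000 accepted draws.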

Inefficiency of rejection sampling
- It is hard to find a probability $\nu$ such that $\pi \le M\nu$ (especially in high-dimensional settings).
- On the one hand, $M^{-1}$ is the acceptance rate, so $M$ has to be as close to 1 as possible. On the other hand, in practice $M$ is exponentially large in the dimension.

Alternative: MCMC methods!
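To make the dependence on the dimension concrete, here is a small worked example under assumptions that are not in the slides: the target is the standard Gaussian $N(0, I_d)$ and the proposal is $N(0, \sigma^2 I_d)$ with $\sigma > 1$. The ratio of the two densities is $\sigma^d \exp(-\tfrac{1}{2}|x|^2 (1 - \sigma^{-2}))$, which is maximal at $x = 0$, so the smallest admissible constant is $M = \sigma^d$ and the acceptance rate is $\sigma^{-d}$.

```python
def acceptance_rate(sigma, d):
    """Acceptance rate 1/M for the assumed Gaussian pair, where M = sigma**d."""
    return sigma ** (-d)


# Even a mildly over-dispersed proposal (sigma = 1.2) degrades quickly with d.
rates = {d: acceptance_rate(1.2, d) for d in (1, 10, 100)}
```

With $\sigma = 1.2$ the rate is about 0.83 in dimension 1 but falls below $10^{-7}$ in dimension 100.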

Introduction to Markov Chain Monte Carlo methods ◮ Some Markov chain theory

Some Markov chain theory (I)
Definition. Let $P : E \times \mathcal{B}(E) \to \mathbb{R}_+$. $P$ is a Markov kernel if
- for all $x \in E$, $A \mapsto P(x, A)$ is a probability measure on $E$,
- for all $A \in \mathcal{B}(E)$, $x \mapsto P(x, A)$ is measurable from $E$ to $\mathbb{R}$.

Some Markov chain theory (II)
Some simple properties:
- If $P_1$ and $P_2$ are two Markov kernels, we can define a new Markov kernel, denoted $P_1 P_2$, by setting for all $x \in E$ and $A \in \mathcal{B}(E)$:
$$P_1 P_2(x, A) = \int_E P_1(x, dz)\,P_2(z, A).$$
- If $P$ is a Markov kernel and $\nu$ a probability measure on $E$, we can define a probability measure, denoted $\nu P$, by setting for all $A \in \mathcal{B}(E)$:
$$\nu P(A) = \int_E \nu(dz)\,P(z, A).$$
- Let $P$ be a Markov kernel on $E$. For $f : E \to \mathbb{R}_+$ measurable, we can define a measurable function $Pf : E \to \bar{\mathbb{R}}_+$ by
$$Pf(x) = \int_E P(x, dz)\,f(z).$$
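On a finite state space these three operations reduce to matrix algebra, which may help fix ideas: a kernel is a row-stochastic matrix, a measure is a row vector, and a function is a column vector. The sketch below assumes a two-state space and uses made-up kernels; all names are hypothetical.

```python
def compose(P1, P2):
    """Kernel product: P1P2(x, a) = sum_z P1(x, z) * P2(z, a)."""
    n = len(P1)
    return [[sum(P1[x][z] * P2[z][a] for z in range(n)) for a in range(n)]
            for x in range(n)]


def push_measure(nu, P):
    """Measure nuP: nuP(a) = sum_z nu(z) * P(z, a)."""
    n = len(P)
    return [sum(nu[z] * P[z][a] for z in range(n)) for a in range(n)]


def apply_to_function(P, f):
    """Function Pf: Pf(x) = sum_z P(x, z) * f(z)."""
    n = len(P)
    return [sum(P[x][z] * f[z] for z in range(n)) for x in range(n)]


# Assumed toy kernels on the state space {0, 1}; each row sums to 1.
P1 = [[0.9, 0.1], [0.5, 0.5]]
P2 = [[0.2, 0.8], [0.6, 0.4]]
nu = [0.5, 0.5]
f = [1.0, 3.0]

P12 = compose(P1, P2)            # again a Markov kernel
nuP1 = push_measure(nu, P1)      # again a probability measure
P1f = apply_to_function(P1, f)   # a function on the state space
```

Integrating `P1f` against `nu` gives the same number as integrating `f` against `nuP1`, which is the finite-state version of $\nu(Pf) = (\nu P)(f)$.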

Some Markov chain theory (III)
Invariant probability measure: $\pi$ is said to be an invariant probability measure for the Markov kernel $P$ if $\pi P = \pi$.

Theorem (Meyn and Tweedie, 2003, Ergodic theorem). Under some conditions on $P$, we have for any $f \in L^1(\pi)$,
$$\hat{\pi}(f) = \frac{1}{n}\sum_{i=1}^{n} f(X_i) \xrightarrow{\ \pi\text{-a.s.}\ } \int f(x)\,\pi(dx).$$

Conditions of the Theorem
Definition.
- Irreducibility: there exists a measure $\nu$ such that, for all $x$ and all $A$ such that $\nu(A) > 0$, there exists $n \in \mathbb{N}^*$ such that $P^n(x, A) > 0$.
- Harris recurrence: $P$ is Harris recurrent if, for all $A \in \mathcal{B}(E)$ satisfying $\pi(A) > 0$ and for all $x$ in $A$,
$$\mathbb{P}\left(\sum_{k=1}^{+\infty} \mathbb{1}_A(X_k) = +\infty \,\middle|\, X_0 = x\right) = 1.$$

MCMC: rationale (I)
The Theorem above gives the following idea to approximate $\mathbb{E}_\pi[f]$:
- Find a kernel $P$ with invariant measure $\pi$, from which we can efficiently sample.
- Sample a Markov chain $X_1, \dots, X_n$ with kernel $P$ and compute
$$\hat{\pi}(f) = \frac{1}{n}\sum_{i=1}^{n} f(X_i)$$
to approximate $\mathbb{E}_\pi[f] = \int f(x)\,\pi(dx)$.

⇒ How to find a Markov kernel $P$ with invariant measure $\pi$?
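The recipe above can be tried on a chain whose invariant measure is known in closed form. The sketch below assumes a two-state kernel (not from the slides) with rows $(0.9, 0.1)$ and $(0.5, 0.5)$, whose invariant probability is $\pi = (5/6, 1/6)$, and estimates $\pi(\{1\})$ by the ergodic average of the indicator of state 1. All names are hypothetical.

```python
import random

# Assumed toy kernel on {0, 1}; its invariant probability is (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]


def step(x, rng):
    """Draw the next state from the row P[x]."""
    return 0 if rng.random() < P[x][0] else 1


def ergodic_average(f, x0, n, rng):
    """Compute (1/n) * sum_{i=1}^{n} f(X_i) along one trajectory started at x0."""
    total = 0.0
    x = x0
    for _ in range(n):
        x = step(x, rng)
        total += f(x)
    return total / n


rng = random.Random(42)
estimate = ergodic_average(lambda x: float(x == 1), 0, 200000, rng)
exact = 1.0 / 6.0  # pi({1}) for the assumed kernel
```

The estimate uses a single trajectory and no independent draws from $\pi$, which is the point of the method.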

MCMC: rationale (II)
A simple condition to check that $\pi$ is invariant for $P$: reversibility.

Definition. $P$ is reversible with respect to $\pi$ if for all $A_1, A_2 \in \mathcal{B}(E)$:
$$\int_{A_1}\int_{A_2} \pi(dz_1)\,P(z_1, dz_2) = \int_{A_1}\int_{A_2} \pi(dz_2)\,P(z_2, dz_1).$$
- Note that the variables $z_1$ and $z_2$ are switched.
- For $A_1 = E$ and $A_2 = A$, we get $\pi(A) = \pi P(A)$.
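The last remark can be spelled out as a short derivation. It reads the reversibility identity with $z_1$ ranging over $A_1 = E$ and $z_2$ over $A_2 = A$ on both sides, and uses only that $P(z_2, \cdot)$ is a probability measure, so that $P(z_2, E) = 1$:

$$
\begin{aligned}
\pi P(A) &= \int_E \pi(dz_1)\,P(z_1, A) = \int_E \int_A \pi(dz_1)\,P(z_1, dz_2) \\
&= \int_E \int_A \pi(dz_2)\,P(z_2, dz_1) = \int_A \pi(dz_2)\,P(z_2, E) = \pi(A).
\end{aligned}
$$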

Introduction to Markov Chain Monte Carlo methods ◮ The Metropolis-Hastings algorithm

The Metropolis-Hastings algorithm (I)
The Metropolis-Hastings algorithm gives a generic method to build Markov kernels $P$ reversible w.r.t. $\pi$ in the case where:
- $E = \mathbb{R}^d$.
- The target probability $\pi$ has a density w.r.t. $\mathrm{Leb}_d$, also denoted $\pi$.

It uses a transition density $q(x, y)$ w.r.t. $\mathrm{Leb}_d$ such that:
- $(x, y) \mapsto q(x, y)$ is measurable,
- for all $x$, $y \mapsto q(x, y)$ is the density of a probability measure, also denoted $q(x, \cdot)$.

The Metropolis-Hastings algorithm (II)
Given $X_k$,
1. Generate $Y_{k+1} \sim q(X_k, \cdot)$.
2. Set
$$X_{k+1} = \begin{cases} Y_{k+1} & \text{with probability } \alpha(X_k, Y_{k+1}), \\ X_k & \text{with probability } 1 - \alpha(X_k, Y_{k+1}), \end{cases}$$
where
$$\alpha(x, y) = 1 \wedge \frac{\pi(y)\,q(y, x)}{\pi(x)\,q(x, y)}.$$
- With this choice of $\alpha$, the algorithm produces a Markov kernel $P_{\mathrm{MH}}$ reversible w.r.t. $\pi$.
- “No restriction” on $\pi$ and $q$.
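The two steps above translate almost line by line into code. The sketch below is a minimal one-dimensional illustration, not the speaker's implementation: it assumes a standard Gaussian target known only up to its normalizing constant and a Gaussian random-walk proposal with step size 1, and all function names are hypothetical. Note that only the ratio $\pi(y)/\pi(x)$ is ever needed, so the normalizing constant never appears.

```python
import math
import random


def metropolis_hastings(target, propose, q_density, x0, n_steps, rng):
    """Run n_steps of Metropolis-Hastings from x0 and return all states.

    target(x): density of pi up to a multiplicative constant (must be > 0 at x0).
    propose(x, rng): draws Y from q(x, .).
    q_density(x, y): evaluates the transition density q(x, y).
    """
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        y = propose(x, rng)  # step 1: Y_{k+1} ~ q(X_k, .)
        ratio = (target(y) * q_density(y, x)) / (target(x) * q_density(x, y))
        alpha = min(1.0, ratio)  # alpha(x, y) = 1 ^ ratio
        if rng.random() < alpha:
            x = y  # accept: X_{k+1} = Y_{k+1}
        chain.append(x)  # on rejection the chain stays at X_k
    return chain


STEP = 1.0  # assumed proposal standard deviation


def unnormalized_gaussian(x):
    """Assumed toy target: exp(-x^2 / 2), the N(0, 1) density without its constant."""
    return math.exp(-0.5 * x * x)


def rw_propose(x, rng):
    """Gaussian random-walk proposal centred at the current state."""
    return x + STEP * rng.gauss(0.0, 1.0)


def rw_density(x, y):
    """Density q(x, y) of the random-walk proposal (symmetric in x and y)."""
    return math.exp(-0.5 * ((y - x) / STEP) ** 2) / (STEP * math.sqrt(2.0 * math.pi))


rng = random.Random(1)
chain = metropolis_hastings(unnormalized_gaussian, rw_propose, rw_density, 0.0, 50000, rng)
kept = chain[5000:]  # discard an initial stretch of the trajectory
mh_mean = sum(kept) / len(kept)
mh_var = sum((v - mh_mean) ** 2 for v in kept) / len(kept)
```

Because this proposal is symmetric, the factors `q_density(y, x)` and `q_density(x, y)` cancel; they are kept so that the code matches the general formula for $\alpha$.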
