Sampling multimodal densities in high dimensional sampling space
Gersende FORT, LTCI, CNRS & Telecom ParisTech, Paris, France
Journées MAS, Toulouse, August 2014
Introduction

Sample from a target distribution $\pi \, d\lambda$ on $X \subseteq \mathbb{R}^\ell$, when $\pi$ is (possibly) known only up to a normalizing constant.
→ Hereafter, to simplify the notation, $\pi$ is assumed to be normalized.

Context:
- $\pi$ is multimodal
- large dimension

Research guided by Computational Bayesian Statistics:
- $\pi$: the a posteriori distribution, known up to a normalizing constant
- Needed: algorithms to explore $\pi$, to compute expectations w.r.t. $\pi$, ...
Talk based on joint works with
- Eric Moulines, Amandine Schreck (Telecom ParisTech)
- Pierre Priouret (Paris VI)
- Benjamin Jourdain, Tony Lelièvre, Gabriel Stoltz (ENPC)
- Estelle Kuhn (INRA)
Outline
- Introduction
  - Usual Monte Carlo samplers
  - The proposal mechanism
  - Adaptive Monte Carlo samplers
  - Conclusion
- Tempering-based Monte Carlo samplers
- Biasing Potential-based Monte Carlo sampler
- Convergence Analysis
Usual Monte Carlo samplers

1. Markov chain Monte Carlo (MCMC)

Sample a Markov chain $(X_k)_k$ having $\pi$ as its unique invariant distribution.

Approximation:
$$\pi \approx \frac{1}{n} \sum_{k=1}^{n} \delta_{X_k}$$

Example: Hastings-Metropolis algorithm with proposal kernel $q(x,y)$
- given $X_k$, sample $Y \sim q(X_k, \cdot)$
- accept-reject mechanism:
$$X_{k+1} = \begin{cases} Y & \text{with probability } 1 \wedge \dfrac{\pi(Y)\, q(Y, X_k)}{\pi(X_k)\, q(X_k, Y)} \\ X_k & \text{otherwise} \end{cases}$$
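The accept-reject step of the Hastings-Metropolis algorithm can be sketched as follows. This is only an illustrative sketch: the function name `metropolis_hastings`, its arguments, the standard normal target and the proposal scale `sigma = 2.4` are assumptions made for the example, not part of the talk.

```python
import math
import random

def metropolis_hastings(log_pi, propose, log_q, x0, n, rng):
    """Run n steps of Hastings-Metropolis; return the chain and the acceptance rate.

    log_pi(x): log target density, up to an additive constant
    propose(x, rng): draws Y ~ q(x, .)
    log_q(x, y): log q(x, y), up to a constant shared by both directions
    """
    x, chain, accepted = x0, [], 0
    for _ in range(n):
        y = propose(x, rng)
        # log of pi(Y) q(Y, X_k) / (pi(X_k) q(X_k, Y))
        log_ratio = log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
        # accept with probability 1 ∧ ratio (1 - U lies in (0, 1], so the log is finite)
        if math.log(1.0 - rng.random()) < min(0.0, log_ratio):
            x, accepted = y, accepted + 1
        chain.append(x)
    return chain, accepted / n

# Hypothetical example: standard normal target, Gaussian random-walk proposal.
rng = random.Random(0)
sigma = 2.4
chain, acc_rate = metropolis_hastings(
    log_pi=lambda x: -0.5 * x * x,
    propose=lambda x, r: x + sigma * r.gauss(0.0, 1.0),
    log_q=lambda x, y: -0.5 * ((y - x) / sigma) ** 2,
    x0=0.0, n=20000, rng=rng,
)
mean_estimate = sum(chain) / len(chain)  # approximates the mean of pi
```

Working with log densities avoids underflow, and only ratios of $\pi$ appear, so the normalizing constant is never needed.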
2. Importance Sampling (IS)

Sample i.i.d. points $(X_k)_k$ with density $q$, a proposal distribution chosen by the user.

Approximation:
$$\pi \approx \frac{1}{n} \sum_{k=1}^{n} \frac{\pi(X_k)}{q(X_k)} \, \delta_{X_k}$$
The proposal mechanism: MCMC

Toy example: Hastings-Metropolis algorithm with Gaussian proposal kernel
$$q(x,y) \propto \exp\left(-\frac{1}{2}(y-x)^T \Sigma^{-1} (y-x)\right)$$

Acceptance-rejection ratio: $1 \wedge \dfrac{\pi(Y)}{\pi(X_k)}$

[Figure: for three different values of $\Sigma$: (top) plot of the chain (in $\mathbb{R}$); (bottom) autocorrelation function.]
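The effect of the proposal scale on the autocorrelation can be reproduced with a small experiment. The target $N(0,1)$, the three scales `0.1`, `2.4`, `50.0` and the function names below are assumptions chosen for illustration; the talk does not state which target or values of $\Sigma$ were used in the figure.

```python
import math
import random

def rw_metropolis(sigma, n, rng):
    """Random-walk Metropolis on a N(0,1) target with proposal N(x, sigma^2)."""
    x, chain = 0.0, []
    for _ in range(n):
        y = x + sigma * rng.gauss(0.0, 1.0)
        # symmetric proposal: accept with probability 1 ∧ pi(y)/pi(x)
        if math.log(1.0 - rng.random()) < 0.5 * (x * x - y * y):
            x = y
        chain.append(x)
    return chain

def autocorr(chain, lag):
    """Empirical autocorrelation of the chain at the given lag."""
    n = len(chain)
    m = sum(chain) / n
    var = sum((c - m) ** 2 for c in chain) / n
    cov = sum((chain[i] - m) * (chain[i + lag] - m) for i in range(n - lag)) / n
    return cov / var

rng = random.Random(1)
# lag-1 autocorrelation for a too small, a moderate and a too large proposal scale
lag1 = {s: autocorr(rw_metropolis(s, 20000, rng), 1) for s in (0.1, 2.4, 50.0)}
```

A very small scale yields tiny moves that are almost always accepted, a very large scale yields moves that are almost always rejected; in both cases consecutive draws are strongly correlated.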
The proposal mechanism: Importance Sampling (1/2)

Toy example: compute
$$\int_{\mathbb{R}} |x| \, \pi(x) \, dx \quad \text{when} \quad \pi(x) \sim t(3) \propto \frac{1}{\left(1 + \frac{x^2}{3}\right)^2}$$

Consider in turn the proposal $q$ equal to a Student $t(1)$ and then to a Normal $N(0,1)$.

[Figure, left: plot of the densities $q$ (green, blue) and $\pi$ (in red).]
[Figure, right: boxplots computed from 100 runs of the algorithm, as a function of the number of samples (100, 500, 1000, 1500); panels "when $q \sim t(1)$" and "when $q \sim N(0,1)$".]
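The toy example can be sketched in a few lines. The helper names, the sample size and the seed are assumptions for illustration. The reference value $2\sqrt{3}/\pi$ follows from integrating $|x|$ against the normalized $t(3)$ density.

```python
import math
import random

def t3_pdf(x):
    """Normalized Student t(3) density: 2 / (pi sqrt(3)) * (1 + x^2/3)^(-2)."""
    return 2.0 / (math.pi * math.sqrt(3.0)) / (1.0 + x * x / 3.0) ** 2

def cauchy_pdf(x):
    """Student t(1) (Cauchy) density."""
    return 1.0 / (math.pi * (1.0 + x * x))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def is_estimate(sample_q, q_pdf, n, rng):
    """(1/n) sum_k pi(X_k)/q(X_k) |X_k| with X_k i.i.d. from q."""
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)
        total += t3_pdf(x) / q_pdf(x) * abs(x)
    return total / n

rng = random.Random(2)
exact = 2.0 * math.sqrt(3.0) / math.pi  # closed form of E|X| under t(3)
# t(1) proposal, sampled by inverting the Cauchy distribution function
est_cauchy = is_estimate(lambda r: math.tan(math.pi * (r.random() - 0.5)),
                         cauchy_pdf, 50000, rng)
# N(0,1) proposal: lighter tails than the target, so the weights are unbounded
est_normal = is_estimate(lambda r: r.gauss(0.0, 1.0), normal_pdf, 50000, rng)
```

With the $t(1)$ proposal the importance ratio $\pi/q$ is bounded, so the estimator has finite variance; with the $N(0,1)$ proposal the ratio grows without bound in the tails and the estimate is much less stable from run to run.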
The proposal mechanism: Importance Sampling (2/2)

The efficiency of the algorithm depends on the proposal distribution $q$: if a few weights are large and the others negligible, the approximation is likely to be inaccurate.

Monitoring the convergence: there exist criteria measuring the proportion of "ineffective draws":
- Coefficient of Variation
- Effective Sample Size
- Normalized perplexity
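The three criteria can be computed from the importance weights as sketched below. The talk does not give the formulas; the definitions used here are the usual ones (normalized weights $\bar w_k$, $\mathrm{ESS} = 1/\sum_k \bar w_k^2$, $\mathrm{CV} = \sqrt{n \sum_k \bar w_k^2 - 1}$, normalized perplexity $\exp(-\sum_k \bar w_k \log \bar w_k)/n$) and should be read as an assumption, as should the function name.

```python
import math

def weight_diagnostics(weights):
    """Diagnostics of weight degeneracy from unnormalized importance weights."""
    n = len(weights)
    total = sum(weights)
    wbar = [w / total for w in weights]              # normalized weights
    sum_sq = sum(w * w for w in wbar)
    ess = 1.0 / sum_sq                               # between 1 and n
    cv = math.sqrt(max(n * sum_sq - 1.0, 0.0))       # 0 when all weights are equal
    entropy = -sum(w * math.log(w) for w in wbar if w > 0.0)
    perplexity = math.exp(entropy) / n               # between 1/n and 1
    return {"ess": ess, "cv": cv, "perplexity": perplexity}

balanced = weight_diagnostics([1.0] * 100)               # all draws contribute
degenerate = weight_diagnostics([1.0] + [1e-12] * 99)    # one draw dominates
```

For equal weights the effective sample size equals $n$ and the perplexity equals 1; when a single weight dominates, the effective sample size drops to about 1 and the perplexity to about $1/n$.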
Adaptive Monte Carlo samplers

Adaptive Monte Carlo samplers were proposed to tune some design parameters automatically and make the samplers more efficient.

Adaptive algorithms:
- The optimal design parameters are defined as the solutions of an optimality criterion. In practice, this criterion cannot be solved explicitly.
- Based on the past history of the sampler, solve an approximation of this criterion and compute the design parameters for the current run of the sampler.
- Repeat the scheme: adaptation/sampling.
Adaptive MC sampler: example of adaptive MCMC (1/2)

Adaptive Hastings-Metropolis algorithm with Gaussian proposal distribution
$$q_\Sigma(x,y) \propto \exp\left(-\frac{1}{2}(y-x)^T \Sigma^{-1} (y-x)\right)$$

Design parameter: the covariance matrix $\Sigma$.

Optimal criterion: using the scaling approach for Markov chains (pioneering work: Roberts, Gelman, Gilks (1997)), it is advocated to take
$$\Sigma = \frac{(2.38)^2}{\ell} \times \text{covariance of } \pi$$

Iterative algorithm, Haario, Saksman, Tamminen (2001):
- Adaptation: update the covariance matrix
$$\Sigma_t = \frac{(2.38)^2}{\ell} \times \widehat{\Sigma}^{(\pi)}_t$$
where $\widehat{\Sigma}^{(\pi)}_t$ is the estimate of the covariance of $\pi$ at iteration $t$.
- Sampling: one step of a Hastings-Metropolis algorithm with proposal $q_{\Sigma_t}$ to sample $X_{t+1}$.
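A one-dimensional sketch ($\ell = 1$) of the adaptation/sampling scheme follows. The covariance estimate is taken to be the empirical variance of the past draws, updated recursively. The warm-up length, the regularization `eps`, the initial scale `sigma0` and the Gaussian target are assumptions added for the example, not details given in the talk.

```python
import math
import random

def adaptive_metropolis(log_pi, x0, n, rng, ell=1, sigma0=1.0, warmup=100, eps=1e-6):
    """Adaptive random-walk Metropolis in dimension 1; returns chain and last proposal variance."""
    x, chain = x0, []
    count, mean, m2 = 0, 0.0, 0.0   # running count, mean, sum of squared deviations
    prop_var = sigma0 ** 2
    for _ in range(n):
        if count >= warmup:
            # Adaptation: Sigma_t = (2.38^2 / ell) * empirical variance of past draws
            prop_var = (2.38 ** 2 / ell) * (m2 / (count - 1) + eps)
        # Sampling: one Hastings-Metropolis step with the current Gaussian proposal
        y = x + math.sqrt(prop_var) * rng.gauss(0.0, 1.0)
        if math.log(1.0 - rng.random()) < log_pi(y) - log_pi(x):
            x = y
        chain.append(x)
        # Welford update of the running mean and variance
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return chain, prop_var

# Hypothetical target N(3, 2^2): the proposal variance should approach 2.38^2 * 4.
chain, final_var = adaptive_metropolis(
    lambda x: -0.5 * ((x - 3.0) / 2.0) ** 2, 0.0, 20000, random.Random(3))
chain_mean = sum(chain) / len(chain)
```

The proposal is never tuned by hand: its variance is recomputed at every iteration from the history of the chain.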
Adaptive MC sampler: example of adaptive MCMC (2/2)

Nevertheless, this recipe is not suited to every context. Example: multimodality.

Target distribution: mixture of 20 Gaussians in $\mathbb{R}^2$.

[Figure: two panels, "Adaptive Hastings-Metropolis: $5 \times 10^6$ draws" and "$5 \times 10^6$ i.i.d. draws". The means of the Gaussians are indicated with a red cross.]
Adaptive MC sampler: example of Adaptive Importance Sampling (1/2)

Design parameter: the proposal distribution.

Optimal criterion: choose the proposal density $q$ among a (parametric) family $\mathcal{Q}$ as the solution of
$$\operatorname{argmin}_{q \in \mathcal{Q}} \int \log\left(\frac{\pi(x)}{q(x)}\right) \pi(x) \, \lambda(dx) \iff \operatorname{argmax}_{q \in \mathcal{Q}} \int \log q(x) \, \pi(x) \, \lambda(dx)$$

Iterative algorithm, O. Cappé, A. Guillin, J.M. Marin, C. Robert (2004):
- Adaptation: update the sampling distribution
$$q_t = \operatorname{argmax}_{q \in \mathcal{Q}} \frac{1}{n} \sum_{k=1}^{n} \frac{\pi(X_k^{(t-1)})}{q_{t-1}(X_k^{(t-1)})} \log q(X_k^{(t-1)})$$
- Sampling: draw points $(X_k^{(t)})_k$ from $q_t$, followed by importance reweighting
$$\pi \approx \frac{1}{n} \sum_{k=1}^{n} \frac{\pi(X_k^{(t)})}{q_t(X_k^{(t)})} \, \delta_{X_k^{(t)}}$$
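When the family $\mathcal{Q}$ is the set of Gaussian densities on $\mathbb{R}$, the argmax in the adaptation step has a closed form: the weighted mean and weighted variance of the draws. The sketch below relies on that assumption. The Gaussian family, the target $N(2, 0.5^2)$ and the function names are illustrative choices, not taken from the talk.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def adaptive_is(log_pi, mu, sigma, n, iterations, rng):
    """Adaptive importance sampling within the Gaussian family N(mu, sigma^2)."""
    for _ in range(iterations):
        # Sampling: draw from the current proposal and compute the importance ratios
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        log_w = [log_pi(x) - log_normal_pdf(x, mu, sigma) for x in xs]
        shift = max(log_w)                       # stabilizes the exponentials
        w = [math.exp(lw - shift) for lw in log_w]
        total = sum(w)
        # Adaptation: argmax over Gaussians of sum_k w_k log q(X_k),
        # i.e. the weighted mean and the weighted variance
        mu = sum(wk * x for wk, x in zip(w, xs)) / total
        sigma = math.sqrt(sum(wk * (x - mu) ** 2 for wk, x in zip(w, xs)) / total)
    return mu, sigma

# Hypothetical target: N(2, 0.5^2) known up to a constant; initial proposal N(0, 3^2).
mu_fit, sigma_fit = adaptive_is(
    lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2, 0.0, 3.0, 5000, 5, random.Random(4))
```

Rescaling all the weights by a constant does not change the argmax, which is why the target only needs to be known up to its normalizing constant.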
Adaptive MC sampler: example of Adaptive Importance Sampling (2/2)

Nevertheless, it is known that such Importance Sampling techniques are not robust to the dimension: when sampling on $\mathbb{R}^\ell$ with $\ell > 15$, the degeneracy of the importance ratios $\dfrac{\pi(X_k)}{q(X_k)}$ cannot be avoided.
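The degeneracy of the importance ratios as the dimension grows can be observed on a simple pair of Gaussians. The target $N(0, I_\ell)$, the proposal $N(0, \sigma^2 I_\ell)$ with $\sigma = 1.5$, and the dimensions 1, 15 and 50 are assumptions chosen for the illustration.

```python
import math
import random

def ess_fraction(dim, n, sigma, rng):
    """ESS / n for target N(0, I_dim) and proposal N(0, sigma^2 I_dim)."""
    log_w = []
    for _ in range(n):
        sq_norm = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dim))
        # log pi(x) - log q(x) for the two centered Gaussians
        log_w.append(dim * math.log(sigma) - 0.5 * sq_norm * (1.0 - 1.0 / sigma ** 2))
    shift = max(log_w)                           # stabilizes the exponentials
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(w) ** 2 / (n * sum(wk * wk for wk in w))

rng = random.Random(5)
# fraction of effective draws for increasing dimensions
fractions = {dim: ess_fraction(dim, 5000, 1.5, rng) for dim in (1, 15, 50)}
```

Even though the proposal is a mild per-coordinate mismatch, the log importance ratio is a sum over coordinates, so its spread grows with $\ell$ and a handful of draws end up carrying almost all the weight.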
Conclusion

Usual adaptive Monte Carlo samplers are not robust (enough) to the context of
• multimodality of the target distribution $\pi$: how to jump from mode to mode;
• large dimension of the sampling space.

Both approaches rely on density ratios:
- Importance Sampling: $\dfrac{\pi(x)}{q(x)}$
- MCMC: $1 \wedge \dfrac{\pi(y)\, q(y,x)}{\pi(x)\, q(x,y)} = 1 \wedge \dfrac{\pi(y)}{\pi(x)}$ when $q$ is a symmetric kernel

New Monte Carlo samplers combine tempering techniques and/or biasing potential techniques with sampling steps.
Outline
- Introduction
- Tempering-based Monte Carlo samplers
  - The Equi-Energy sampler
- Biasing Potential-based Monte Carlo sampler
- Convergence Analysis
Tempering: the idea

[Figure: target density and tempered density, plotted on $[-6, 6]$.]

Learn a well-fitted proposal mechanism by considering tempered versions $\pi^{1/T}$ of the target distribution $\pi$ ($T > 1$).

Hereafter, an example where tempering is plugged into an MCMC sampler.
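The benefit of tempering can be illustrated by counting how often a random-walk Metropolis chain moves between the two modes of a bimodal target, first for $\pi$ itself ($T = 1$) and then for $\pi^{1/T}$ with $T = 10$. The mixture of $N(-4, 1)$ and $N(4, 1)$, the step size and the temperature are assumptions made for the example.

```python
import math
import random

def log_pi(x):
    """Log density, up to a constant, of the equal-weight mixture of N(-4,1) and N(4,1)."""
    a, b = -0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def count_mode_switches(temperature, n, rng, step=1.0):
    """Random-walk Metropolis on pi^(1/T); counts accepted moves that change the sign of x."""
    x, switches = 4.0, 0
    for _ in range(n):
        y = x + step * rng.gauss(0.0, 1.0)
        # acceptance ratio for the tempered target: (pi(y) / pi(x))^(1/T)
        if math.log(1.0 - rng.random()) < (log_pi(y) - log_pi(x)) / temperature:
            if (y > 0.0) != (x > 0.0):
                switches += 1
            x = y
    return switches

rng = random.Random(6)
switches_target = count_mode_switches(1.0, 20000, rng)     # chain targeting pi
switches_tempered = count_mode_switches(10.0, 20000, rng)  # chain targeting pi^(1/10)
```

Raising $\pi$ to the power $1/T$ flattens the low-density region between the modes, so the tempered chain crosses it far more often than the chain targeting $\pi$ directly.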
Example: Equi-Energy sampler (1/6)

Kou, Zhou and Wong (2006)

In the MCMC proposal mechanism, allow the sampler to pick a point from an auxiliary process designed to have better mixing properties.

[Diagram: the auxiliary process $Y_1, Y_2, \ldots, Y_{t-1}, Y_t$, with target $\pi^\beta$, feeds $\theta_1, \theta_2, \ldots, \theta_{t-1}, \theta_t$ into the process of interest $X_1, X_2, \ldots, X_{t-1}, X_t$, with target $\pi$.]