Sticky proposal densities for adaptive MCMC methods
L. Martino†, R. Casarin‡, F. Leisen§, D. Luengo¶
† University of Helsinki, ‡ Università Ca' Foscari, § University of Kent, ¶ Universidad Politécnica de Madrid.
MCQMC, 2014
Introduction

◮ Markov chain Monte Carlo (MCMC) methods convert samples from a proposal pdf $\tilde{q}(x) \propto q(x)$ into correlated samples from a target pdf $\tilde{\pi}(x) \propto \pi(x)$, generating a chain
$$x_0 \Rightarrow x_1 \Rightarrow \dots \Rightarrow x_t \Rightarrow x_{t+1} \Rightarrow \dots \Rightarrow x_{t+\tau} \sim \tilde{\pi}(x),$$
where each transition is driven by the kernel $K(x_t \mid x_{t-1})$.
◮ Within the Monte Carlo (MC) techniques:
  ◮ [Gilks et al. (1992)]: adaptive rejection sampling (ARS),
  ◮ [Gilks et al. (1995)]: adaptive rejection Metropolis sampling (ARMS),
are samplers for univariate pdfs.
◮ They are often used within Gibbs sampling.
◮ Both techniques present different limitations.
◮ GOAL: overcome these drawbacks by proposing a more general and efficient class of adaptive samplers.
Performance

◮ The performance of an MCMC method depends strictly on the discrepancy between the proposal $q$ and the target $\pi$.
◮ If proposal = target, we have an exact sampler: in an independent MH sampler, for instance, the acceptance probability satisfies $\alpha \approx 1$ (see the sketch below).
[Figure: three panels comparing $\pi(x)$ and $q(x)$, with the match between proposal and target improving ("better") from left to right.]
◮ Hence the need to adapt the proposal density, while ensuring ergodicity.
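As a rough illustration of how the proposal-target mismatch enters the algorithm, here is a minimal sketch of one independence Metropolis-Hastings step; the density handles `log_pi`, `log_q` and the sampler `draw_q` are illustrative names, not part of the original slides. When $q$ coincides with $\pi$, the log-ratio is identically zero, so every candidate is accepted ($\alpha = 1$) and the draws are i.i.d. from the target.

```python
import numpy as np

rng = np.random.default_rng(0)

def imh_step(x_t, log_pi, log_q, draw_q):
    """One independence Metropolis-Hastings step.

    log_pi : log of the (possibly unnormalized) target pi(x)
    log_q  : log of the (possibly unnormalized) proposal q(x)
    draw_q : function returning one sample from q
    """
    x_prop = draw_q()
    # alpha = min(1, [pi(x') q(x_t)] / [pi(x_t) q(x')]), computed in log-space
    log_alpha = (log_pi(x_prop) + log_q(x_t)) - (log_pi(x_t) + log_q(x_prop))
    if np.log(rng.uniform()) < log_alpha:
        return x_prop   # accept the candidate
    return x_t          # reject: the chain stays put

# Toy example: standard normal target, wider normal proposal.
log_pi = lambda x: -0.5 * x**2
log_q  = lambda x: -0.5 * (x / 2.0)**2
draw_q = lambda: 2.0 * rng.standard_normal()
x = 0.0
for _ in range(5):
    x = imh_step(x, log_pi, log_q, draw_q)
```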
Adaptive procedures

◮ Parametric: learn parameters of the proposal (location and/or scale parameter).
◮ Non-parametric: approximate the target via non-parametric procedures (as in kernel density estimation).
◮ Simple idea: update the proposal taking into account the histogram of the generated samples (after "burn-in"), $x_1, \dots, x_t, \dots, x_{t+\tau}, \dots$
[Figure: the resulting proposal is a mixture, $(1 - \beta_t) \times$ (histogram-based proposal) $+ \beta_t \times$ (random walk); a sketch follows.]
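A minimal sketch of drawing from such a mixture, assuming a crude histogram-based component built from the past samples and a Gaussian random walk; the function name `draw_mixture`, the binning choice, and the default `rw_scale` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_mixture(x_t, samples, beta_t, rw_scale=1.0):
    """Draw from (1 - beta_t) * (histogram proposal) + beta_t * (random walk)."""
    if rng.uniform() < beta_t:
        # Random-walk component, centered at the current state x_t.
        return x_t + rw_scale * rng.standard_normal()
    # Histogram component: picking a past sample uniformly and then drawing
    # uniformly within its bin samples exactly from the histogram density.
    edges = np.histogram_bin_edges(samples, bins="auto")
    s = rng.choice(samples)
    j = np.clip(np.searchsorted(edges, s) - 1, 0, len(edges) - 2)
    return rng.uniform(edges[j], edges[j + 1])

samples = rng.standard_normal(500)   # past chain samples (after burn-in)
x_next = draw_mixture(x_t=0.3, samples=samples, beta_t=0.2)
```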
Other useful information

◮ We have several evaluations of the target pdf available (at least at each state of the chain):
$$x_1, \dots, x_t, \dots, x_{t+\tau}, \qquad \pi(x_1), \dots, \pi(x_t), \dots, \pi(x_{t+\tau}).$$
◮ Can we incorporate all this information (or a subset) in the learning procedure?
◮ AIM: an interpolative construction of a proposal $q$ that depends on a subset $\mathcal{S}_t \subset \{x_1, \dots, x_t\}$,
$$\tilde{q}(x) = \tilde{q}_t(x) \propto q_t(x \mid \mathcal{S}_t).$$
◮ Adaptive proposal ⇒ adaptive MCMC.
Interpolation procedures

◮ Consider a set of support points $\mathcal{S}_t = \{s_1, \dots, s_{m_t}\}$, and define
$$V(x) = \log \pi(x), \qquad W_t(x) = \log q_t(x \mid \mathcal{S}_t).$$
◮ The proposal is built by an interpolation procedure through the support points (a sketch follows).
[Figure: (a) P2, interpolation in the log-domain; (b) P3, interpolation in the log-domain; (c) P4, interpolation in the pdf-domain.]
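As a concrete illustration of one such construction, here is a sketch in the spirit of P2: piecewise-linear interpolation of $V(s_i) = \log \pi(s_i)$ between consecutive support points, which yields exponential pieces for $q_t$ in the pdf domain. The tail behaviour outside $[s_1, s_{m_t}]$ is omitted, and the function names are illustrative.

```python
import numpy as np

def W_t(x, s, log_pi_s):
    """Piecewise-linear interpolation of V(s_i) = log pi(s_i) (P2-style sketch).

    s        : sorted support points S_t
    log_pi_s : target log-evaluations V(s_i) at the support points
    Tail pieces outside [s_1, s_mt] are not constructed here.
    """
    return np.interp(x, s, log_pi_s)

s = np.array([-2.0, -0.5, 0.0, 1.0, 2.5])   # support points S_t
log_pi_s = -0.5 * s**2                       # example: Gaussian-shaped target
x = np.linspace(-2.0, 2.5, 7)
q_t = np.exp(W_t(x, s, log_pi_s))            # unnormalized proposal q_t(x | S_t)
```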
Interpolation procedures

◮ Similar to the constructions used in the adaptive rejection sampling (ARS) [Gilks et al., 1992] and adaptive rejection Metropolis sampling (ARMS) [Gilks et al., 1995] methods.
[Figure: (d) log-domain envelope of ARS, built from lines $w_1(x), w_2(x), w_3(x)$ above $V(x)$; (e) P1, the ARMS construction in the log-domain.]
◮ ARS: only for log-concave pdfs.
◮ ARMS: sometimes incomplete adaptation.
Interpolation procedures

[Figure: the P4 construction as the number of support points grows: (f) $|\mathcal{S}_t| = 6$; (g) $|\mathcal{S}_t| = 7$; (h) $|\mathcal{S}_t| = 8$; (i) $|\mathcal{S}_t| = 9$; (j) $|\mathcal{S}_t| > 100$.]
◮ Here the points are not adaptively chosen.
Drawing from $q_t$

1. Calculate analytically the area below each piece, i.e.,
$$\int_{s_j}^{s_{j+1}} q_t(x \mid \mathcal{S}_t)\, dx = A_j, \qquad j = 0, \dots, m_t,$$
denoting $s_0 = -\infty$ and $s_{m_t+1} = +\infty$.
2. Choose the $j^*$-th piece according to the weights
$$\omega_j = \frac{A_j}{\sum_{k=0}^{m_t} A_k}, \qquad j = 0, \dots, m_t.$$
3. Draw a sample $x'$ from $q_t(x \mid \mathcal{S}_t)$ restricted to $x \in (s_{j^*}, s_{j^*+1})$.

P2 → exponential pieces; P3 → uniform pieces; P4 → linear pieces. A sketch of this procedure follows.
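A minimal sketch of steps 1-3 for a P3-style proposal (uniform pieces), where each $A_j$ is available in closed form. For simplicity the unbounded tail pieces $s_0 = -\infty$ and $s_{m_t+1} = +\infty$ are ignored, and the constant height per piece is taken as $\max\{\pi(s_j), \pi(s_{j+1})\}$; both choices are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_from_qt(s, pi_s, n=1):
    """Draw n samples from a P3-style (uniform-pieces) proposal -- a sketch.

    s    : sorted support points S_t
    pi_s : target evaluations pi(s_i) at the support points
    """
    heights = np.maximum(pi_s[:-1], pi_s[1:])   # constant value on each piece
    areas = heights * np.diff(s)                # step 1: A_j, exact per-piece areas
    w = areas / areas.sum()                     # step 2: weights omega_j
    j = rng.choice(len(w), size=n, p=w)         # step 2: pick the j*-th pieces
    return rng.uniform(s[j], s[j + 1])          # step 3: uniform within each piece

s = np.array([-2.0, -0.5, 0.0, 1.0, 2.5])
pi_s = np.exp(-0.5 * s**2)                      # example target evaluations
x_new = draw_from_qt(s, pi_s, n=5)
```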
Computational cost vs. efficiency

◮ More points: better approximation of the target ⇒ higher efficiency (i.e., less correlation ⇔ faster convergence).
◮ More points: drawing from $q_t$ becomes more costly.
$$m_t \uparrow \;\Longrightarrow\; \text{efficiency} \uparrow \;+\; \text{computational cost} \uparrow$$
◮ Desired adaptive strategy: manage the set $\mathcal{S}_t$ so as to build a "good" proposal with a small number $m_t$ of points, while keeping the ergodicity of the sampler.
Adaptive Sticky Metropolis (ASM)

1. Construction of the proposal: build a proposal $q_t(x \mid \mathcal{S}_t)$ using the set $\mathcal{S}_t = \{s_1, \dots, s_{m_t}\}$ (e.g., using P1, P2, P3 or P4).
2. MH step:
   2.1 Draw $x'$ from $\tilde{q}_t(x) \propto q_t(x \mid \mathcal{S}_t)$.
   2.2 Set $x_{t+1} = x'$ and $z = x_t$ with probability
   $$\alpha = 1 \wedge \frac{\pi(x')\, q_t(x_t \mid \mathcal{S}_t)}{\pi(x_t)\, q_t(x' \mid \mathcal{S}_t)},$$
   and set $x_{t+1} = x_t$ and $z = x'$ with probability $1 - \alpha$.
3. Test to update $\mathcal{S}_t$: set $\mathcal{S}_{t+1} = \mathcal{S}_t \cup \{z\}$ with probability $P_a = \eta(d_t(z))$; otherwise set $\mathcal{S}_{t+1} = \mathcal{S}_t$.

◮ $d_t(z)$: a positive measure of the distance at $z$ between $q_t$ and $\pi$. A compact sketch of the full loop follows.
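Putting the three steps together, here is a compact sketch of the ASM loop. It reuses the P3-style uniform-pieces construction from the previous sketch, takes $d_t(z) = |\pi(z) - q_t(z \mid \mathcal{S}_t)|$, and uses $\eta(d) = d/(d + \delta)$, which vanishes as the proposal approaches the target (the "sticky" behaviour); these particular choices of $d_t$ and $\eta$, and all function names, are illustrative assumptions, not the authors' exact specification.

```python
import numpy as np

rng = np.random.default_rng(3)

def pi_u(x):
    """Unnormalized example target (standard Gaussian shape)."""
    return np.exp(-0.5 * x**2)

def qt_unnorm(x, s, pi_s):
    """P3-style uniform-pieces proposal value at x (tails ignored, as before)."""
    j = np.clip(np.searchsorted(s, x) - 1, 0, len(s) - 2)
    return max(pi_s[j], pi_s[j + 1])

def draw_qt(s, pi_s):
    heights = np.maximum(pi_s[:-1], pi_s[1:])
    areas = heights * np.diff(s)
    j = rng.choice(len(areas), p=areas / areas.sum())
    return rng.uniform(s[j], s[j + 1])

def eta(d, delta=0.1):
    """Update-test probability: increases with the proposal-target mismatch d."""
    return d / (d + delta)

def asm(T=5000, s_init=(-3.0, -1.0, 0.0, 1.0, 3.0), x0=0.0):
    s = np.array(sorted(s_init))         # support points S_t
    pi_s = pi_u(s)                       # cached target evaluations at S_t
    x_t = x0
    chain = []
    for _ in range(T):
        # Step 1: the proposal is implicit in (s, pi_s).
        # Step 2: MH step with proposal q_t(x | S_t).
        x_prop = draw_qt(s, pi_s)
        num = pi_u(x_prop) * qt_unnorm(x_t, s, pi_s)
        den = pi_u(x_t) * qt_unnorm(x_prop, s, pi_s)
        if rng.uniform() < min(1.0, num / den):
            x_t, z = x_prop, x_t         # accept: z is the old state
        else:
            z = x_prop                   # reject: z is the discarded candidate
        # Step 3: add z to S_t with probability eta(d_t(z)).
        if s[0] < z < s[-1]:             # guard for this finite-support sketch
            d = abs(pi_u(z) - qt_unnorm(z, s, pi_s))
            if rng.uniform() < eta(d):
                k = np.searchsorted(s, z)
                s = np.insert(s, k, z)
                pi_s = np.insert(pi_s, k, pi_u(z))
        chain.append(x_t)
    return np.array(chain), s

samples, support = asm()
```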