

  1. Scalable Metropolis–Hastings for Exact Bayesian Inference with Large Datasets. Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, Arnaud Doucet. June 8, 2019. (Cornish et al., 24 slides)

  2. Problem
  Bayesian inference via MCMC is expensive for large datasets.

  3–5. Problem
  Consider a posterior over parameters θ given n data points y_i:

      π(θ) = p(θ | y_{1:n}) ∝ p(θ) ∏_{i=1}^n p(y_i | θ).

  Metropolis–Hastings: given a proposal q and current state θ,
      1. Propose θ′ ∼ q(θ, ·)
      2. Accept θ′ with probability

         α_MH(θ, θ′) := 1 ∧ [q(θ′, θ) π(θ′)] / [q(θ, θ′) π(θ)]
                      = 1 ∧ [q(θ′, θ) p(θ′)] / [q(θ, θ′) p(θ)] · ∏_{i=1}^n p(y_i | θ′) / p(y_i | θ)

  ⇒ O(n) computation per step to evaluate α_MH(θ, θ′).
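To make the O(n) cost concrete, here is a minimal plain-Python sketch of one random-walk MH step for such a product-form posterior (the function and argument names are illustrative, not from the talk; a flat prior and a Gaussian proposal are assumed). The symmetric proposal makes the q terms cancel, and the sum over all n data points is the bottleneck the talk targets:

```python
import math
import random

def log_post(theta, y, log_prior, log_lik):
    # Unnormalised log-posterior: log p(theta) + sum_i log p(y_i | theta).
    # This sum over all n data points is the O(n) cost.
    return log_prior(theta) + sum(log_lik(yi, theta) for yi in y)

def mh_step(theta, y, log_prior, log_lik, step=0.5, rng=random):
    # One random-walk Metropolis-Hastings step. The proposal is symmetric,
    # so the q(theta', theta)/q(theta, theta') factor cancels.
    theta_prop = theta + rng.gauss(0.0, step)
    log_alpha = (log_post(theta_prop, y, log_prior, log_lik)
                 - log_post(theta, y, log_prior, log_lik))  # O(n) work per step
    if math.log(rng.random()) < min(0.0, log_alpha):
        return theta_prop
    return theta
```

On a toy Gaussian-mean model, a chain of such steps concentrates around the sample mean, but every step touches every data point.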

  6–7. Our approach
  Want a method with cost o(n) per step – subsampling.
  Want the method not to reduce accuracy – exactness.

  8–9. Our approach
  Several existing exact subsampling methods:
      Firefly [Maclaurin and Adams, 2014]
      Delayed acceptance [Banterle et al., 2015]
      Piecewise-deterministic MCMC [Bouchard-Côté et al., 2018, Bierkens et al., 2018]
  Our method: an exact subsampling scheme based on a proxy target that requires on average O(1) or O(1/√n) likelihood evaluations per step.
  [Figure 1: Average number of likelihood evaluations per iteration required by SMH for a 10-dimensional logistic regression posterior as the number of data points n increases; curves shown for MH, SMH-1, and SMH-2.]

  10. Three key ingredients
      1. A factorised MH acceptance probability
      2. Procedures for fast simulation of Bernoulli random variables
      3. Control performance using an approximate target (“control variates”)

  11–14. Ingredient 1 – Factorised Metropolis–Hastings
  Suppose we can factorise the target as

      π(θ) ∝ ∏_{i=1}^n π_i(θ).

  The obvious choice (with a flat prior) is π_i(θ) = p(y_i | θ).
  Can show that (for a symmetric proposal)

      α_FMH(θ, θ′) := ∏_{i=1}^n α_FMH,i(θ, θ′) := ∏_{i=1}^n [1 ∧ π_i(θ′) / π_i(θ)]

  is also a valid acceptance probability for an MH-style algorithm.
  Compare the MH acceptance probability:

      α_MH(θ, θ′) = 1 ∧ ∏_{i=1}^n π_i(θ′) / π_i(θ).
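One consequence of this comparison worth noting: since 1 ∧ ∏_i r_i ≥ ∏_i (1 ∧ r_i) for the per-factor ratios r_i = π_i(θ′)/π_i(θ), FMH never accepts more often than MH under the same proposal. A small stdlib-only check (function names are illustrative):

```python
def alpha_mh(ratios):
    # MH: 1 ∧ ∏_i r_i, with r_i = pi_i(theta') / pi_i(theta).
    prod = 1.0
    for r in ratios:
        prod *= r
    return min(1.0, prod)

def alpha_fmh(ratios):
    # FMH: ∏_i (1 ∧ r_i) -- each factor is capped at 1 separately,
    # so factors above 1 cannot compensate for factors below 1.
    out = 1.0
    for r in ratios:
        out *= min(1.0, r)
    return out
```

For example, ratios [2.0, 0.5] give alpha_mh = 1.0 but alpha_fmh = 0.5: the gain on the first factor cannot offset the loss on the second.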

  15–18. Ingredient 1 – Factorised Metropolis–Hastings
  Explicitly (assuming symmetric q), the FMH algorithm is:

  Factorised Metropolis–Hastings (FMH)
      1. Propose θ′ ∼ q(θ, ·)
      2. Accept θ′ with probability

         α_FMH(θ, θ′) := ∏_{i=1}^n α_FMH,i(θ, θ′) := ∏_{i=1}^n [1 ∧ π_i(θ′) / π_i(θ)]

  Can implement the acceptance step by sampling independent B_i ∼ Bernoulli(α_FMH,i(θ, θ′)) and accepting iff every B_i = 1.
  Can stop as soon as some B_i = 0: delayed acceptance.
  However, we must still compute all n terms in order to accept.
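The Bernoulli implementation with early stopping can be sketched in a few lines of stdlib Python (names are illustrative). Note that a rejection can be cheap, but an acceptance still walks all n factors:

```python
import math
import random

def fmh_accept(log_ratios, rng=random):
    # Factorised MH acceptance with delayed acceptance.
    # log_ratios[i] = log(pi_i(theta') / pi_i(theta)); the i-th Bernoulli
    # has success probability alpha_i = 1 ∧ exp(log_ratios[i]).
    for lr in log_ratios:
        alpha_i = min(1.0, math.exp(lr))
        if rng.random() >= alpha_i:
            return False   # some B_i = 0: reject immediately (early stop)
    return True            # every B_i = 1: accept -- all n factors evaluated
```

Averaged over many draws, the acceptance frequency matches ∏_i α_i, so this implements exactly the FMH acceptance probability.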

  19. Three key ingredients (recap): a factorised MH acceptance probability; fast simulation of Bernoulli random variables; performance control via an approximate target (“control variates”).

  20. Ingredient 2 – Fast Bernoulli simulation
  How can we avoid simulating these n Bernoullis?

  21–22. Ingredient 2 – Fast Bernoulli simulation
  How can we avoid simulating these n Bernoullis? Assuming we have bounds

      λ̄_i(θ, θ′) ≥ −log α_FMH,i(θ, θ′) =: λ_i(θ, θ′),

  we can use the following:

  Poisson subsampling
      1. C ∼ Poisson(∑_{i=1}^n λ̄_i(θ, θ′))
      2. X_1, …, X_C iid ∼ Categorical([λ̄_i(θ, θ′) / ∑_{j=1}^n λ̄_j(θ, θ′)]_{1≤i≤n})
      3. B_j ∼ Bernoulli(λ_{X_j}(θ, θ′) / λ̄_{X_j}(θ, θ′)) for 1 ≤ j ≤ C

  ⇒ P(B_1 = ⋯ = B_C = 0) = α_FMH(θ, θ′), so this procedure can be used to perform the FMH accept/reject step.
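Under the stated assumption that valid upper bounds λ̄_i ≥ λ_i are available, the three steps can be sketched stdlib-only as below. The helper names and the Knuth Poisson sampler are illustrative only; a fast implementation would precompute a sampling structure (e.g. an alias table) for step 2 rather than scanning linearly:

```python
import math
import random

def sample_poisson(mu, rng):
    # Knuth's multiplication method; fine for the small means used here.
    L = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def sample_categorical(weights, total, rng):
    # Linear scan over unnormalised weights summing to `total`.
    u = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1

def poisson_subsample_accept(lam, lam_bar, rng=random):
    # lam[i] = lambda_i = -log alpha_FMH,i; lam_bar[i] >= lam[i] is its bound.
    total = sum(lam_bar)
    C = sample_poisson(total, rng)                  # step 1
    for _ in range(C):                              # only C factors touched
        x = sample_categorical(lam_bar, total, rng)  # step 2
        if rng.random() < lam[x] / lam_bar[x]:       # step 3: B_j = 1?
            return False   # some B_j = 1: reject
    # All B_j = 0: accept. By Poisson thinning this happens with
    # probability exp(-sum_i lambda_i) = prod_i alpha_FMH,i = alpha_FMH.
    return True
```

The reason this is exact: the number of j with B_j = 1 is a thinned Poisson count with mean ∑_i λ_i, so the probability that it is zero is exp(−∑_i λ_i) = α_FMH(θ, θ′).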
