Bayesian Optimization for Likelihood-Free Inference


  1. Bayesian Optimization for Likelihood-Free Inference
     Michael Gutmann, University of Edinburgh, 14th September 2016
     https://sites.google.com/site/michaelgutmann

  2. Reference
     For further information:
     - M.U. Gutmann and J. Corander, "Bayesian optimization for likelihood-free inference of simulator-based statistical models", Journal of Machine Learning Research, 17(125):1–47, 2016.
     - J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander, "Fundamentals and Recent Developments in Approximate Bayesian Computation", Systematic Biology, in press, 2016.

  3. Overall goal
     - Inference: given observed data y_o, learn about properties of its source.
     - Enables decision making, predictions, ...
     [Figure: a data source with unknown properties produces the observation y_o in data space; inference runs in the reverse direction, from the observation back to the properties.]

  4. Approach
     - Set up a model M(θ) with potential properties θ (hypotheses).
     - See which θ are in line with the observed data y_o.
     [Figure: the model M(θ) stands in for the data source with its unknown properties; inference compares the model output with the observation y_o.]

  5. The likelihood function L(θ)
     - Measures the agreement between θ and the observed data y_o.
     - Probability to generate data like y_o if hypothesis θ holds.
     [Figure: the model M(θ) generates data y | θ in data space; the likelihood relates to the probability that the generated data fall within an ε-neighborhood of y_o.]

  6. Performing statistical inference
     - If L(θ) is known, inference is straightforward.
     - Maximum likelihood estimation: θ̂ = argmax_θ L(θ)
     - Bayesian inference: p(θ | y_o) ∝ p(θ) × L(θ), i.e. posterior ∝ prior × likelihood.
       Allows us to learn from data by updating probabilities.
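As a minimal numerical illustration of the update posterior ∝ prior × likelihood, the sketch below evaluates both on a parameter grid for a toy Bernoulli model; the data, the grid, and the flat prior are assumptions made purely for illustration and are not part of the slides.

```python
# Minimal sketch, assuming a Bernoulli model with a flat prior on a grid.
import numpy as np

y_o = np.array([1, 0, 1, 1, 0, 1])              # hypothetical observed data
theta = np.linspace(0.01, 0.99, 99)             # parameter grid (success probability)

likelihood = theta ** y_o.sum() * (1 - theta) ** (len(y_o) - y_o.sum())
prior = np.ones_like(theta)                     # flat prior

posterior = prior * likelihood                  # posterior is proportional to prior times likelihood
posterior /= np.trapz(posterior, theta)         # normalize on the grid

theta_mle = theta[np.argmax(likelihood)]        # maximum likelihood estimate
theta_map = theta[np.argmax(posterior)]         # posterior mode (equals the MLE under a flat prior)
print(theta_mle, theta_map)
```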

  7. Likelihood-free inference
     Statistical inference for models where
     1. the likelihood function is too costly to compute, but
     2. sampling, i.e. simulating data, from the model is possible.

  8. Importance of likelihood-free inference
     One reason: such generative / simulator-based models occur widely.
     - Astrophysics: simulating the formation of galaxies, stars, or planets
     - Evolutionary biology: simulating the evolution of life
     - Neuroscience: simulating neural circuits
     - Computer vision: simulating natural scenes
     - Health science: simulating the spread of an infectious disease
     - ...
     [Figure: simulated neural activity in rat somatosensory cortex, from https://bbp.epfl.ch/nmc-portal]

  9. Flavors of likelihood-free inference
     - There are several flavors of likelihood-free inference; in the Bayesian setting, for example:
       - approximate Bayesian computation (ABC)
       - synthetic likelihood (Wood, 2010)
     - General idea: identify the values of the parameters of interest θ for which simulated data resemble the observed data.
     - Simulated data resemble the observed data if some distance measure d ≥ 0 is small.
     - Here: focus on ABC; see the JMLR paper for synthetic likelihood.

  10. Meta ABC algorithm
     - Let y_o be the observed data.
     - Iterate many times:
       1. Sample θ from a proposal distribution q(θ).
       2. Sample y | θ according to the model.
       3. Compute the distance d(y, y_o) between simulated and observed data.
       4. Retain θ if d(y, y_o) ≤ ε.
     - Different choices of q(θ) give different algorithms.
     - Produces samples from the (approximate) posterior when ε is small.
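The sketch below instantiates this meta algorithm as plain rejection ABC, with the prior as the proposal q(θ); the toy Gaussian simulator, the distance function, the prior range, and the threshold are all assumptions made for illustration only.

```python
# Minimal rejection ABC sketch, assuming a toy simulator: data are n draws
# from N(theta, 1) and the distance is the difference of sample means.
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    return rng.normal(theta, 1.0, size=n)

def distance(y, y_obs):
    return abs(y.mean() - y_obs.mean())

y_obs = simulate(theta=2.0)                     # stand-in for the observed data y_o
eps, n_iter = 0.1, 10_000
accepted = []

for _ in range(n_iter):
    theta = rng.uniform(-5, 5)                  # 1. sample theta from the proposal q (here: the prior)
    y = simulate(theta)                         # 2. sample y | theta from the model
    if distance(y, y_obs) <= eps:               # 3.-4. retain theta if the distance is small
        accepted.append(theta)

print(len(accepted), np.mean(accepted))         # samples from the approximate posterior
```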

  11. Implicit likelihood approximation
     - Likelihood: probability to generate data like y_o if hypothesis θ holds.
     - Simulate N data sets y_θ^(1), ..., y_θ^(N) from the model M(θ); an outcome counts as "like y_o" if its distance to y_o is at most ε.
     - L(θ) ≈ proportion of such (green) outcomes:
       L(θ) ≈ (1/N) Σ_{i=1}^N 1[ d(y_θ^(i), y_o) ≤ ε ]
     [Figure: data space with six simulated outcomes y_θ^(1), ..., y_θ^(6) around the observation y_o; outcomes inside the ε-ball around y_o are marked green.]
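A direct Monte Carlo version of this estimator is sketched below; it reuses the toy simulate, distance, and y_obs helpers from the rejection ABC sketch above and is purely illustrative.

```python
# Sketch: estimate L(theta) as the proportion of simulations within distance
# eps of the observed data (reuses simulate, distance, y_obs from above).
import numpy as np

def abc_likelihood(theta, y_obs, eps=0.1, N=300):
    dists = np.array([distance(simulate(theta), y_obs) for _ in range(N)])
    return np.mean(dists <= eps)                # (1/N) * sum of indicator[d <= eps]

thetas = np.linspace(0.0, 4.0, 21)
L_hat = [abc_likelihood(t, y_obs) for t in thetas]
print(thetas[int(np.argmax(L_hat))])            # crude maximizer of the approximate likelihood
```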

  12. Example: Bacterial infections in child care centers
     - Likelihood intractable for cross-sectional data.
     - But generating data from the model is possible.
     - Parameters of interest:
       - rate of infections within a center
       - rate of infections from outside
       - competition between the strains
     [Figure: colonization data shown as strain versus individual panels over time (Numminen et al., 2013).]

  13. Example: Bacterial infections in child care centers
     - Data: Streptococcus pneumoniae colonization for 29 centers.
     - Inference with population Monte Carlo ABC.
     - Reveals strong competition between the different bacterial strains.
     - Expensive: 4.5 days on a cluster with 200 cores, more than one million simulated data sets.
     [Figure: prior and posterior probability density functions of the competition parameter, with the axis ranging from strong to weak competition.]

  14. Why is the ABC algorithm so expensive?
     1. It rejects most samples when ε is small.
     2. It does not make assumptions about the shape of L(θ).
     3. It does not use all information available.
     4. It aims at equal accuracy for all parameters.
     [Figure: realized distances d(y_θ^(i), y_o) versus the competition parameter (N = 300), their variability, the rescaled average distance, the threshold ε, and the resulting approximate likelihood L(θ) ≈ (1/N) Σ_{i=1}^N 1[ d(y_θ^(i), y_o) ≤ ε ].]

  15. Proposed solution (Gutmann and Corander, 2016)
     1. It rejects most samples when ε is small ⇒ don't reject samples, learn from them.
     2. It does not make assumptions about the shape of L(θ) ⇒ model the distances, assuming the average distance is smooth.
     3. It does not use all information available ⇒ use Bayes' theorem to update the model.
     4. It aims at equal accuracy for all parameters ⇒ prioritize parameter regions with small distances.
     An equivalent strategy applies to inference with the synthetic likelihood.

  16. Modeling (points 1 & 2) ◮ Data are tuples ( θ i , d i ), where d i = d ( y ( i ) θ , y o ) ◮ Model the conditional distribution of d given θ ◮ Estimated model yields approximation ˆ L ( θ ) for any choice of ǫ ˆ L ( θ ) ∝ � Pr ( d ≤ ǫ | θ ) � Pr is probability under the estimated model. ◮ Here: Use (log) Gaussian process as model (with squared exponential covariance function) ◮ Approach not restricted to Gaussian processes. Michael Gutmann BOLFI 16 / 23

  17. Data acquisition (points 3 & 4) ◮ Samples of θ could be obtained by sampling from the prior or some adaptively constructed proposal distribution ◮ Give priority to regions in the parameter space where distance d tends to be small. ◮ Use Bayesian optimization to find such regions ◮ Here: Use lower confidence bound acquisition function (e.g. Cox and John, 1992; Srinivas et al, 2012) � η 2 A t ( θ ) = µ t ( θ ) − v t ( θ ) (1) t � �� � ���� � �� � post mean post var weight t : number of samples acquired so far ◮ Approach not restricted to this acquisition function. Michael Gutmann BOLFI 17 / 23

  18. Bayesian optimization for likelihood-free inference
     - Iterate: acquired data (θ_i, d_i) → update the model via Bayes' theorem → minimize the acquisition function, trading off exploration and exploitation → next parameter value to try.
     [Figure: GP model of the distance versus the competition parameter based on 2, 3, and 4 data points, showing the posterior mean, 5%-95% quantile bands, and the acquisition function whose minimum determines the next parameter to try.]
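Putting the previous sketches together, the loop below alternates model updates and acquisitions in the spirit of this slide; it reuses the toy simulate, distance, y_obs, and lcb_acquisition defined above and is only a schematic stand-in for the actual BOLFI procedure, not the authors' implementation.

```python
# Sketch of a BOLFI-style loop: refit the GP on all (theta, distance) pairs,
# then acquire the next simulation at the minimizer of the acquisition function.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
grid = np.linspace(-5, 5, 201).reshape(-1, 1)

# a few initial simulator runs at parameters drawn from the prior
thetas = list(rng.uniform(-5, 5, size=3))
dists = [distance(simulate(t), y_obs) for t in thetas]

for t in range(30):
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(np.array(thetas).reshape(-1, 1), np.array(dists))   # update the model of the distances
    acq = lcb_acquisition(grid, gp)                             # exploration vs exploitation trade-off
    theta_next = float(grid[np.argmin(acq), 0])                 # next parameter to try
    thetas.append(theta_next)
    dists.append(distance(simulate(theta_next), y_obs))         # one new simulator run per iteration

print(thetas[-5:])                              # acquisitions concentrate where distances are small
```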

  19. Example: Bacterial infections in child care centers
     - Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
     - Roughly equal results using 1000 times fewer simulations: from 4.5 days with 200 cores down to 90 minutes with seven cores.
     [Figure: posterior means (solid lines) and credibility intervals (shaded areas or dashed lines) of the competition parameter versus computational cost (log10), for the developed fast method and the standard method (Gutmann and Corander, 2016).]

  20. Example: Bacterial infections in child care centers
     - Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
     - Roughly equal results using 1000 times fewer simulations.
     [Figure: posterior means (solid lines) and credibility intervals (shaded areas or dashed lines) of the internal and external infection parameters versus computational cost (log10), for the developed fast method and the standard method.]

  21. Further benefits
     - The proposed method makes the inference more efficient.
       It allowed us to perform a far more comprehensive data analysis than the standard approach (Numminen et al., 2016).
     - It enables inference for models that were previously out of reach,
       e.g. a model of evolution where simulating a single data set took us 12-24 hours (Marttinen et al., 2015).
     - It enables easier assessment of parameter identifiability for complex models,
       e.g. a model of the transmission dynamics of tuberculosis (Lintusaari et al., 2016).
