Bayesian Optimization for Likelihood-Free Inference


  1. Bayesian Optimization for Likelihood-Free Inference
     Michael Gutmann, University of Edinburgh, 14th September 2016
     https://sites.google.com/site/michaelgutmann

  2. Reference
     For further information:
     - M.U. Gutmann and J. Corander, "Bayesian optimization for likelihood-free inference of simulator-based statistical models", Journal of Machine Learning Research, 17(125):1–47, 2016.
     - J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander, "Fundamentals and Recent Developments in Approximate Bayesian Computation", Systematic Biology, in press, 2016.

  3. Overall goal
     - Inference: given observed data y_o, learn about properties of its source.
     - Enables decision making, predictions, ...
     [Figure: a data source with unknown properties produces the observation y_o in data space; inference runs in the reverse direction, from the observation back to the properties.]

  4. Approach
     - Set up a model M(θ) with potential properties θ (hypotheses).
     - See which θ are in line with the observed data y_o.
     [Figure: the model M(θ) stands in for the data source with its unknown properties; inference compares the model output with the observation y_o.]

  5. The likelihood function L(θ)
     - Measures the agreement between θ and the observed data y_o.
     - Probability to generate data like y_o if hypothesis θ holds.
     [Figure: the model M(θ) generates data y | θ in data space; the likelihood relates to the probability that the generated data fall within an ε-neighborhood of y_o.]

  6. Performing statistical inference
     - If L(θ) is known, inference is straightforward.
     - Maximum likelihood estimation: θ̂ = argmax_θ L(θ)
     - Bayesian inference: p(θ | y_o) ∝ p(θ) × L(θ), i.e. posterior ∝ prior × likelihood.
       Allows us to learn from data by updating probabilities.
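As a minimal numerical illustration of the update posterior ∝ prior × likelihood, the sketch below evaluates both on a parameter grid for a toy Bernoulli model; the data, the grid, and the flat prior are assumptions made purely for illustration and are not part of the slides.

```python
# Minimal sketch, assuming a Bernoulli model with a flat prior on a grid.
import numpy as np

y_o = np.array([1, 0, 1, 1, 0, 1])              # hypothetical observed data
theta = np.linspace(0.01, 0.99, 99)             # parameter grid (success probability)

likelihood = theta ** y_o.sum() * (1 - theta) ** (len(y_o) - y_o.sum())
prior = np.ones_like(theta)                     # flat prior

posterior = prior * likelihood                  # posterior is proportional to prior times likelihood
posterior /= np.trapz(posterior, theta)         # normalize on the grid

theta_mle = theta[np.argmax(likelihood)]        # maximum likelihood estimate
theta_map = theta[np.argmax(posterior)]         # posterior mode (equals the MLE under a flat prior)
print(theta_mle, theta_map)
```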

  7. Likelihood-free inference
     Statistical inference for models where
     1. the likelihood function is too costly to compute, but
     2. sampling, i.e. simulating data, from the model is possible.

  8. Importance of likelihood-free inference
     One reason: such generative / simulator-based models occur widely.
     - Astrophysics: simulating the formation of galaxies, stars, or planets
     - Evolutionary biology: simulating the evolution of life
     - Neuroscience: simulating neural circuits
     - Computer vision: simulating natural scenes
     - Health science: simulating the spread of an infectious disease
     - ...
     [Figure: simulated neural activity in rat somatosensory cortex, from https://bbp.epfl.ch/nmc-portal]

  9. Flavors of likelihood-free inference
     - There are several flavors of likelihood-free inference; in the Bayesian setting, for example:
       - approximate Bayesian computation (ABC)
       - synthetic likelihood (Wood, 2010)
     - General idea: identify the values of the parameters of interest θ for which simulated data resemble the observed data.
     - Simulated data resemble the observed data if some distance measure d ≥ 0 is small.
     - Here: focus on ABC; see the JMLR paper for synthetic likelihood.

  10. Meta ABC algorithm
     - Let y_o be the observed data.
     - Iterate many times:
       1. Sample θ from a proposal distribution q(θ).
       2. Sample y | θ according to the model.
       3. Compute the distance d(y, y_o) between simulated and observed data.
       4. Retain θ if d(y, y_o) ≤ ε.
     - Different choices of q(θ) give different algorithms.
     - Produces samples from the (approximate) posterior when ε is small.
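The sketch below instantiates this meta algorithm as plain rejection ABC, with the prior as the proposal q(θ); the toy Gaussian simulator, the distance function, the prior range, and the threshold are all assumptions made for illustration only.

```python
# Minimal rejection ABC sketch, assuming a toy simulator: data are n draws
# from N(theta, 1) and the distance is the difference of sample means.
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    return rng.normal(theta, 1.0, size=n)

def distance(y, y_obs):
    return abs(y.mean() - y_obs.mean())

y_obs = simulate(theta=2.0)                     # stand-in for the observed data y_o
eps, n_iter = 0.1, 10_000
accepted = []

for _ in range(n_iter):
    theta = rng.uniform(-5, 5)                  # 1. sample theta from the proposal q (here: the prior)
    y = simulate(theta)                         # 2. sample y | theta from the model
    if distance(y, y_obs) <= eps:               # 3.-4. retain theta if the distance is small
        accepted.append(theta)

print(len(accepted), np.mean(accepted))         # samples from the approximate posterior
```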

  11. Implicit likelihood approximation
     - Likelihood: probability to generate data like y_o if hypothesis θ holds.
     - Simulate N data sets y_θ^(1), ..., y_θ^(N) from the model M(θ); an outcome counts as "like y_o" if its distance to y_o is at most ε.
     - L(θ) ≈ proportion of such (green) outcomes:
       L(θ) ≈ (1/N) Σ_{i=1}^N 1[ d(y_θ^(i), y_o) ≤ ε ]
     [Figure: data space with six simulated outcomes y_θ^(1), ..., y_θ^(6) around the observation y_o; outcomes inside the ε-ball around y_o are marked green.]
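A direct Monte Carlo version of this estimator is sketched below; it reuses the toy simulate, distance, and y_obs helpers from the rejection ABC sketch above and is purely illustrative.

```python
# Sketch: estimate L(theta) as the proportion of simulations within distance
# eps of the observed data (reuses simulate, distance, y_obs from above).
import numpy as np

def abc_likelihood(theta, y_obs, eps=0.1, N=300):
    dists = np.array([distance(simulate(theta), y_obs) for _ in range(N)])
    return np.mean(dists <= eps)                # (1/N) * sum of indicator[d <= eps]

thetas = np.linspace(0.0, 4.0, 21)
L_hat = [abc_likelihood(t, y_obs) for t in thetas]
print(thetas[int(np.argmax(L_hat))])            # crude maximizer of the approximate likelihood
```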

  12. Example: Bacterial infections in child care centers
     - Likelihood intractable for cross-sectional data.
     - But generating data from the model is possible.
     - Parameters of interest:
       - rate of infections within a center
       - rate of infections from outside
       - competition between the strains
     [Figure: colonization data shown as strain versus individual panels over time (Numminen et al., 2013).]

  13. Example: Bacterial infections in child care centers
     - Data: Streptococcus pneumoniae colonization for 29 centers.
     - Inference with population Monte Carlo ABC.
     - Reveals strong competition between the different bacterial strains.
     - Expensive: 4.5 days on a cluster with 200 cores, more than one million simulated data sets.
     [Figure: prior and posterior probability density functions of the competition parameter, with the axis ranging from strong to weak competition.]

  14. Why is the ABC algorithm so expensive?
     1. It rejects most samples when ε is small.
     2. It does not make assumptions about the shape of L(θ).
     3. It does not use all information available.
     4. It aims at equal accuracy for all parameters.
     [Figure: realized distances d(y_θ^(i), y_o) versus the competition parameter (N = 300), their variability, the rescaled average distance, the threshold ε, and the resulting approximate likelihood L(θ) ≈ (1/N) Σ_{i=1}^N 1[ d(y_θ^(i), y_o) ≤ ε ].]

  15. Proposed solution (Gutmann and Corander, 2016)
     1. It rejects most samples when ε is small ⇒ don't reject samples, learn from them.
     2. It does not make assumptions about the shape of L(θ) ⇒ model the distances, assuming the average distance is smooth.
     3. It does not use all information available ⇒ use Bayes' theorem to update the model.
     4. It aims at equal accuracy for all parameters ⇒ prioritize parameter regions with small distances.
     An equivalent strategy applies to inference with the synthetic likelihood.

  16. Modeling (points 1 & 2) ◮ Data are tuples ( θ i , d i ), where d i = d ( y ( i ) θ , y o ) ◮ Model the conditional distribution of d given θ ◮ Estimated model yields approximation ˆ L ( θ ) for any choice of ǫ ˆ L ( θ ) ∝ � Pr ( d ≤ ǫ | θ ) � Pr is probability under the estimated model. ◮ Here: Use (log) Gaussian process as model (with squared exponential covariance function) ◮ Approach not restricted to Gaussian processes. Michael Gutmann BOLFI 16 / 23

  17. Data acquisition (points 3 & 4) ◮ Samples of θ could be obtained by sampling from the prior or some adaptively constructed proposal distribution ◮ Give priority to regions in the parameter space where distance d tends to be small. ◮ Use Bayesian optimization to find such regions ◮ Here: Use lower confidence bound acquisition function (e.g. Cox and John, 1992; Srinivas et al, 2012) � η 2 A t ( θ ) = µ t ( θ ) − v t ( θ ) (1) t � �� � ���� � �� � post mean post var weight t : number of samples acquired so far ◮ Approach not restricted to this acquisition function. Michael Gutmann BOLFI 17 / 23

  18. Bayesian optimization for likelihood-free inference
     - Iterate: acquired data (θ_i, d_i) → update the model via Bayes' theorem → minimize the acquisition function, trading off exploration and exploitation → next parameter value to try.
     [Figure: GP model of the distance versus the competition parameter based on 2, 3, and 4 data points, showing the posterior mean, 5%-95% quantile bands, and the acquisition function whose minimum determines the next parameter to try.]
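Putting the previous sketches together, the loop below alternates model updates and acquisitions in the spirit of this slide; it reuses the toy simulate, distance, y_obs, and lcb_acquisition defined above and is only a schematic stand-in for the actual BOLFI procedure, not the authors' implementation.

```python
# Sketch of a BOLFI-style loop: refit the GP on all (theta, distance) pairs,
# then acquire the next simulation at the minimizer of the acquisition function.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
grid = np.linspace(-5, 5, 201).reshape(-1, 1)

# a few initial simulator runs at parameters drawn from the prior
thetas = list(rng.uniform(-5, 5, size=3))
dists = [distance(simulate(t), y_obs) for t in thetas]

for t in range(30):
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(np.array(thetas).reshape(-1, 1), np.array(dists))   # update the model of the distances
    acq = lcb_acquisition(grid, gp)                             # exploration vs exploitation trade-off
    theta_next = float(grid[np.argmin(acq), 0])                 # next parameter to try
    thetas.append(theta_next)
    dists.append(distance(simulate(theta_next), y_obs))         # one new simulator run per iteration

print(thetas[-5:])                              # acquisitions concentrate where distances are small
```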

  19. Example: Bacterial infections in child care centers
     - Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
     - Roughly equal results using 1000 times fewer simulations: from 4.5 days with 200 cores down to 90 minutes with seven cores.
     [Figure: posterior means (solid lines) and credibility intervals (shaded areas or dashed lines) of the competition parameter versus computational cost (log10), for the developed fast method and the standard method (Gutmann and Corander, 2016).]

  20. Example: Bacterial infections in child care centers
     - Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
     - Roughly equal results using 1000 times fewer simulations.
     [Figure: posterior means (solid lines) and credibility intervals (shaded areas or dashed lines) of the internal and external infection parameters versus computational cost (log10), for the developed fast method and the standard method.]

  21. Further benefits
     - The proposed method makes the inference more efficient.
       It allowed us to perform a far more comprehensive data analysis than the standard approach (Numminen et al., 2016).
     - It enables inference for models that were previously out of reach,
       e.g. a model of evolution where simulating a single data set took us 12-24 hours (Marttinen et al., 2015).
     - It enables easier assessment of parameter identifiability for complex models,
       e.g. a model of the transmission dynamics of tuberculosis (Lintusaari et al., 2016).
