An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science
Kirthevasan Kandasamy, Machine Learning Dept, CMU
Electrochemical Energy Symposium, Pittsburgh, PA, November 2017
Designing Electrolytes in Batteries
Black-box Optimisation in Computational Astrophysics
E.g. a cosmological simulator: parameters such as the Hubble constant and the baryonic density are fed into a likelihood computation against observations, producing a likelihood score.
Black-box Optimisation: an expensive black-box function. Other examples:
- Pre-clinical drug discovery
- Optimal policy in autonomous driving
- Synthetic gene design
Black-box Optimisation
f : X → R is an expensive, black-box function, accessible only via noisy evaluations. Let x⋆ = argmax_x f(x).
[Figure: a 1-D example of f(x), with x⋆ and f(x⋆) marked]
Outline
◮ Part I: Bayesian Optimisation
  ◮ Bayesian models for f
  ◮ Two algorithms: upper confidence bounds & Thompson sampling
◮ Part II: Some Modern Challenges
  ◮ Multi-fidelity optimisation
  ◮ Parallelisation
Bayesian Models for f, e.g. Gaussian Processes (GP)
A GP is a distribution over functions from X to R.
[Figures: functions with no observations; the prior GP; observations; the posterior GP given the observations]
After t observations, f(x) ∼ N(µ_t(x), σ_t²(x)).
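To make the posterior update concrete, here is a minimal sketch of GP posterior inference with a squared-exponential kernel, using numpy only. The lengthscale, signal variance and noise variance are illustrative choices, not values from the talk.

```python
# A minimal sketch of GP posterior inference with a squared-exponential
# kernel (numpy only).  The lengthscale, signal variance and noise variance
# below are illustrative choices, not values from the talk.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, signal_var=1.0):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    sq_dists = (A[:, None] - B[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise_var=1e-3):
    """Posterior mean mu_t(x) and variance sigma_t^2(x) at the query points."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(X_query, X_obs)
    mu = K_star @ np.linalg.solve(K, y_obs)
    var = rbf_kernel(X_query, X_query).diagonal() \
          - np.einsum('ij,ji->i', K_star, np.linalg.solve(K, K_star.T))
    return mu, np.maximum(var, 0.0)
```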
Bayesian Optimisation with Upper Confidence Bounds
Model f ∼ GP. Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010):
1) Construct the posterior GP.
2) ϕ_t(x) = µ_{t−1}(x) + β_t^{1/2} σ_{t−1}(x) is an upper confidence bound (UCB) for f.
3) Choose x_t = argmax_x ϕ_t(x).
4) Evaluate f at x_t.
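A minimal sketch of this loop on a discretised 1-D domain, reusing gp_posterior from the sketch above. The objective f, the grid and the β_t schedule are illustrative assumptions, not details from the talk.

```python
# A minimal sketch of the GP-UCB loop above on a discretised 1-D domain,
# reusing gp_posterior from the previous sketch.  The objective f, the grid
# and the beta_t schedule are illustrative assumptions.
import numpy as np

def f(x):
    # Hypothetical expensive, noisy black-box objective.
    return np.exp(-(x - 0.6) ** 2 / 0.05) + 0.1 * np.random.randn()

grid = np.linspace(0.0, 1.0, 200)      # candidate points in X
X_obs = np.array([0.1, 0.9])           # a couple of initial observations
y_obs = np.array([f(x) for x in X_obs])

for t in range(1, 26):
    beta_t = 2.0 * np.log(len(grid) * t ** 2)   # a common theoretical choice
    mu, var = gp_posterior(X_obs, y_obs, grid)
    ucb = mu + np.sqrt(beta_t * var)            # phi_t(x)
    x_t = grid[np.argmax(ucb)]                  # 3) maximise the UCB
    X_obs = np.append(X_obs, x_t)               # 4) evaluate f at x_t
    y_obs = np.append(y_obs, f(x_t))

print("best point found:", X_obs[np.argmax(y_obs)])
```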
GP-UCB (Srinivas et al. 2010)
[Figure: the algorithm run on a 1-D example, shown at iterations t = 1, 2, 3, 4, 5, 6, 7, 11 and 25]
Bayesian Optimisation with Thompson Sampling
Model f ∼ GP(0, κ). Thompson Sampling (TS) (Thompson, 1933):
1) Construct the posterior GP.
2) Draw a sample g from the posterior.
3) Choose x_t = argmax_x g(x).
4) Evaluate f at x_t.
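A minimal sketch of one Thompson-sampling step on a discretised domain: draw a joint sample g of the posterior on the grid and maximise it. It reuses rbf_kernel and gp_posterior from the earlier sketch; the noise and jitter values are illustrative.

```python
# A minimal sketch of one Thompson-sampling step on a discretised domain:
# draw a joint sample g of the posterior on the grid and maximise it.
# Reuses rbf_kernel and gp_posterior from the earlier sketch; the noise and
# jitter values are illustrative.
import numpy as np

def thompson_step(X_obs, y_obs, grid, noise_var=1e-3):
    mu, _ = gp_posterior(X_obs, y_obs, grid, noise_var)
    # Full posterior covariance on the grid, needed for a joint sample.
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(grid, X_obs)
    cov = rbf_kernel(grid, grid) - K_star @ np.linalg.solve(K, K_star.T)
    g = np.random.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)))
    return grid[np.argmax(g)]    # x_t = argmax_x g(x)
```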
More on Bayesian Optimisation
Theoretical results: both UCB and TS will eventually find the optimum under certain smoothness assumptions on f.
Other criteria for selecting x_t:
◮ Expected improvement (Jones et al. 1998) (see the sketch after this list)
◮ Probability of improvement (Kushner et al. 1964)
◮ Predictive entropy search (Hernández-Lobato et al. 2014)
◮ Information directed sampling (Russo & Van Roy 2014)
Other Bayesian models for f:
◮ Neural networks (Snoek et al. 2015)
◮ Random forests (Hutter 2009)
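As a sample of these alternatives, expected improvement has a simple closed form under the GP posterior. Below is a minimal sketch assuming the gp_posterior helper from earlier; the jitter value is an illustrative choice, not a detail from the talk.

```python
# A minimal sketch of expected improvement (Jones et al. 1998) under the GP
# posterior, using its standard closed form.  Assumes the gp_posterior
# helper from the earlier sketch; the jitter value is an illustrative choice.
import numpy as np
from scipy.stats import norm

def expected_improvement(X_obs, y_obs, grid):
    mu, var = gp_posterior(X_obs, y_obs, grid)
    sigma = np.sqrt(var) + 1e-12          # avoid division by zero
    best = np.max(y_obs)                  # incumbent (best observed value)
    z = (mu - best) / sigma
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    return grid[np.argmax(ei)]            # next point to evaluate
```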
Some Modern Challenges/Opportunities
1. Multi-fidelity optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)
2. Parallelisation (Kandasamy et al. arXiv 2017)
1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)
The desired function f is very expensive, but we have access to approximations f_1, f_2, f_3 ≈ f which are cheaper to evaluate.
E.g. f: a real-world battery experiment; f_2: a lab experiment; f_1: a computer simulation.
MF-GP-UCB (Kandasamy et al. NIPS 2016b): Multi-fidelity Gaussian Process Upper Confidence Bound
[Figure: 2 fidelities (1 approximation), t = 14, showing f^(2), f^(1), x⋆ and x_t]
Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB run on f^(2).
This can be extended to multiple approximations and to continuous approximations.
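The exact MF-GP-UCB rule is in the paper; the sketch below only illustrates the two-fidelity flavour of the idea: pick x_t with a UCB that combines both fidelities, and query the cheap fidelity only while it is still informative at x_t. The bias bound ζ, the threshold γ and β_t are illustrative assumptions.

```python
# A loose, simplified sketch of the two-fidelity idea behind MF-GP-UCB; it
# is not the exact rule from the paper.  Maintain a GP per fidelity, pick
# x_t by a combined UCB, and query the cheap fidelity only while it is
# still informative at x_t.  zeta, gamma and beta_t are illustrative
# assumptions; gp_posterior is the earlier sketch.
import numpy as np

def mf_ucb_step(cheap_data, expensive_data, grid, beta_t, zeta=0.5, gamma=0.05):
    """Return (x_t, fidelity), where fidelity 0 is the cheap approximation f^(1)."""
    ucbs, stds = [], []
    for X_obs, y_obs in (cheap_data, expensive_data):
        mu, var = gp_posterior(X_obs, y_obs, grid)
        stds.append(np.sqrt(var))
        ucbs.append(mu + np.sqrt(beta_t) * np.sqrt(var))
    # The cheap-fidelity UCB plus a bias bound zeta also upper-bounds f^(2);
    # use the tighter of the two bounds to choose x_t.
    combined = np.minimum(ucbs[0] + zeta, ucbs[1])
    i = np.argmax(combined)
    # Query the cheap fidelity while it is still uncertain at x_t.
    fidelity = 0 if np.sqrt(beta_t) * stds[0][i] >= gamma else 1
    return grid[i], fidelity
```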
Experiment: Cosmological Maximum Likelihood Inference
◮ Type Ia supernovae data
◮ Maximum likelihood inference for 3 cosmological parameters:
  ◮ Hubble constant H_0
  ◮ Dark energy fraction Ω_Λ
  ◮ Dark matter fraction Ω_M
◮ Likelihood: Robertson-Walker metric (Robertson 1936); requires numerical integration for each point in the dataset.
Experiment: Cosmological Maximum Likelihood Inference
3 cosmological parameters (d = 3). Fidelities: numerical integration on grids of size 10², 10⁴, 10⁶ (M = 3).
[Figure: experimental results]
Experiment: Hartmann-3D
2 approximations (3 fidelities). We want to optimise the m = 3rd fidelity, which is the most expensive; the m = 1st fidelity is the cheapest.
[Figure: query frequencies for Hartmann-3D, i.e. the number of queries at each fidelity m = 1, 2, 3 across the range of f^(3)(x)]
2. Parallelising function evaluations
Parallelisation with M workers: we can evaluate f at M different points at the same time. E.g. test M different battery solvents simultaneously.
[Figure: sequential evaluations with one worker]
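One simple way to use M workers (not necessarily the method in the talk) is batch Thompson sampling: each worker evaluates the argmax of its own independent posterior sample. The synchronous batch setup below is an illustrative simplification that reuses thompson_step from the earlier sketch; asynchronous variants are also possible.

```python
# One simple way to use M parallel workers (not necessarily the method in
# the talk): batch Thompson sampling, where each worker evaluates the argmax
# of its own independent posterior sample.  Reuses thompson_step from the
# earlier sketch; the synchronous batch setup is an illustrative
# simplification.

def propose_batch(X_obs, y_obs, grid, M=4):
    """Return M points to evaluate in parallel, one per worker."""
    return [thompson_step(X_obs, y_obs, grid) for _ in range(M)]
```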