  1. An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science Kirthevasan Kandasamy Machine Learning Dept, CMU Electrochemical Energy Symposium Pittsburgh, PA, November 2017

  2. Designing Electrolytes in Batteries

  3. Black-box Optimisation in Computational Astrophysics. E.g.: a cosmological simulator takes parameters such as the Hubble constant and the baryonic density; a likelihood computation against the observations returns a score.

  4. Black-box Optimisation: an expensive black-box function. Other examples: pre-clinical drug discovery, optimal policy in autonomous driving, synthetic gene design.

  5. Black-box Optimisation: f : X → R is an expensive, black-box function, accessible only via noisy evaluations. [figure: f(x) vs x]

  6. Black-box Optimisation: f : X → R is an expensive, black-box function, accessible only via noisy evaluations. [figure: f(x) vs x]

  7. Black-box Optimisation: f : X → R is an expensive, black-box function, accessible only via noisy evaluations. Let x⋆ = argmax_x f(x). [figure: f(x) vs x, with x⋆ and f(x⋆) marked]
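As a concrete stand-in for such an objective, here is a minimal Python sketch of a noisy black-box evaluation; the hidden test function, the noise level, and the name noisy_blackbox are illustrative choices, not part of the talk. The later sketches reuse it as the function f.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_blackbox(x, noise_std=0.1):
    # A hidden test function standing in for an expensive experiment or simulation;
    # callers only ever see a noisy evaluation of it.
    true_f = np.sin(3.0 * x).sum() - 0.5 * np.sum(x**2)
    return true_f + noise_std * rng.standard_normal()
```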

  8. Outline ◮ Part I: Bayesian Optimisation ◮ Bayesian Models for f ◮ Two algorithms: upper confidence bounds & Thompson sampling ◮ Part II: Some Modern Challenges ◮ Multi-fidelity Optimisation ◮ Parallelisation

  9. Bayesian Models for f, e.g. Gaussian Processes (GP). GP: a distribution over functions from X to R.

  10. Bayesian Models for f, e.g. Gaussian Processes (GP). GP: a distribution over functions from X to R. [figure: functions with no observations]

  11. Bayesian Models for f, e.g. Gaussian Processes (GP). GP: a distribution over functions from X to R. [figure: prior GP]

  12. Bayesian Models for f, e.g. Gaussian Processes (GP). GP: a distribution over functions from X to R. [figure: observations]

  13. Bayesian Models for f, e.g. Gaussian Processes (GP). GP: a distribution over functions from X to R. [figure: posterior GP given observations]

  14. Bayesian Models for f, e.g. Gaussian Processes (GP). GP: a distribution over functions from X to R. [figure: posterior GP given observations] After t observations, f(x) ∼ N(µ_t(x), σ_t²(x)).
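The posterior quantities µ_t(x) and σ_t²(x) above can be computed in closed form. Below is a minimal numpy sketch assuming a zero-mean GP with a squared-exponential kernel; the kernel choice, lengthscale, and observation-noise level are illustrative assumptions rather than values from the talk.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, signal_var=1.0):
    # Squared-exponential kernel matrix between the rows of A and B.
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_query, noise_var=1e-3):
    # Posterior mean mu_t(x) and variance sigma_t^2(x) of a zero-mean GP at the
    # query points, given t noisy observations (X_obs, y_obs).
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(X_query, X_obs)
    mu = K_star @ np.linalg.solve(K, y_obs)
    K_inv_Kstar_T = np.linalg.solve(K, K_star.T)
    var = np.diag(rbf_kernel(X_query, X_query)) - np.sum(K_star * K_inv_Kstar_T.T, axis=1)
    return mu, np.maximum(var, 0.0)
```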

  15. Bayesian Optimisation with Upper Confidence Bounds. Model f ∼ GP. Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010).

  16. Bayesian Optimisation with Upper Confidence Bounds. Model f ∼ GP. GP-UCB (Srinivas et al. 2010): 1) Construct posterior GP.

  17. Bayesian Optimisation with Upper Confidence Bounds. Model f ∼ GP. GP-UCB (Srinivas et al. 2010): 1) Construct posterior GP. 2) ϕ_t = µ_{t−1} + β_t^{1/2} σ_{t−1} is a UCB.

  18. Bayesian Optimisation with Upper Confidence Bounds. Model f ∼ GP. GP-UCB (Srinivas et al. 2010): 1) Construct posterior GP. 2) ϕ_t = µ_{t−1} + β_t^{1/2} σ_{t−1} is a UCB. 3) Choose x_t = argmax_x ϕ_t(x).

  19. Bayesian Optimisation with Upper Confidence Bounds. Model f ∼ GP. GP-UCB (Srinivas et al. 2010): 1) Construct posterior GP. 2) ϕ_t = µ_{t−1} + β_t^{1/2} σ_{t−1} is a UCB. 3) Choose x_t = argmax_x ϕ_t(x). 4) Evaluate f at x_t.
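A minimal sketch of the four GP-UCB steps above, restricted to a finite grid of candidate points and reusing gp_posterior from the earlier sketch; the constant beta is hard-coded here, whereas Srinivas et al. (2010) set β_t from theory.

```python
def gp_ucb(f, X_grid, n_iters=25, beta=4.0, seed=0):
    # GP-UCB on a finite candidate grid, following the four steps on the slide.
    rng = np.random.default_rng(seed)
    X_obs = X_grid[rng.choice(len(X_grid), size=1)]   # one random initial point
    y_obs = np.array([f(x) for x in X_obs])
    for t in range(n_iters):
        mu, var = gp_posterior(X_obs, y_obs, X_grid)   # 1) construct posterior GP
        phi = mu + np.sqrt(beta) * np.sqrt(var)        # 2) UCB: mu + beta^(1/2) * sigma
        x_t = X_grid[np.argmax(phi)]                   # 3) choose the UCB maximiser
        y_t = f(x_t)                                   # 4) evaluate f at x_t (noisy)
        X_obs = np.vstack([X_obs, x_t])
        y_obs = np.append(y_obs, y_t)
    return X_obs[np.argmax(y_obs)]
```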

  20. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x, before any evaluations]

  21. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 1]

  22. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 2]

  23. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 3]

  24. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 4]

  25. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 5]

  26. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 6]

  27. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 7]

  28. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 11]

  29. GP-UCB (Srinivas et al. 2010) [figure: f(x) vs x at t = 25]
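As a usage sketch, the loop above can be run on the toy objective defined earlier; the grid bounds and the iteration count (matching the t = 25 shown in the figures) are arbitrary choices.

```python
# Run GP-UCB on the toy 1-D objective over a dense grid of candidate points.
X_grid = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
x_best = gp_ucb(noisy_blackbox, X_grid, n_iters=25)
print("best point found so far:", x_best)
```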

  30. Bayesian Optimisation with Thompson Sampling. Model f ∼ GP(0, κ). Thompson Sampling (TS) (Thompson, 1933).

  31. Bayesian Optimisation with Thompson Sampling. Model f ∼ GP(0, κ). TS (Thompson, 1933): 1) Construct posterior GP.

  32. Bayesian Optimisation with Thompson Sampling. Model f ∼ GP(0, κ). TS (Thompson, 1933): 1) Construct posterior GP. 2) Draw sample g from the posterior.

  33. Bayesian Optimisation with Thompson Sampling. Model f ∼ GP(0, κ). TS (Thompson, 1933): 1) Construct posterior GP. 2) Draw sample g from the posterior. 3) Choose x_t = argmax_x g(x).

  34. Bayesian Optimisation with Thompson Sampling. Model f ∼ GP(0, κ). TS (Thompson, 1933): 1) Construct posterior GP. 2) Draw sample g from the posterior. 3) Choose x_t = argmax_x g(x). 4) Evaluate f at x_t.
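A minimal sketch of one Thompson-sampling round on a finite grid, reusing rbf_kernel from the GP sketch above; drawing the joint sample g requires the full posterior covariance, and the small jitter term is a numerical convenience, not part of the algorithm's statement.

```python
def thompson_step(f, X_grid, X_obs, y_obs, noise_var=1e-3, seed=None):
    # One round of Thompson sampling restricted to a finite grid of candidates.
    rng = np.random.default_rng(seed)
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(X_grid, X_obs)
    mu = K_star @ np.linalg.solve(K, y_obs)                             # 1) posterior GP
    cov = rbf_kernel(X_grid, X_grid) - K_star @ np.linalg.solve(K, K_star.T)
    g = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(X_grid)))   # 2) sample g
    x_t = X_grid[np.argmax(g)]                                          # 3) maximise g
    return x_t, f(x_t)                                                  # 4) evaluate f
```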

  35. More on Bayesian Optimisation. Theoretical results: both UCB and TS will eventually find the optimum under certain smoothness assumptions on f.

  36. More on Bayesian Optimisation. Theoretical results: both UCB and TS will eventually find the optimum under certain smoothness assumptions on f. Other criteria for selecting x_t: ◮ Expected improvement (Jones et al. 1998) ◮ Probability of improvement (Kushner et al. 1964) ◮ Predictive entropy search (Hernández-Lobato et al. 2014) ◮ Information directed sampling (Russo & Van Roy 2014)

  37. More on Bayesian Optimisation. Theoretical results: both UCB and TS will eventually find the optimum under certain smoothness assumptions on f. Other criteria for selecting x_t: ◮ Expected improvement (Jones et al. 1998) ◮ Probability of improvement (Kushner et al. 1964) ◮ Predictive entropy search (Hernández-Lobato et al. 2014) ◮ Information directed sampling (Russo & Van Roy 2014) Other Bayesian models for f: ◮ Neural networks (Snoek et al. 2015) ◮ Random forests (Hutter 2009)
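For reference, the expected-improvement criterion mentioned above has a standard closed form under a GP posterior; the sketch below assumes maximisation against a noiseless incumbent y_best and uses scipy for the normal CDF and PDF.

```python
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    # Expected improvement over the incumbent best observation y_best, given the
    # posterior means mu and standard deviations sigma at candidate points.
    sigma = np.maximum(sigma, 1e-12)            # guard against zero predictive variance
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)
```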

  38. Some Modern Challenges/Opportunities 1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017) 2. Parallelisation (Kandasamy et al. arXiv 2017)

  39. 1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017). The desired function f is very expensive, but . . . we have access to cheap approximations.

  40. 1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017). The desired function f is very expensive, but we have access to approximations f_1, f_2, f_3 ≈ f which are cheaper to evaluate.

  41. 1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017). The desired function f is very expensive, but we have access to approximations f_1, f_2, f_3 ≈ f which are cheaper to evaluate. E.g. f: a real-world battery experiment; f_2: lab experiment; f_1: computer simulation.

  42. MF-GP-UCB (Kandasamy et al. NIPS 2016b) Multi-fidelity Gaussian Process Upper Confidence Bound. [figure: with 2 fidelities (1 approximation), t = 14; f^(1), f^(2), x⋆ and x_t]

  43. MF-GP-UCB (Kandasamy et al. NIPS 2016b) Multi-fidelity Gaussian Process Upper Confidence Bound. [figure: with 2 fidelities (1 approximation), t = 14] Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB on f^(2).

  44. MF-GP-UCB (Kandasamy et al. NIPS 2016b) Multi-fidelity Gaussian Process Upper Confidence Bound. [figure: with 2 fidelities (1 approximation), t = 14] Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB on f^(2). Can be extended to multiple approximations and to continuous approximations.
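The sketch below captures only the flavour of how a multi-fidelity method might decide which approximation to query: use a cheap fidelity while it is still informative at the chosen point, and escalate to the expensive target once it is not. The thresholds gammas and the helper choose_fidelity are illustrative; the actual MF-GP-UCB rule in the paper is more involved.

```python
def choose_fidelity(sigmas_at_xt, gammas):
    # sigmas_at_xt: posterior standard deviations of the cheap approximations
    # f^(1), ..., f^(M-1) at the chosen point x_t; gammas: per-fidelity thresholds.
    for m, (sigma_m, gamma_m) in enumerate(zip(sigmas_at_xt, gammas), start=1):
        if sigma_m >= gamma_m:
            return m                      # approximation m is still informative here
    return len(sigmas_at_xt) + 1          # all approximations resolved: query the target
```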

  45. Experiment: Cosmological Maximum Likelihood Inference ◮ Type Ia Supernovae Data ◮ Maximum likelihood inference for 3 cosmological parameters: ◮ Hubble Constant H_0 ◮ Dark Energy Fraction Ω_Λ ◮ Dark Matter Fraction Ω_M ◮ Likelihood: Robertson-Walker metric (Robertson 1936), which requires numerical integration for each point in the dataset.

  46. Experiment: Cosmological Maximum Likelihood Inference. 3 cosmological parameters (d = 3). Fidelities: integration on grids of size (10^2, 10^4, 10^6) (M = 3). [figure: results plot]
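To illustrate how such fidelities might be exposed to the optimiser, here is a hypothetical sketch in which the same likelihood integral is evaluated on coarser or finer grids, so low fidelities are cheap but approximate; integrand, z_max, and log_likelihood_at_fidelity are placeholders, not the actual cosmology code used in the experiment.

```python
def log_likelihood_at_fidelity(integrand, z_max, m, grid_sizes=(10**2, 10**4, 10**6)):
    # Evaluate the same numerical integral on a grid whose size depends on the
    # fidelity index m; larger grids cost more but approximate the integral better.
    z = np.linspace(0.0, z_max, grid_sizes[m])
    return np.trapz(integrand(z), z)      # trapezoidal rule on the chosen grid
```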

  47. Experiment: Hartmann-3D. 2 approximations (3 fidelities). We want to optimise the m = 3rd fidelity, which is the most expensive; the m = 1st fidelity is the cheapest. [figure: query frequencies for Hartmann-3D; number of queries at each fidelity m = 1, 2, 3 plotted against f^(3)(x)]

  48. 2. Parallelising function evaluations. Parallelisation with M workers: we can evaluate f at M different points at the same time. E.g.: test M different battery solvents at the same time.

  49. 2. Parallelising function evaluations. Parallelisation with M workers: we can evaluate f at M different points at the same time. E.g.: test M different battery solvents at the same time. [figure: sequential evaluations with one worker]
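Kandasamy et al. (arXiv 2017) study parallel, including asynchronous, Thompson sampling; the sketch below shows only a simple synchronous round, reusing rbf_kernel from the earlier sketch. The thread-based workers, the value of M, and the noise level are illustrative assumptions; in practice each worker could be a separate experiment or simulation.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_ts_round(f, X_grid, X_obs, y_obs, M=4, noise_var=1e-3, seed=None):
    # One synchronous round of parallel Thompson sampling: draw M independent
    # samples from the current posterior, send each sample's maximiser to a
    # separate worker, and fold the M new evaluations back into the data.
    rng = np.random.default_rng(seed)
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(X_grid, X_obs)
    mu = K_star @ np.linalg.solve(K, y_obs)
    cov = rbf_kernel(X_grid, X_grid) - K_star @ np.linalg.solve(K, K_star.T)
    samples = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(X_grid)), size=M)
    proposals = X_grid[np.argmax(samples, axis=1)]          # one candidate per worker
    with ThreadPoolExecutor(max_workers=M) as pool:         # evaluate f in parallel
        ys = list(pool.map(f, proposals))
    return np.vstack([X_obs, proposals]), np.concatenate([y_obs, ys])
```

In the asynchronous variant studied in the paper, a new point is sampled and dispatched as soon as any single worker finishes, rather than waiting for the whole batch to complete.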
