An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science
Kirthevasan Kandasamy, Machine Learning Dept, CMU
Electrochemical Energy Symposium, Pittsburgh, PA, November 2017
Designing Electrolytes in Batteries
Black-box Optimisation in Computational Astrophysics
E.g. a cosmological simulator: parameters such as the Hubble constant and the baryonic density are fed into a likelihood computation against observations, producing a likelihood score.
Black-box Optimisation: an expensive black-box function. Other examples:
- Pre-clinical drug discovery
- Optimal policy in autonomous driving
- Synthetic gene design
Black-box Optimisation
f : X → R is an expensive, black-box function, accessible only via noisy evaluations. Let x⋆ = argmax_x f(x).
[Figure: a 1-D example of f(x), with x⋆ and f(x⋆) marked]
Outline
◮ Part I: Bayesian Optimisation
  ◮ Bayesian models for f
  ◮ Two algorithms: upper confidence bounds & Thompson sampling
◮ Part II: Some Modern Challenges
  ◮ Multi-fidelity optimisation
  ◮ Parallelisation
Bayesian Models for f, e.g. Gaussian Processes (GP)
A GP is a distribution over functions from X to R.
[Figures: functions with no observations; the prior GP; observations; the posterior GP given the observations]
After t observations, f(x) ∼ N(µ_t(x), σ_t²(x)).
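To make the posterior update concrete, here is a minimal sketch of GP posterior inference with a squared-exponential kernel, using numpy only. The lengthscale, signal variance and noise variance are illustrative choices, not values from the talk.

```python
# A minimal sketch of GP posterior inference with a squared-exponential
# kernel (numpy only).  The lengthscale, signal variance and noise variance
# below are illustrative choices, not values from the talk.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, signal_var=1.0):
    """Squared-exponential kernel matrix between 1-D point sets A and B."""
    sq_dists = (A[:, None] - B[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise_var=1e-3):
    """Posterior mean mu_t(x) and variance sigma_t^2(x) at the query points."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(X_query, X_obs)
    mu = K_star @ np.linalg.solve(K, y_obs)
    var = rbf_kernel(X_query, X_query).diagonal() \
          - np.einsum('ij,ji->i', K_star, np.linalg.solve(K, K_star.T))
    return mu, np.maximum(var, 0.0)
```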
Bayesian Optimisation with Upper Confidence Bounds
Model f ∼ GP. Gaussian Process Upper Confidence Bound (GP-UCB) (Srinivas et al. 2010):
1) Construct the posterior GP.
2) ϕ_t(x) = µ_{t−1}(x) + β_t^{1/2} σ_{t−1}(x) is an upper confidence bound (UCB) for f.
3) Choose x_t = argmax_x ϕ_t(x).
4) Evaluate f at x_t.
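A minimal sketch of this loop on a discretised 1-D domain, reusing gp_posterior from the sketch above. The objective f, the grid and the β_t schedule are illustrative assumptions, not details from the talk.

```python
# A minimal sketch of the GP-UCB loop above on a discretised 1-D domain,
# reusing gp_posterior from the previous sketch.  The objective f, the grid
# and the beta_t schedule are illustrative assumptions.
import numpy as np

def f(x):
    # Hypothetical expensive, noisy black-box objective.
    return np.exp(-(x - 0.6) ** 2 / 0.05) + 0.1 * np.random.randn()

grid = np.linspace(0.0, 1.0, 200)      # candidate points in X
X_obs = np.array([0.1, 0.9])           # a couple of initial observations
y_obs = np.array([f(x) for x in X_obs])

for t in range(1, 26):
    beta_t = 2.0 * np.log(len(grid) * t ** 2)   # a common theoretical choice
    mu, var = gp_posterior(X_obs, y_obs, grid)
    ucb = mu + np.sqrt(beta_t * var)            # phi_t(x)
    x_t = grid[np.argmax(ucb)]                  # 3) maximise the UCB
    X_obs = np.append(X_obs, x_t)               # 4) evaluate f at x_t
    y_obs = np.append(y_obs, f(x_t))

print("best point found:", X_obs[np.argmax(y_obs)])
```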
GP-UCB (Srinivas et al. 2010)
[Figure: the algorithm run on a 1-D example, shown at iterations t = 1, 2, 3, 4, 5, 6, 7, 11 and 25]
Bayesian Optimisation with Thompson Sampling
Model f ∼ GP(0, κ). Thompson Sampling (TS) (Thompson, 1933):
1) Construct the posterior GP.
2) Draw a sample g from the posterior.
3) Choose x_t = argmax_x g(x).
4) Evaluate f at x_t.
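A minimal sketch of one Thompson-sampling step on a discretised domain: draw a joint sample g of the posterior on the grid and maximise it. It reuses rbf_kernel and gp_posterior from the earlier sketch; the noise and jitter values are illustrative.

```python
# A minimal sketch of one Thompson-sampling step on a discretised domain:
# draw a joint sample g of the posterior on the grid and maximise it.
# Reuses rbf_kernel and gp_posterior from the earlier sketch; the noise and
# jitter values are illustrative.
import numpy as np

def thompson_step(X_obs, y_obs, grid, noise_var=1e-3):
    mu, _ = gp_posterior(X_obs, y_obs, grid, noise_var)
    # Full posterior covariance on the grid, needed for a joint sample.
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_star = rbf_kernel(grid, X_obs)
    cov = rbf_kernel(grid, grid) - K_star @ np.linalg.solve(K, K_star.T)
    g = np.random.multivariate_normal(mu, cov + 1e-8 * np.eye(len(grid)))
    return grid[np.argmax(g)]    # x_t = argmax_x g(x)
```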
More on Bayesian Optimisation
Theoretical results: both UCB and TS will eventually find the optimum under certain smoothness assumptions on f.
Other criteria for selecting x_t:
◮ Expected improvement (Jones et al. 1998) (see the sketch after this list)
◮ Probability of improvement (Kushner et al. 1964)
◮ Predictive entropy search (Hernández-Lobato et al. 2014)
◮ Information directed sampling (Russo & Van Roy 2014)
Other Bayesian models for f:
◮ Neural networks (Snoek et al. 2015)
◮ Random forests (Hutter 2009)
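As a sample of these alternatives, expected improvement has a simple closed form under the GP posterior. Below is a minimal sketch assuming the gp_posterior helper from earlier; the jitter value is an illustrative choice, not a detail from the talk.

```python
# A minimal sketch of expected improvement (Jones et al. 1998) under the GP
# posterior, using its standard closed form.  Assumes the gp_posterior
# helper from the earlier sketch; the jitter value is an illustrative choice.
import numpy as np
from scipy.stats import norm

def expected_improvement(X_obs, y_obs, grid):
    mu, var = gp_posterior(X_obs, y_obs, grid)
    sigma = np.sqrt(var) + 1e-12          # avoid division by zero
    best = np.max(y_obs)                  # incumbent (best observed value)
    z = (mu - best) / sigma
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    return grid[np.argmax(ei)]            # next point to evaluate
```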
Some Modern Challenges/Opportunities
1. Multi-fidelity optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)
2. Parallelisation (Kandasamy et al. arXiv 2017)
1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)
The desired function f is very expensive, but we have access to approximations f_1, f_2, f_3 ≈ f which are cheaper to evaluate.
E.g. f: a real-world battery experiment; f_2: a lab experiment; f_1: a computer simulation.
MF-GP-UCB (Kandasamy et al. NIPS 2016b): Multi-fidelity Gaussian Process Upper Confidence Bound
[Figure: 2 fidelities (1 approximation), t = 14, showing f^(2), f^(1), x⋆ and x_t]
Theorem: MF-GP-UCB finds the optimum x⋆ with fewer resources than GP-UCB run on f^(2).
This can be extended to multiple approximations and to continuous approximations.
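The exact MF-GP-UCB rule is in the paper; the sketch below only illustrates the two-fidelity flavour of the idea: pick x_t with a UCB that combines both fidelities, and query the cheap fidelity only while it is still informative at x_t. The bias bound ζ, the threshold γ and β_t are illustrative assumptions.

```python
# A loose, simplified sketch of the two-fidelity idea behind MF-GP-UCB; it
# is not the exact rule from the paper.  Maintain a GP per fidelity, pick
# x_t by a combined UCB, and query the cheap fidelity only while it is
# still informative at x_t.  zeta, gamma and beta_t are illustrative
# assumptions; gp_posterior is the earlier sketch.
import numpy as np

def mf_ucb_step(cheap_data, expensive_data, grid, beta_t, zeta=0.5, gamma=0.05):
    """Return (x_t, fidelity), where fidelity 0 is the cheap approximation f^(1)."""
    ucbs, stds = [], []
    for X_obs, y_obs in (cheap_data, expensive_data):
        mu, var = gp_posterior(X_obs, y_obs, grid)
        stds.append(np.sqrt(var))
        ucbs.append(mu + np.sqrt(beta_t) * np.sqrt(var))
    # The cheap-fidelity UCB plus a bias bound zeta also upper-bounds f^(2);
    # use the tighter of the two bounds to choose x_t.
    combined = np.minimum(ucbs[0] + zeta, ucbs[1])
    i = np.argmax(combined)
    # Query the cheap fidelity while it is still uncertain at x_t.
    fidelity = 0 if np.sqrt(beta_t) * stds[0][i] >= gamma else 1
    return grid[i], fidelity
```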
Experiment: Cosmological Maximum Likelihood Inference
◮ Type Ia supernovae data
◮ Maximum likelihood inference for 3 cosmological parameters:
  ◮ Hubble constant H_0
  ◮ Dark energy fraction Ω_Λ
  ◮ Dark matter fraction Ω_M
◮ Likelihood: Robertson-Walker metric (Robertson 1936); requires numerical integration for each point in the dataset.
Experiment: Cosmological Maximum Likelihood Inference
3 cosmological parameters (d = 3). Fidelities: numerical integration on grids of size 10², 10⁴, 10⁶ (M = 3).
[Figure: experimental results]
Experiment: Hartmann-3D
2 approximations (3 fidelities). We want to optimise the m = 3rd fidelity, which is the most expensive; the m = 1st fidelity is the cheapest.
[Figure: query frequencies for Hartmann-3D, i.e. the number of queries at each fidelity m = 1, 2, 3 across the range of f^(3)(x)]
2. Parallelising function evaluations
Parallelisation with M workers: we can evaluate f at M different points at the same time. E.g. test M different battery solvents simultaneously.
[Figure: sequential evaluations with one worker]
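One simple way to use M workers (not necessarily the method in the talk) is batch Thompson sampling: each worker evaluates the argmax of its own independent posterior sample. The synchronous batch setup below is an illustrative simplification that reuses thompson_step from the earlier sketch; asynchronous variants are also possible.

```python
# One simple way to use M parallel workers (not necessarily the method in
# the talk): batch Thompson sampling, where each worker evaluates the argmax
# of its own independent posterior sample.  Reuses thompson_step from the
# earlier sketch; the synchronous batch setup is an illustrative
# simplification.

def propose_batch(X_obs, y_obs, grid, M=4):
    """Return M points to evaluate in parallel, one per worker."""
    return [thompson_step(X_obs, y_obs, grid) for _ in range(M)]
```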