Bayesian optimisation
Gilles Louppe
April 11, 2016
Problem statement

x* = arg max_x f(x)

Constraints:
• f is a black box for which no closed form is known; gradients df/dx are not available;
• f is expensive to evaluate;
• (optional) there is uncertainty on the observations y_i of f, e.g., y_i = f(x_i) + ε_i because of Poisson fluctuations.

Goal: find x*, while minimizing the number of evaluations of f.
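For concreteness, a toy stand-in for such an objective (entirely made up, and with Gaussian rather than Poisson noise for simplicity) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, noise_std=0.1):
    """Hypothetical expensive black-box objective: only noisy point
    evaluations y_i = f(x_i) + eps_i are available, no closed form,
    no gradients."""
    return -np.sin(3 * x) - x**2 + 0.7 * x + rng.normal(0.0, noise_std, np.shape(x))
```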
Disclaimer

If you do not have these constraints, there is certainly a better optimisation algorithm than Bayesian optimisation (e.g., L-BFGS-B, Powell's method (as in Minuit), etc.).
Bayesian optimisation

For t = 1 : T,
1. Given observations (x_i, y_i) for i = 1 : t, build a probabilistic model for the objective f. Integrate out all possible true functions, using Gaussian process regression.
2. Optimise a cheap utility function u based on the posterior distribution for sampling the next point:
   x_{t+1} = arg max_x u(x)
   Exploit uncertainty to balance exploration against exploitation.
3. Sample the next observation y_{t+1} at x_{t+1}.
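A minimal sketch of this loop, assuming scikit-learn's Gaussian process regressor as the probabilistic model and UCB (introduced below) as the utility; the toy objective, the interval [-2, 2], the Matérn kernel and κ = 1.96 are illustrative choices, not part of the slides:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Noise-free variant of the toy objective above, to keep the sketch short.
    return -np.sin(3 * x) - x**2 + 0.7 * x

bounds = (-2.0, 2.0)
rng = np.random.default_rng(0)
X = rng.uniform(*bounds, size=(2, 1))              # small initial design
y = f(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
grid = np.linspace(*bounds, 500).reshape(-1, 1)    # cheap dense grid (1-D problem)

for t in range(10):
    gp.fit(X, y)                                   # 1. model of f given (x_i, y_i)
    mu, sigma = gp.predict(grid, return_std=True)
    ucb = mu + 1.96 * sigma                        # 2. cheap utility u(x)
    x_next = grid[np.argmax(ucb)]                  #    x_{t+1} = arg max u(x)
    y_next = float(f(x_next)[0])                   # 3. expensive evaluation
    X = np.vstack([X, [x_next]])
    y = np.append(y, y_next)

print("best x:", float(X[np.argmax(y)][0]), "best y:", y.max())
```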
Where shall we sample next?

[Figure: the true (unknown) function f(x) and the observations collected so far.]
Build a probabilistic model for the objective function

[Figure: the true (unknown) function, the observations, the GP posterior mean µ_GP(x) and its confidence interval (CI).]

This gives a posterior distribution over functions that could have generated the observed data.
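A sketch of this modelling step with scikit-learn (the observations and the Matérn kernel are made-up assumptions): fit a GP to the (x_i, y_i), read off the posterior mean µ_GP(x) and a 95% CI, and draw a few functions consistent with the data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

X_obs = np.array([[-1.5], [-0.6], [0.3], [1.2]])    # made-up observations
y_obs = np.array([0.1, 0.9, 1.1, -0.4])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
gp.fit(X_obs, y_obs)

x = np.linspace(-2, 2, 400).reshape(-1, 1)
mu, sigma = gp.predict(x, return_std=True)           # posterior mean mu_GP and std
lower, upper = mu - 1.96 * sigma, mu + 1.96 * sigma  # the 95% CI band in the figure
samples = gp.sample_y(x, n_samples=5, random_state=0)  # functions that could have generated the data
```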
Acquisition functions

Acquisition functions u(x) specify which sample x should be tried next:
• Upper confidence bound: UCB(x) = µ_GP(x) + κ σ_GP(x);
• Probability of improvement: PI(x) = P(f(x) ≥ f(x_t^+) + κ);
• Expected improvement: EI(x) = E[f(x) − f(x_t^+)];
• ... and many others,
where x_t^+ is the best point observed so far.

In most cases, acquisition functions provide knobs (e.g., κ) for controlling the exploration-exploitation trade-off:
• Search in regions where µ_GP(x) is high (exploitation);
• Probe regions where the uncertainty σ_GP(x) is high (exploration).
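Under the GP posterior N(µ_GP(x), σ_GP(x)²) these acquisition functions have simple closed forms. A sketch (maximisation convention, σ > 0 assumed, f_best standing for f(x_t^+); note that the usual closed form of EI clips the improvement at zero):

```python
import numpy as np
from scipy.stats import norm

def ucb(mu, sigma, kappa=1.96):
    # Upper confidence bound: optimistic estimate of f(x).
    return mu + kappa * sigma

def probability_of_improvement(mu, sigma, f_best, kappa=0.01):
    # P(f(x) >= f_best + kappa) under the Gaussian posterior.
    return norm.cdf((mu - f_best - kappa) / sigma)

def expected_improvement(mu, sigma, f_best):
    # Standard EI: E[max(f(x) - f_best, 0)] under the Gaussian posterior.
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)
```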
Plugging everything together (t = 0), x_t^+ = 0.1000

[Figure: the true (unknown) function, the observations, the GP posterior mean µ_GP(x) with its CI, and the acquisition u(x).]

x_{t+1} = arg max_x UCB(x)
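Because u(x) only involves the GP posterior, it is cheap to evaluate, so the inner problem x_{t+1} = arg max_x u(x) can be solved with an off-the-shelf local optimiser restarted from a few random points. A sketch (assumes `gp` is a fitted GaussianProcessRegressor, e.g. from the loop above, and reuses the toy bounds):

```python
import numpy as np
from scipy.optimize import minimize

def neg_ucb(x, gp, kappa=1.96):
    # Negative UCB, since scipy minimises.
    mu, sigma = gp.predict(np.atleast_2d(x), return_std=True)
    return -(mu[0] + kappa * sigma[0])

def propose_next(gp, bounds=(-2.0, 2.0), n_restarts=10, seed=0):
    rng = np.random.default_rng(seed)
    starts = rng.uniform(*bounds, size=(n_restarts, 1))
    results = [minimize(neg_ucb, x0, args=(gp,), bounds=[bounds]) for x0 in starts]
    return min(results, key=lambda r: r.fun).x     # x_{t+1}
```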
... and repeat until convergence (t = 1), x_t^+ = 0.1000
... and repeat until convergence (t = 2), x_t^+ = 0.1000
... and repeat until convergence (t = 3), x_t^+ = 0.1000
... and repeat until convergence (t = 4), x_t^+ = 0.1000
... and repeat until convergence (t = 5), x_t^+ = 0.2858

[Figures for t = 1 … 5: at each step, the true (unknown) function, the observations, the GP posterior mean µ_GP(x) with its CI, and the acquisition u(x).]
What is Bayesian about Bayesian optimization?

• The Bayesian strategy treats the unknown objective function as a random function and places a prior over it. The prior captures our beliefs about the behaviour of the function. It is here defined by a Gaussian process whose covariance function captures assumptions about the smoothness of the objective.
• Function evaluations are treated as data. They are used to update the prior to form the posterior distribution over the objective function.
• The posterior distribution, in turn, is used to construct an acquisition function for querying the next point.
Limitations

• Bayesian optimisation has parameters itself!
  - Choice of the acquisition function
  - Choice of the kernel (i.e. design of the prior)
  - Parameter wrapping
  - Initialization scheme
• Gaussian processes usually do not scale well to many observations and to high-dimensional data. Sequential model-based optimization provides a direct and effective alternative, i.e., replace GPs by a tree-based model (a rough sketch follows).
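A rough sketch of that tree-based alternative, purely illustrative and not the implementation of any particular library: fit an ensemble regressor to the observations, use the spread across trees as a crude σ(x), and plug the resulting (µ, σ) into the same acquisition functions as before:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def tree_surrogate(X, y, n_estimators=100):
    # Ensemble of randomised trees as a stand-in for the GP surrogate.
    model = ExtraTreesRegressor(n_estimators=n_estimators,
                                min_samples_leaf=3, random_state=0)
    model.fit(X, y)
    return model

def tree_predict(model, X):
    # Mean and spread over the individual trees stand in for mu_GP and sigma_GP.
    preds = np.stack([tree.predict(X) for tree in model.estimators_])
    return preds.mean(axis=0), preds.std(axis=0)
```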
Applications

• Bayesian optimization has been used in many scientific fields, including robotics, machine learning and the life sciences.
• Use cases for high energy physics?
  - Optimisation of simulation parameters in event generators;
  - Optimisation of compiler flags to maximize execution speed;
  - Optimisation of hyper-parameters in machine learning for HEP;
  - ... let's discuss further ideas?
Software

• Python
  - Spearmint: https://github.com/JasperSnoek/spearmint
  - GPyOpt: https://github.com/SheffieldML/GPyOpt
  - RoBO: https://github.com/automl/RoBO
  - scikit-optimize: https://github.com/MechCoder/scikit-optimize (work in progress)
• C++
  - MOE: https://github.com/yelp/MOE

Check also this Github repo for a vanilla implementation reproducing these slides.
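For a flavour of how small the user-facing code can be, here is a hedged example with scikit-optimize's gp_minimize (the package was work in progress at the time of these slides, so the exact interface may differ; the objective and bounds are the toy ones used earlier, and gp_minimize minimises, hence the sign flip):

```python
import numpy as np
from skopt import gp_minimize

def f(x):
    return -np.sin(3 * x) - x**2 + 0.7 * x         # toy objective (assumed)

res = gp_minimize(lambda x: -f(x[0]),              # negate to maximise f
                  dimensions=[(-2.0, 2.0)],        # search space
                  n_calls=15, random_state=0)
print("x* ≈", res.x[0], "  f(x*) ≈", -res.fun)
```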
Summary

• Bayesian optimisation provides a principled approach for optimising an expensive function f;
• Often very effective, provided it is itself properly configured;
• Hot topic in machine learning research. Expect quick improvements!
References

Brochu, E., Cora, V. M., and de Freitas, N. (2010). A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599.

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175.