  1. Bayesian Optimization of Composite Functions. Raúl Astudillo, Cornell University. Joint work with Peter I. Frazier. ICML 2019. (ra598@cornell.edu)

  2. [Figure: log10(regret) versus number of function evaluations (0 to 100) for Random, EI, PES, and our method.]

  3. Problem. We consider problems of the form
     $\max_{x \in X} f(x)$, where $f(x) = g(h(x))$ and
     • $h : X \subset \mathbb{R}^d \to \mathbb{R}^m$ is a time-consuming-to-evaluate black box,
     • $g : \mathbb{R}^m \to \mathbb{R}$ and its gradient are known in closed form and fast to evaluate.

  4. Composite functions arise naturally in practice.
     • Hyperparameter tuning of classification algorithms: $g(h(x)) = -\sum_{j=1}^m h_j(x)$, where $h_j(x)$ is the classification error on the $j$-th class under hyperparameters $x$.
     • Calibration of expensive simulators: $g(h(x)) = -\sum_{j=1}^m (h_j(x) - y_j)^2$, where $h(x)$ is the output of the simulator under parameters $x$ and $y$ is a vector of observed data.
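To make the structure concrete, here is a minimal sketch of both examples in Python. The quadratic h below is a cheap toy stand-in for the expensive black box (in practice, per-class errors of a trained classifier or a simulator run), and all names and numbers are illustrative, not from the paper:

```python
import numpy as np

# Toy stand-in for the expensive black box h : R^d -> R^m.
def h(x):
    return np.array([np.sum(x ** 2), np.sum((x - 1.0) ** 2)])

# Example 1: hyperparameter tuning. g sums the per-class errors h_j(x).
def g_tuning(h_vals):
    return -np.sum(h_vals)

# Example 2: simulator calibration. g is the negative sum of squared
# residuals between the simulator output h(x) and observed data y.
y_obs = np.array([0.5, 2.0])  # made-up "observed data" for illustration
def g_calibration(h_vals):
    return -np.sum((h_vals - y_obs) ** 2)

# In both cases the composite objective is f(x) = g(h(x)).
x = np.array([0.3, -0.2])
print(g_tuning(h(x)), g_calibration(h(x)))
```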

  5. Standard BayesOpt approach.
     • Place a Gaussian process prior on $f$.
     • While the evaluation budget is not exhausted:
       ◦ Compute the posterior distribution on $f$ given the evaluations so far, $\{(x_i, f(x_i))\}_{i=1}^n$.
       ◦ Choose the next point to evaluate by maximizing an acquisition function $a$:
         $x_{n+1} \in \arg\max_x a_n(x)$,
         where the subscript $n$ indicates dependence on the posterior at time $n$.
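The following is a minimal sketch of this loop on a one-dimensional grid, assuming scikit-learn's GaussianProcessRegressor as the GP and a simple upper-confidence-bound rule as a placeholder acquisition (EI, the usual choice, appears on the next slide); the objective, grid, and budget are all illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):
    return -(x - 0.3) ** 2  # cheap toy stand-in for the expensive objective

candidates = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
rng = np.random.default_rng(0)
idx = rng.choice(len(candidates), size=3, replace=False)
X, y = candidates[idx], f(candidates[idx]).ravel()

gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
for n in range(20):                              # evaluation budget
    gp.fit(X, y)                                 # posterior after n evaluations
    mu, sigma = gp.predict(candidates, return_std=True)
    acq = mu + 2.0 * sigma                       # placeholder acquisition a_n(x)
    x_next = candidates[np.argmax(acq)]          # x_{n+1} in argmax_x a_n(x)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))
print("best x:", X[np.argmax(y)].item(), "best f(x):", y.max())
```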

  6. Background: Expected Improvement (EI). The most widely used acquisition function in standard BayesOpt is
     $\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right]$,
     where
     • $f_n^*$ is the best value observed so far,
     • $\mathbb{E}_n$ is the conditional expectation under the posterior after $n$ evaluations.

  7. Background: Expected Improvement (EI). The most widely used acquisition function in standard BayesOpt is
     $\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right]$.
     When $f(x)$ is Gaussian, EI and its derivative have closed forms, which makes EI easy to optimize.
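For reference, the well-known closed form under a Gaussian posterior $f(x) \sim \mathcal{N}(\mu_n(x), \sigma_n^2(x))$ is $\mathrm{EI}_n(x) = \sigma_n(x)\left[z\,\Phi(z) + \varphi(z)\right]$ with $z = (\mu_n(x) - f_n^*)/\sigma_n(x)$, where $\Phi$ and $\varphi$ are the standard normal CDF and PDF. A direct translation into code (a sketch, not the paper's implementation):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for a Gaussian posterior f(x) ~ N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)   # guard against zero posterior variance
    z = (mu - f_best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))
```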

  8. Our contribution. 1. A statistical approach for modeling f that greatly improves over the standard BayesOpt approach. 2. An efficient way to optimize the expected improvement under this new statistical model.

  9. Our approach.
     • Model $h$ using a multi-output Gaussian process, instead of modeling $f$ directly.
     • This implies a (non-Gaussian) posterior on $f(x) = g(h(x))$.
     • To decide where to sample next: compute and optimize the expected improvement under this new posterior.
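A minimal sketch of the modeling step, using one independent GP per output of $h$; this is a simple special case of a multi-output GP (the paper's model is more general), and the class and method names are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

class IndependentGPs:
    """Model each output h_j with its own GP (a simple multi-output surrogate)."""

    def __init__(self, m):
        self.m = m
        self.gps = [GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
                    for _ in range(m)]

    def fit(self, X, H):
        # X has shape (n, d); H has shape (n, m) with rows h(x_i).
        for j, gp in enumerate(self.gps):
            gp.fit(X, H[:, j])
        return self

    def posterior(self, X):
        # Per-output posterior means and stds, each of shape (m, len(X)).
        stats = [gp.predict(X, return_std=True) for gp in self.gps]
        mus, sigmas = zip(*stats)
        return np.array(mus), np.array(sigmas)
```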

  10. Expected Improvement for Composite Functions. Our acquisition function is the Expected Improvement for Composite Functions (EI-CF):
      $\text{EI-CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right]$,
      where $h$ is modeled as a GP, making $h(x)$ Gaussian.
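Even without a closed form (see slide 12), EI-CF is straightforward to estimate by Monte Carlo: sample $h(x)$ from its Gaussian posterior, push the samples through $g$, and average the improvement. A sketch assuming independent outputs, compatible with the IndependentGPs surrogate sketched above:

```python
import numpy as np

def ei_cf(g, mu, sigma, f_best, n_samples=512, rng=None):
    """Monte Carlo estimate of EI-CF_n(x) at a single point x.

    mu, sigma: posterior mean and std of h(x), each of shape (m,)
    (independent outputs assumed for simplicity); g maps R^m -> R.
    """
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n_samples, mu.size))
    h_samples = mu + sigma * z                  # reparametrized draws of h(x)
    vals = np.array([g(h) for h in h_samples])
    return np.mean(np.maximum(vals - f_best, 0.0))
```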

  11. [Figure: posterior mean and 95% confidence interval on h(x); the implied posteriors on f(x); and the resulting acquisition functions EI-CF(x) versus EI(x).]

  12. Challenge: maximizing EI-CF is hard. Recall
      $\text{EI-CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right]$.
      • When $h$ is a GP and $g$ is nonlinear, $f(x) = g(h(x))$ is not Gaussian.
      • Hence EI-CF does not have a closed form, which makes it hard to optimize.

  13. Our approach to maximizing EI-CF (see the sketch below).
      • Construct an unbiased estimator of $\nabla \text{EI-CF}_n(x)$ using the reparametrization trick and infinitesimal perturbation analysis.
      • Use this estimator within multi-start stochastic gradient ascent to find an approximate solution of $\arg\max_x \text{EI-CF}_n(x)$.
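Below is a sketch of both steps, reusing the hypothetical IndependentGPs and ei_cf helpers from the earlier sketches. Writing $h(x) = \mu_n(x) + \sigma_n(x) z$ with base samples $z \sim \mathcal{N}(0, I)$ (the reparametrization trick) lets the same randomness be reused at nearby points; for brevity this sketch then differentiates by central finite differences over $x$, whereas the paper differentiates the reparametrized samples analytically via infinitesimal perturbation analysis. Step sizes, sample counts, and bounds are illustrative:

```python
import numpy as np

def ei_cf_grad(g, x, post, f_best, n_samples=64, eps=1e-4, rng=None):
    """Stochastic gradient estimate of EI-CF via reparametrized samples."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n_samples, post.m))    # fixed base samples

    def sample_avg(xp):                             # MC estimate with common z
        mu, sigma = post.posterior(xp.reshape(1, -1))
        h = mu[:, 0] + sigma[:, 0] * z              # reparametrized h(xp) draws
        vals = np.array([g(hi) for hi in h])
        return np.mean(np.maximum(vals - f_best, 0.0))

    grad = np.zeros_like(x)
    for i in range(x.size):                         # finite differences over x
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (sample_avg(x + e) - sample_avg(x - e)) / (2 * eps)
    return grad

def maximize_ei_cf(g, post, f_best, lo, hi, n_starts=5, n_steps=50, lr=0.05):
    """Multi-start stochastic gradient ascent on EI-CF (illustrative settings)."""
    rng = np.random.default_rng(0)
    best_x, best_val = None, -np.inf
    for _ in range(n_starts):
        x = rng.uniform(lo, hi)                     # lo, hi: per-dimension bounds
        for _ in range(n_steps):
            x = np.clip(x + lr * ei_cf_grad(g, x, post, f_best, rng=rng), lo, hi)
        mu, sigma = post.posterior(x.reshape(1, -1))
        val = ei_cf(g, mu[:, 0], sigma[:, 0], f_best, n_samples=2048)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```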

  14. Asymptotic consistency. Theorem: under suitable regularity conditions, EI-CF is asymptotically consistent, i.e., the best point it has found converges to the true global optimum as the number of evaluations goes to infinity.

  15. Numerical experiments. [Figure: four benchmark problems, each plotting log10(regret) against function evaluations (0 to 100) for Random, PI, EI, PES, Random-CF, PI-CF, and EI-CF.]

  16. Conclusion.
      • Exploiting the composite structure of the objective can improve BayesOpt performance by 3-6 orders of magnitude.
      • Come to our poster: Wed 6:30-9pm, Pacific Ballroom #237.
      • Check out our code: https://github.com/RaulAstudillo06/BOCF
