  1. Bayesian Optimization of Composite Functions. Raúl Astudillo, Cornell University. Joint work with Peter I. Frazier. ICML 2019. (ra598@cornell.edu)

  2. [Figure: log10(regret) versus number of function evaluations (0 to 100) for Random, EI, PES, and our method.]

  3. Problem. We consider problems of the form
     $\max_{x \in X} f(x)$, where $f(x) = g(h(x))$ and
     • $h : X \subset \mathbb{R}^d \to \mathbb{R}^m$ is a time-consuming-to-evaluate black box,
     • $g : \mathbb{R}^m \to \mathbb{R}$ and its gradient are known in closed form and fast to evaluate.

  4. Composite functions arise naturally in practice.
     • Hyperparameter tuning of classification algorithms: $g(h(x)) = -\sum_{j=1}^m h_j(x)$, where $h_j(x)$ is the classification error on the $j$-th class under hyperparameters $x$.
     • Calibration of expensive simulators: $g(h(x)) = -\sum_{j=1}^m (h_j(x) - y_j)^2$, where $h(x)$ is the output of the simulator under parameters $x$ and $y$ is a vector of observed data.
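To make the structure concrete, here is a minimal sketch of both examples in Python. The quadratic h below is a cheap toy stand-in for the expensive black box (in practice, per-class errors of a trained classifier or a simulator run), and all names and numbers are illustrative, not from the paper:

```python
import numpy as np

# Toy stand-in for the expensive black box h : R^d -> R^m.
def h(x):
    return np.array([np.sum(x ** 2), np.sum((x - 1.0) ** 2)])

# Example 1: hyperparameter tuning. g sums the per-class errors h_j(x).
def g_tuning(h_vals):
    return -np.sum(h_vals)

# Example 2: simulator calibration. g is the negative sum of squared
# residuals between the simulator output h(x) and observed data y.
y_obs = np.array([0.5, 2.0])  # made-up "observed data" for illustration
def g_calibration(h_vals):
    return -np.sum((h_vals - y_obs) ** 2)

# In both cases the composite objective is f(x) = g(h(x)).
x = np.array([0.3, -0.2])
print(g_tuning(h(x)), g_calibration(h(x)))
```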

  5. Standard BayesOpt approach.
     • Place a Gaussian process prior on $f$.
     • While the evaluation budget is not exhausted:
       ◦ Compute the posterior distribution on $f$ given the evaluations so far, $\{(x_i, f(x_i))\}_{i=1}^n$.
       ◦ Choose the next point to evaluate by maximizing an acquisition function $a$:
         $x_{n+1} \in \arg\max_x a_n(x)$,
         where the subscript $n$ indicates dependence on the posterior at time $n$.
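The following is a minimal sketch of this loop on a one-dimensional grid, assuming scikit-learn's GaussianProcessRegressor as the GP and a simple upper-confidence-bound rule as a placeholder acquisition (EI, the usual choice, appears on the next slide); the objective, grid, and budget are all illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):
    return -(x - 0.3) ** 2  # cheap toy stand-in for the expensive objective

candidates = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
rng = np.random.default_rng(0)
idx = rng.choice(len(candidates), size=3, replace=False)
X, y = candidates[idx], f(candidates[idx]).ravel()

gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
for n in range(20):                              # evaluation budget
    gp.fit(X, y)                                 # posterior after n evaluations
    mu, sigma = gp.predict(candidates, return_std=True)
    acq = mu + 2.0 * sigma                       # placeholder acquisition a_n(x)
    x_next = candidates[np.argmax(acq)]          # x_{n+1} in argmax_x a_n(x)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))
print("best x:", X[np.argmax(y)].item(), "best f(x):", y.max())
```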

  6. Background: Expected Improvement (EI). The most widely used acquisition function in standard BayesOpt is
     $\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right]$,
     where
     • $f_n^*$ is the best value observed so far,
     • $\mathbb{E}_n$ is the conditional expectation under the posterior after $n$ evaluations.

  7. Background: Expected Improvement (EI). The most widely used acquisition function in standard BayesOpt is
     $\mathrm{EI}_n(x) = \mathbb{E}_n\left[\{f(x) - f_n^*\}^+\right]$.
     When $f(x)$ is Gaussian, EI and its derivative have closed forms, which makes EI easy to optimize.
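For reference, the well-known closed form under a Gaussian posterior $f(x) \sim \mathcal{N}(\mu_n(x), \sigma_n^2(x))$ is $\mathrm{EI}_n(x) = \sigma_n(x)\left[z\,\Phi(z) + \varphi(z)\right]$ with $z = (\mu_n(x) - f_n^*)/\sigma_n(x)$, where $\Phi$ and $\varphi$ are the standard normal CDF and PDF. A direct translation into code (a sketch, not the paper's implementation):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for a Gaussian posterior f(x) ~ N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)   # guard against zero posterior variance
    z = (mu - f_best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))
```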

  8. Our contribution. 1. A statistical approach for modeling f that greatly improves over the standard BayesOpt approach. 2. An efficient way to optimize the expected improvement under this new statistical model.

  9. Our approach.
     • Model $h$ using a multi-output Gaussian process, instead of modeling $f$ directly.
     • This implies a (non-Gaussian) posterior on $f(x) = g(h(x))$.
     • To decide where to sample next: compute and optimize the expected improvement under this new posterior.
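A minimal sketch of the modeling step, using one independent GP per output of $h$; this is a simple special case of a multi-output GP (the paper's model is more general), and the class and method names are illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

class IndependentGPs:
    """Model each output h_j with its own GP (a simple multi-output surrogate)."""

    def __init__(self, m):
        self.m = m
        self.gps = [GaussianProcessRegressor(alpha=1e-6, normalize_y=True)
                    for _ in range(m)]

    def fit(self, X, H):
        # X has shape (n, d); H has shape (n, m) with rows h(x_i).
        for j, gp in enumerate(self.gps):
            gp.fit(X, H[:, j])
        return self

    def posterior(self, X):
        # Per-output posterior means and stds, each of shape (m, len(X)).
        stats = [gp.predict(X, return_std=True) for gp in self.gps]
        mus, sigmas = zip(*stats)
        return np.array(mus), np.array(sigmas)
```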

  10. Expected Improvement for Composite Functions. Our acquisition function is the Expected Improvement for Composite Functions (EI-CF):
      $\text{EI-CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right]$,
      where $h$ is modeled as a GP, making $h(x)$ Gaussian.
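Even without a closed form (see slide 12), EI-CF is straightforward to estimate by Monte Carlo: sample $h(x)$ from its Gaussian posterior, push the samples through $g$, and average the improvement. A sketch assuming independent outputs, compatible with the IndependentGPs surrogate sketched above:

```python
import numpy as np

def ei_cf(g, mu, sigma, f_best, n_samples=512, rng=None):
    """Monte Carlo estimate of EI-CF_n(x) at a single point x.

    mu, sigma: posterior mean and std of h(x), each of shape (m,)
    (independent outputs assumed for simplicity); g maps R^m -> R.
    """
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n_samples, mu.size))
    h_samples = mu + sigma * z                  # reparametrized draws of h(x)
    vals = np.array([g(h) for h in h_samples])
    return np.mean(np.maximum(vals - f_best, 0.0))
```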

  11. [Figure: posterior mean and 95% confidence interval on h(x); the implied posteriors on f(x); and the resulting acquisition functions EI-CF(x) versus EI(x).]

  12. Challenge: maximizing EI-CF is hard. Recall
      $\text{EI-CF}_n(x) = \mathbb{E}_n\left[\{g(h(x)) - f_n^*\}^+\right]$.
      • When $h$ is a GP and $g$ is nonlinear, $f(x) = g(h(x))$ is not Gaussian.
      • Hence EI-CF does not have a closed form, which makes it hard to optimize.

  13. Our approach to maximizing EI-CF (see the sketch below).
      • Construct an unbiased estimator of $\nabla \text{EI-CF}_n(x)$ using the reparametrization trick and infinitesimal perturbation analysis.
      • Use this estimator within multi-start stochastic gradient ascent to find an approximate solution of $\arg\max_x \text{EI-CF}_n(x)$.
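Below is a sketch of both steps, reusing the hypothetical IndependentGPs and ei_cf helpers from the earlier sketches. Writing $h(x) = \mu_n(x) + \sigma_n(x) z$ with base samples $z \sim \mathcal{N}(0, I)$ (the reparametrization trick) lets the same randomness be reused at nearby points; for brevity this sketch then differentiates by central finite differences over $x$, whereas the paper differentiates the reparametrized samples analytically via infinitesimal perturbation analysis. Step sizes, sample counts, and bounds are illustrative:

```python
import numpy as np

def ei_cf_grad(g, x, post, f_best, n_samples=64, eps=1e-4, rng=None):
    """Stochastic gradient estimate of EI-CF via reparametrized samples."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n_samples, post.m))    # fixed base samples

    def sample_avg(xp):                             # MC estimate with common z
        mu, sigma = post.posterior(xp.reshape(1, -1))
        h = mu[:, 0] + sigma[:, 0] * z              # reparametrized h(xp) draws
        vals = np.array([g(hi) for hi in h])
        return np.mean(np.maximum(vals - f_best, 0.0))

    grad = np.zeros_like(x)
    for i in range(x.size):                         # finite differences over x
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (sample_avg(x + e) - sample_avg(x - e)) / (2 * eps)
    return grad

def maximize_ei_cf(g, post, f_best, lo, hi, n_starts=5, n_steps=50, lr=0.05):
    """Multi-start stochastic gradient ascent on EI-CF (illustrative settings)."""
    rng = np.random.default_rng(0)
    best_x, best_val = None, -np.inf
    for _ in range(n_starts):
        x = rng.uniform(lo, hi)                     # lo, hi: per-dimension bounds
        for _ in range(n_steps):
            x = np.clip(x + lr * ei_cf_grad(g, x, post, f_best, rng=rng), lo, hi)
        mu, sigma = post.posterior(x.reshape(1, -1))
        val = ei_cf(g, mu[:, 0], sigma[:, 0], f_best, n_samples=2048)
        if val > best_val:
            best_x, best_val = x, val
    return best_x
```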

  14. Asymptotic consistency. Theorem: under suitable regularity conditions, EI-CF is asymptotically consistent, i.e., the best point it has found converges to the true global optimum as the number of evaluations goes to infinity.

  15. Numerical experiments. [Figure: four benchmark problems, each plotting log10(regret) against function evaluations (0 to 100) for Random, PI, EI, PES, Random-CF, PI-CF, and EI-CF.]

  16. Conclusion.
      • Exploiting the composite structure of the objective can improve BayesOpt performance by 3-6 orders of magnitude.
      • Come to our poster: Wed 6:30-9pm, Pacific Ballroom #237.
      • Check out our code: https://github.com/RaulAstudillo06/BOCF
