Knowing The What, But Not The Where in Bayesian Optimization
Vu Nguyen & Michael A. Osborne
University of Oxford
Black-box Optimization
The relationship from $x$ to $y$ is through the black-box: the input $x$ is passed to the black-box $f(x)$, which returns the output $y = f(x)$. We are looking for the maximizer of $f$.
[Figure: input $x$ $\to$ black-box $f(x)$ $\to$ output $y = f(x)$; a plot of output against input showing a few evaluations $f(x_1), f(x_2), f(x_3)$ and the maximizer we are looking for.]
Properties of Black-box Function
$f : x \in \mathbb{R}^d \to y \in \mathbb{R}$, $y = f(x)$, mapping input $x$ to output $y$.
The functional form is not known (e.g., no parametric model such as $y = wx + \epsilon$).
No derivative form: $\partial f / \partial x$ is not available.
Expensive to evaluate (in time and cost).
Nothing is known about the function, except a few evaluations $y = f(x)$.
Bayesian Optimization Overview
Make a series of evaluations $x_1, x_2, \dots, x_T$ of the black-box $f(x)$, refining the model after each one.
Surrogate function: the GP gives the predictive mean $\mu(x)$ and predictive variance $\sigma(x)$.
Acquisition function: $\alpha(x) = \mu(x) + \kappa \times \sigma(x)$, trading off exploitation (via $\mu$) against exploration (via $\sigma$).
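To make these pieces concrete, here is a minimal sketch of a GP surrogate with a UCB-style acquisition $\alpha(x) = \mu(x) + \kappa\,\sigma(x)$, using scikit-learn; the kernel, $\kappa$, and the toy objective are illustrative choices, not settings from the talk.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def ucb_acquisition(gp, X_candidates, kappa=2.0):
    """alpha(x) = mu(x) + kappa * sigma(x): exploit via mu, explore via sigma."""
    mu, sigma = gp.predict(X_candidates, return_std=True)
    return mu + kappa * sigma

# Toy usage: fit the surrogate on a few evaluations, then pick the next point.
X_obs = np.array([[0.1], [0.4], [0.9]])
y_obs = np.sin(3.0 * X_obs).ravel()              # stand-in for the expensive black-box f
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, y_obs)

X_grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
x_next = X_grid[np.argmax(ucb_acquisition(gp, X_grid))]
```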
Outline
Bayesian Optimization
Bayes Opt with Known Optimum Value
Knowing Optimum Value of The Black-Box
We consider situations where the optimum value $f^* = \max_x f(x)$ is known, and the goal is to find $x^* = \arg\max_x f(x)$.
Examples of Knowing Optimal Value of The Black-Box
Deep reinforcement learning: CartPole: 200; Pong: 18; Frozen Lake: 0.79 ± 0.05; InvertedPendulum: 950.
Classification: Skin dataset: accuracy 100.
Inverse optimization: given a database and a target property $f^*$, identifying a corresponding data point $x^*$.
What can $f^*$ tell us about $f$?
1. $f^*$ tells us about the upper bound: $f^* \ge f(x), \forall x$.
2. $f^*$ tells us that the function reaches $f^*$ at some point.
Transformed Gaussian process
$f(x) = f^* - \frac{1}{2} g^2(x)$, with $g(x) \sim \mathcal{GP}(\sqrt{2 f^*}, k)$.
Because $\frac{1}{2} g^2(x) \ge 0$, this condition ensures that $f^* \ge f(x), \forall x$ (property 1).
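As a minimal sketch (my own illustration, not the authors' code), the transformation can be applied directly to the observed values: each $y_i = f(x_i)$ maps to $g_i = \sqrt{2\,(f^* - y_i)} \ge 0$, and a GP is fitted on $g$ instead of $f$. For simplicity this uses scikit-learn's default zero-mean prior rather than the $\sqrt{2 f^*}$ prior mean shown above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_transformed_gp(X_obs, y_obs, f_star):
    """Fit a GP on g, where f(x) = f_star - 0.5 * g(x)**2."""
    g_obs = np.sqrt(2.0 * (f_star - y_obs))      # well-defined because f_star >= y_i
    return GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, g_obs)
```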
We want to control the surrogate using $f^*$
Push down (property 1): the surrogate must not go above $f^*$.
[Figure: the standard GP $\mu(x)$ goes above $f^*$; the transformed GP stays below $f^*$.]
Transformed Gaussian process
$f(x) = f^* - \frac{1}{2} g^2(x)$, with $\frac{1}{2} g^2(x) \ge 0$ and $g(x) \sim \mathcal{GP}(0, k)$: a zero-mean prior!
This condition encourages that there is a point where $g(x) = 0$ and thus $f^* = f(x)$ (property 2).
We want to control the surrogate using $f^*$
Lift up (property 2): the surrogate should reach $f^*$.
[Figure: the standard GP $\mu(x)$ does not reach $f^*$; the transformed GP reaches $f^*$.]
Transformed Gaussian process
Linearization using Taylor expansion:
$f(x) \approx f^* - \frac{1}{2}\mu_g^2(x) - \mu_g(x)\left[g(x) - \mu_g(x)\right] = f^* + \frac{1}{2}\mu_g^2(x) - \mu_g(x)\,g(x)$
A linear transformation of a GP remains Gaussian, so the predictive distribution is $f(x) \sim \mathcal{N}\!\left(\mu_f(x), \sigma_f^2(x)\right)$ with
$\mu_f(x) = f^* - \frac{1}{2}\mu_g^2(x)$ and $\sigma_f^2(x) = \mu_g^2(x)\,\sigma_g^2(x)$.
The Taylor expansion is very accurate at the mode, which is $\mu_g(x)$.
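A small sketch of this linearised predictive distribution, assuming a GP already fitted on $g$ (e.g. via the earlier snippet): $\mu_f(x) = f^* - \tfrac{1}{2}\mu_g^2(x)$ and $\sigma_f^2(x) = \mu_g^2(x)\,\sigma_g^2(x)$.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def predict_f(gp_g, X, f_star):
    """Approximate posterior of f from the GP posterior of g via the Taylor linearisation."""
    mu_g, sigma_g = gp_g.predict(X, return_std=True)
    mu_f = f_star - 0.5 * mu_g ** 2              # mu_f(x) = f* - 0.5 * mu_g(x)^2
    sigma_f = np.abs(mu_g) * sigma_g             # sigma_f(x) = |mu_g(x)| * sigma_g(x)
    return mu_f, sigma_f

# Toy usage with transformed targets g_i = sqrt(2 * (f_star - y_i)).
f_star = 1.0
X_obs = np.array([[0.1], [0.4], [0.9]])
y_obs = np.sin(3.0 * X_obs).ravel()
gp_g = GaussianProcessRegressor(kernel=RBF(0.2)).fit(X_obs, np.sqrt(2.0 * (f_star - y_obs)))
mu_f, sigma_f = predict_f(gp_g, np.linspace(0.0, 1.0, 50).reshape(-1, 1), f_star)
```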
Outline
Bayesian Optimization
Bayes Opt with Known Optimum Value $f^*$
  Problem definition
  Exploiting $f^*$
    Building a better surrogate model
    Making informed decisions
Confidence Bound Minimization
Under the GP surrogate model, we have with high probability the lower and upper bound
$\mu_t(x) - \sqrt{\beta_t}\,\sigma_t(x) \le f(x) \le \mu_t(x) + \sqrt{\beta_t}\,\sigma_t(x), \forall x$,
where $\beta_t$ is defined following [Srinivas et al., 2010]; both bounds can be estimated for every $x$.
At the optimum this means
$\mu_t(x^*) - \sqrt{\beta_t}\,\sigma_t(x^*) \le f(x^*) = f^* \le \mu_t(x^*) + \sqrt{\beta_t}\,\sigma_t(x^*)$:
the location $x^*$ is unknown, but the value $f^*$ is known.
Confidence Bound Minimization
The best candidate for $x^*$ is where the bound is tight:
$x_t = \arg\min_x \left|\mu_t(x) - f^*\right| + \sqrt{\beta_t}\,\sigma_t(x)$.
The inequality becomes an equality at the true $x^*$ location, where
$\mu_t(x^*) - \sqrt{\beta_t}\,\sigma_t(x^*) = f^* = \mu_t(x^*) + \sqrt{\beta_t}\,\sigma_t(x^*)$,
i.e. when $\mu_t(x^*) = f^*$ and $\sigma_t(x^*) = 0$.
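A minimal sketch of this selection rule, taking the GP predictive mean and standard deviation as inputs; the value of $\beta_t$ below is only a placeholder (the slide points to Srinivas et al. 2010 for its schedule).

```python
import numpy as np

def cbm_acquisition(mu, sigma, f_star, beta_t=4.0):
    """Confidence-bound-style score: small where the interval is tight around f_star."""
    return np.abs(mu - f_star) + np.sqrt(beta_t) * sigma

# The next evaluation is where the score is smallest, e.g.:
# x_next = X_grid[np.argmin(cbm_acquisition(mu, sigma, f_star))]
```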
Expected Regret Minimization
Regret: $r(x) = f^* - f(x)$, where $f^* = \max_x f(x) \ge f(x), \forall x$.
Finding the optimum location $x^*$ is equivalent to minimizing the regret.
We can select the next point by minimizing the expected regret.
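For reference, if the surrogate gives a Gaussian posterior $f(x) \sim \mathcal{N}(\mu(x), \sigma^2(x))$, the expectation of the non-negative part of the regret has a standard closed form (a textbook Gaussian identity, stated here as background; the acquisition on the next slide has exactly this form):

$$
\mathbb{E}\big[\max\big(f^* - f(x),\, 0\big)\big]
  = \big(f^* - \mu(x)\big)\,\Phi(z) + \sigma(x)\,\phi(z),
  \qquad z = \frac{f^* - \mu(x)}{\sigma(x)},
$$

where $\phi$ and $\Phi$ denote the standard normal PDF and CDF.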
Expected Regret Minimization
Using an analytical derivation, we obtain the closed-form computation for ERM:
$\alpha^{\mathrm{ERM}}(x) = \sigma(x)\,\phi(z) + \left(f^* - \mu(x)\right)\Phi(z)$, where $z = \dfrac{f^* - \mu(x)}{\sigma(x)}$,
$\phi$ is the Gaussian PDF, $\Phi$ is the Gaussian CDF, and $\mu(x), \sigma(x)$ are the GP predictive mean and uncertainty.
See the paper for details!
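A minimal sketch of this closed form (my reading of the slide, not the authors' released code); the next point is chosen by minimising this score over candidates.

```python
import numpy as np
from scipy.stats import norm

def erm_acquisition(mu, sigma, f_star):
    """alpha_ERM(x) = sigma(x)*phi(z) + (f_star - mu(x))*Phi(z), z = (f_star - mu(x)) / sigma(x)."""
    sigma = np.maximum(sigma, 1e-12)             # guard against zero predictive uncertainty
    z = (f_star - mu) / sigma
    return sigma * norm.pdf(z) + (f_star - mu) * norm.cdf(z)

# x_next = X_grid[np.argmin(erm_acquisition(mu_f, sigma_f, f_star))]
```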
Illustration
[Figure: existing baselines tend to explore elsewhere; the proposed method correctly identifies the true (unknown) optimum location.]
The GP transformation is helpful in high dimensions.
XGBoost Classification and DRL
[Figures: Skin dataset (UCI), $f^* = 100$; CartPole (DRL), $f^* = 200$.]
Mis-specified $f^*$ will degrade the performance.
Under-specified $f^*$ (smaller than the true $f^*$): more serious, as the algorithm will get stuck.
Over-specified $f^*$ (greater than the true $f^*$): less serious, but still poor performance.
Take Home Messages
Bayes opt is efficient for optimizing black-box functions.
When the optimum value is known, we can exploit this knowledge for better optimization.
Question and Answer
vu@robots.ox.ac.uk
@nguyentienvu
https://ntienvu.github.io