Preferential Bayesian Optimization
Javier González, Zhenwen Dai, Andreas Damianou, Neil D. Lawrence
@ICML 2017, Sydney, Australia
June 26, 2019
My Colleagues
Javier González, Andreas Damianou, Neil D. Lawrence
Motivation
◮ Bayesian Optimization aims at searching for the global minimum of an expensive function $g$:
  $x_{\min} = \arg\min_{x \in \mathcal{X}} g(x).$
◮ What if the function $g$ is not directly measurable?
Preference vs. Rating
◮ The objective functions of many tasks are difficult to summarize precisely into a single value.
◮ For humans, comparing two options is almost always easier than rating a single one.
◮ This observation has been exploited in A/B testing.
BO via Preference
◮ Go beyond a single A/B test.
◮ Optimize a system by tuning its configuration, e.g., the font size or background color of a website.
◮ The objective, such as customer experience, is not directly measurable.
◮ Instead, compare the objective under two different configurations.
◮ The task is to search for the best configuration by iteratively suggesting pairs of configurations and observing the results of the comparisons.
Problem Definition
◮ Find the minimum of a latent function $g(x)$, $x \in \mathcal{X}$.
◮ Observe only whether $g(x) < g(x')$ or not, for a duel $[x, x'] \in \mathcal{X} \times \mathcal{X}$.
◮ The outcomes are binary: true or false.
◮ The outcomes are stochastic.
Preference Function
◮ In this work, the outcome distribution is assumed to be Bernoulli:
  $p(y \in \{0,1\} \mid [x, x']) = \pi^{y} (1 - \pi)^{1-y}, \qquad \pi = \sigma\big(g(x') - g(x)\big).$
◮ $\pi$ is referred to as the preference function.
◮ A Preferential Bayesian Optimization algorithm proposes a sequence of duels that helps efficiently localize the minimum of the latent function $g(x)$.
[Figure: the latent objective function with its global minimum, and the corresponding preference function over (x, x').]
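As a concrete illustration, the sketch below simulates this Bernoulli duel model for a toy latent function; the quadratic g and all names are illustrative choices, not part of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def g(x):
    # toy latent objective (illustrative only); minimum at x = 0.3
    return 10.0 * (x - 0.3) ** 2

def duel(x, x_prime, rng):
    """Return y = 1 if x 'beats' x' (appears to have the lower value),
    drawn from Bernoulli(pi) with pi = sigma(g(x') - g(x))."""
    pi = sigmoid(g(x_prime) - g(x))
    return int(rng.random() < pi)

rng = np.random.default_rng(0)
print(duel(0.3, 0.9, rng))  # 1 with high probability: 0.3 is near the minimum
```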
A Surrogate Model
◮ The preference function is not observable.
◮ Only a few comparisons are observed.
◮ A surrogate model is needed to guide the search.
◮ Two choices:
  ◮ a surrogate model for the latent function (as in standard BO) [Brochu, 2010, Guo et al., 2010];
  ◮ a surrogate model for the preference function.
[Figure: the true preference function, and the expectation of y⋆ and σ(f⋆) under a fitted surrogate.]
A Surrogate Model of the Preference Function
◮ We propose to build a surrogate model for the preference function.
◮ Pros: easy to model (Gaussian process binary classification is used):
  $p(y_\star = 1 \mid \mathcal{D}, [x_\star, x'_\star], \theta) = \int \sigma(f_\star)\, p(f_\star \mid \mathcal{D}, [x_\star, x'_\star], \theta)\, df_\star$
◮ Pros: flexible latent function (e.g., non-stationarity).
◮ Cons: the minimum of the latent function is not directly accessible.
[Figure: the true preference function, and the expectation of y⋆ and σ(f⋆) under the fitted surrogate.]
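A minimal sketch of fitting such a surrogate, using scikit-learn's GaussianProcessClassifier on concatenated duels [x, x'] as a stand-in for the paper's GP binary-classification model; the kernel, data, and labels here are placeholders, not the authors' exact choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# duels: shape (n, 2) for a 1-D search space, each row is a pair [x, x'];
# outcomes: n binary labels, 1 meaning x beat x' (i.e. g(x) < g(x') was observed).
rng = np.random.default_rng(0)
duels = rng.uniform(0.0, 1.0, size=(30, 2))
outcomes = (duels[:, 0] < duels[:, 1]).astype(int)  # placeholder labels for the demo

surrogate = GaussianProcessClassifier(kernel=RBF(length_scale=0.2))
surrogate.fit(duels, outcomes)

# predicted winning probability p(y* = 1 | D, [x, x']) for new duels
new_duels = np.array([[0.1, 0.9], [0.8, 0.2]])
print(surrogate.predict_proba(new_duels)[:, 1])
```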
Who is the winner (the minimum)?
◮ The minimum beats all the other locations on average.
◮ Extending an idea from dueling bandits [Zoghi et al., 2015], we define the soft-Copeland score as the average winning probability:
  $C(x) = \mathrm{Vol}(\mathcal{X})^{-1} \int_{\mathcal{X}} \pi_f([x, x'])\, dx'.$
◮ The optimum of $g(x)$ can then be estimated as the Condorcet winner:
  $x_c = \arg\max_{x \in \mathcal{X}} C(x).$
[Figure: the objective function with its global minimum, the preference function, and the Copeland and soft-Copeland functions.]
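A Monte Carlo sketch of the soft-Copeland score on a 1-D box [0, 1], assuming a surrogate such as the one above that exposes predicted winning probabilities; the function and variable names are illustrative.

```python
import numpy as np

def soft_copeland(predict_win_prob, x_grid, n_mc=256, rng=None):
    """Estimate C(x) = E_{x'}[ pi_f([x, x']) ] by averaging the predicted
    winning probability of x over Monte Carlo draws of the opponent x'."""
    rng = rng or np.random.default_rng(0)
    x_prime = rng.uniform(0.0, 1.0, size=n_mc)
    scores = np.empty(len(x_grid))
    for i, x in enumerate(x_grid):
        duels = np.column_stack([np.full(n_mc, x), x_prime])
        scores[i] = predict_win_prob(duels).mean()
    return scores

# e.g. with the scikit-learn surrogate from the previous sketch:
# x_grid = np.linspace(0.0, 1.0, 100)
# C = soft_copeland(lambda d: surrogate.predict_proba(d)[:, 1], x_grid)
# x_c = x_grid[np.argmax(C)]   # approximate Condorcet winner
```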
The current estimate of the minimum
◮ We only have a surrogate model of the preference function.
◮ Estimate the soft-Copeland score from the surrogate model and take its maximizer as an approximate Condorcet winner.
◮ Note that the approximate Condorcet winner may not be the optimum of $g(x)$.
Acquisition Function
◮ Existing acquisition functions are not applicable.
◮ They are designed to work with a surrogate model of the objective function.
◮ In PBO, the surrogate model does not directly represent the latent objective function.
◮ We need a new acquisition function for duels!
[Figure: expectation of y⋆ and σ(f⋆) under the fitted surrogate.]
Pure Exploration Acquisition Function (PBO-PE)
◮ The common purely explorative acquisition function, i.e. $\mathbb{V}[y]$, does not work.
◮ We propose a purely explorative acquisition function: the variance (uncertainty) of the "winning" probability of a duel,
  $\mathbb{V}[\sigma(f_\star)] = \int \big(\sigma(f_\star) - \mathbb{E}[\sigma(f_\star)]\big)^2 p(f_\star \mid \mathcal{D}, [x, x'])\, df_\star.$
[Figure: the variance of y⋆ versus the variance of σ(f⋆) over the duel space.]
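A sketch of evaluating this criterion by Monte Carlo, assuming the surrogate provides the posterior mean and standard deviation of the latent f at a duel; those inputs, and the names, are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pbo_pe_score(mu_f, std_f, n_samples=2000, rng=None):
    """Monte Carlo estimate of V[sigma(f*)] for one duel, given the latent
    GP posterior mean mu_f and standard deviation std_f at that duel."""
    rng = rng or np.random.default_rng(0)
    f_star = rng.normal(mu_f, std_f, size=n_samples)
    return sigmoid(f_star).var()

# the next duel is the pair [x, x'] that maximizes pbo_pe_score over candidates
print(pbo_pe_score(0.0, 1.0))   # high latent uncertainty -> large score
print(pbo_pe_score(0.0, 0.05))  # confident posterior -> score near zero
```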
Acquisition Function: PBO-DTS
To select the next duel $[x_{\text{next}}, x'_{\text{next}}]$ (see the sketch below):
1. Draw a sample of the preference function from the surrogate model.
2. Take the maximizer of the soft-Copeland score of that sample as $x_{\text{next}}$.
3. Take as $x'_{\text{next}}$ the point that maximizes the PBO-PE criterion for duels $[x_{\text{next}}, x']$.
[Figure: a sample of σ(f⋆), the corresponding sampled Copeland function, and the variance of σ(f⋆).]
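A rough sketch of one dueling-Thompson-sampling step, under the assumption that we can draw a joint posterior sample of σ(f) over a set of duels and evaluate the PBO-PE score; both callables and all names are placeholders, not the authors' implementation.

```python
import numpy as np

def dts_next_duel(sample_pref, pe_score, x_grid, n_mc=128, rng=None):
    """sample_pref(duels) -> one joint posterior sample of sigma(f) at `duels`;
    pe_score(duel)        -> V[sigma(f*)] for a single duel [x, x']."""
    rng = rng or np.random.default_rng(0)
    x_prime_mc = rng.uniform(0.0, 1.0, size=n_mc)

    # all (x, x') pairs: candidate grid for x times Monte Carlo draws of x'
    xx, xp = np.meshgrid(x_grid, x_prime_mc, indexing="ij")
    duels = np.column_stack([xx.ravel(), xp.ravel()])

    # 1) one posterior sample of sigma(f); averaging over x' gives the
    #    sampled soft-Copeland score, and its maximizer is x_next
    sample = sample_pref(duels).reshape(len(x_grid), n_mc)
    x_next = x_grid[int(np.argmax(sample.mean(axis=1)))]

    # 2) second element: maximize the pure-exploration score given x_next
    pe_vals = [pe_score(np.array([x_next, v])) for v in x_grid]
    return x_next, x_grid[int(np.argmax(pe_vals))]
```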
Experiment: Forrester Function
◮ Synthetic 1-D function: Forrester.
◮ Observations drawn with probability $\frac{1}{1 + e^{\,g(x) - g(x')}}$.
◮ $g(x_c)$ shows the value at the location that the algorithms believe is the minimum.
◮ Each curve is the average of 20 trials.
Compared methods: PBO-PE, PBO-DTS, PBO-CEI, RANDOM, IBO [Brochu, 2010], SPARRING [Ailon et al., 2014].
[Figure: $g(x_c)$ versus the number of iterations on the Forrester function.]
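For reference, a small sketch of the experimental oracle, assuming the standard Forrester test function; the noise model follows the observation probability above.

```python
import numpy as np

def forrester(x):
    # standard Forrester test function on [0, 1]
    return (6.0 * x - 2.0) ** 2 * np.sin(12.0 * x - 4.0)

def observe(x, x_prime, rng):
    """One preferential observation: y = 1 with probability
    1 / (1 + exp(g(x) - g(x')))."""
    p = 1.0 / (1.0 + np.exp(forrester(x) - forrester(x_prime)))
    return int(rng.random() < p)

rng = np.random.default_rng(0)
print(observe(0.75, 0.1, rng))  # x = 0.75 is near the minimizer, so usually 1
```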
Experiments: More (2D) Functions
[Figures: $g(x_c)$ versus the number of iterations on the Forrester, Six-Hump Camel, Goldstein and Levy functions, comparing PBO-PE, PBO-DTS, RANDOM and IBO (plus PBO-CEI and SPARRING on the Forrester panel).]
Summary ◮ Address Bayesian optimization with preferential returns. ◮ Propose to build a surrogate model for the preference function. ◮ Propose a few efficient acquisition functions. ◮ Show the performance on synthetic functions.
Questions?
Exploration & Exploitation The two ingredients in an acquisition function: Exploration & Exploitation.
Exploration in PBO
◮ We study exploration in PBO by designing a purely explorative acquisition function.
◮ Exploration in standard BO can be viewed as the action of reducing the uncertainty of a surrogate model.
◮ A purely explorative acquisition function:
  $\mathbb{V}[y_\star] = \int (y_\star - \mathbb{E}[y_\star])^2 p(y_\star \mid \mathcal{D}, x_\star)\, dy_\star$
◮ Can we extend this idea to PBO?
A Straightforward Choice
◮ A straightforward extension from standard BO:
  $\mathbb{V}[y_\star] = \sum_{y_\star \in \{0,1\}} (y_\star - \mathbb{E}[y_\star])^2 p(y_\star \mid \mathcal{D}, [x_\star, x'_\star]) = \mathbb{E}[y_\star](1 - \mathbb{E}[y_\star])$
◮ The maximum variance is always where $\mathbb{E}[y_\star] = 0.5$!
◮ The variance may not reduce with observations!
[Figure: expectation of y⋆ and σ(f⋆), and the variance of y⋆ over the duel space.]
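A small numerical illustration of this failure mode (the posterior widths below are made-up values): as the latent posterior concentrates around a mean of zero, V[y⋆] stays pinned near 0.25 while the PBO-PE score V[σ(f⋆)] shrinks as intended.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
for post_std in [2.0, 1.0, 0.1]:              # latent posterior narrows as duels accumulate
    f_star = rng.normal(0.0, post_std, 100000)  # posterior mean 0 => E[y*] ~ 0.5 throughout
    p = sigmoid(f_star)
    v_y = p.mean() * (1.0 - p.mean())          # V[y*] = E[y*](1 - E[y*]): stays near 0.25
    v_sigma = p.var()                          # V[sigma(f*)]: shrinks with the posterior
    print(f"std={post_std:.1f}  V[y*]={v_y:.3f}  V[sigma(f*)]={v_sigma:.3f}")
```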
Dueling-Thompson Sampling (DTS)
◮ To balance exploration and exploitation, we borrow the idea of Thompson sampling and draw a sample from the surrogate model.
◮ Compute the soft-Copeland score of the drawn sample.
◮ The value $x_{\text{next}}$ that maximizes this sampled soft-Copeland score gives a good balance between exploration and exploitation.
◮ Take it as the first element of the next duel.
[Figure: 100 sampled Copeland functions after 10, 30, and 150 duels.]