Tuning numerical parameters of algorithms: sampling and stochasticity handling Z. Yuan, T. Stützle, M. Birattari, M. Montes de Oca IRIDIA, CoDE, Université Libre de Bruxelles Brussels, Belgium zyuan@ulb.ac.be iridia.ulb.ac.be/~zyuan
Outline 1. The tuning problem 2. Tuning algorithm Sampling in parameter space Budget allocation for ranking and selection: F-Race Combine F-Race with Sampling method Iterated F-Race (Birattari et al. 2010) Post-selection mechanism 3. Experimental results
Configuration of parameterized algorithms Algorithm components ◮ categorical parameters ◮ choice of neighborhood in local search ◮ choice of crossover and mutation in EAs ◮ type of perturbation in iterated local search ◮ numerical parameters (real-valued or integer) ◮ crossover and mutation rates ◮ tabu list length ◮ perturbation strength
Importance of the tuning problem ◮ improvement over default settings and manual tuning ◮ reduction of development time and human intervention ◮ empirical studies, comparisons of algorithms ◮ support for end users of algorithms
Tuning problem: formal definition (Birattari et al. 2002) The tuning problem can be defined as a tuple $\langle \Theta, I, P_I, P_C, C \rangle$:
◮ $\Theta$: set of candidate configurations.
◮ $I$: set of instances. $P_I$: probability measure over $I$.
◮ $c(\theta, i)$: random variable representing the cost measure of a configuration $\theta \in \Theta$ on instance $i \in I$.
◮ $C \subset \mathbb{R}$: range of $c$. $P_C$: probability measure over the set $C$.
◮ $C(\theta) = C(\theta \mid \Theta, I, P_I, P_C)$: performance expectation:
$$C(\theta) = E_{I,C}[c] = \int c \, dP_C(c \mid \theta, i) \, dP_I(i) \qquad (1)$$
◮ The objective is to find a performance-optimizing configuration $\bar{\theta}$:
$$\bar{\theta} = \arg\min_{\theta \in \Theta} C(\theta) \qquad (2)$$
◮ An analytical solution is not possible, hence the expected cost is estimated in a Monte Carlo fashion.
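Since Eq. (1) cannot be evaluated analytically, $C(\theta)$ is estimated by sampling. The following is a minimal Python sketch of such a Monte Carlo estimate, not the authors' code: it assumes two hypothetical callables, `sample_instance` (draws an instance $i$ from $P_I$) and `run_algorithm` (returns one realization of $c(\theta, i)$), and uses one run per sampled instance. The toy cost in the usage part is illustrative only.

```python
import math
import random
import statistics
from typing import Callable, Tuple, TypeVar

Theta = TypeVar("Theta")  # configuration type (hypothetical)
Inst = TypeVar("Inst")    # instance type (hypothetical)


def estimate_expected_cost(
    theta: Theta,
    sample_instance: Callable[[random.Random], Inst],
    run_algorithm: Callable[[Theta, Inst, random.Random], float],
    n_samples: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Monte Carlo estimate of C(theta) from Eq. (1): (sample mean, standard error)."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n_samples):
        instance = sample_instance(rng)                    # i drawn from P_I
        costs.append(run_algorithm(theta, instance, rng))  # one realization of c(theta, i)
    mean = statistics.fmean(costs)
    stderr = statistics.stdev(costs) / math.sqrt(n_samples) if n_samples > 1 else math.inf
    return mean, stderr


# Hypothetical toy problem: instances i ~ U(0, 1), cost (theta - i)^2 plus Gaussian noise.
# For theta = 0.5 the true expected cost is E[(0.5 - U)^2] = Var(U) = 1/12.
mean, stderr = estimate_expected_cost(
    0.5,
    sample_instance=lambda rng: rng.uniform(0.0, 1.0),
    run_algorithm=lambda th, i, rng: (th - i) ** 2 + rng.gauss(0.0, 0.1),
    n_samples=20000,
)
print(f"estimate {mean:.4f} +/- {stderr:.4f} (true value {1 / 12:.4f})")
```

The standard error makes the stochasticity explicit: two configurations whose estimates differ by less than a few standard errors cannot be reliably ranked from these samples alone.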
Tuning problem is an optimization problem Variables: mixed discrete-continuous, conditional variables Objective: ◮ black-box ◮ stochastic ◮ due to stochasticity of the algorithm ◮ due to sampling of instances
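To make "mixed discrete-continuous, conditional variables" concrete, here is a minimal Python sketch of one possible representation of such a parameter space, together with uniform sampling of a configuration. The `Param` class, the parameter names, and the rule that a conditional parameter is active only when its parent takes a given value are hypothetical illustrations, not the representation used by any particular tuner.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple, Union

Value = Union[str, int, float]


@dataclass
class Param:
    name: str
    kind: str  # "categorical", "integer", or "real"
    domain: Union[Sequence[str], Tuple[float, float]]
    # (parent name, parent value) under which this parameter is active; None = always active
    condition: Optional[Tuple[str, str]] = None


def sample_configuration(space: List[Param], rng: random.Random) -> Dict[str, Value]:
    """Draw one configuration; parents must appear before their conditional children."""
    config: Dict[str, Value] = {}
    for p in space:
        if p.condition is not None and config.get(p.condition[0]) != p.condition[1]:
            continue  # inactive conditional parameter: not part of this configuration
        if p.kind == "categorical":
            config[p.name] = rng.choice(list(p.domain))
        elif p.kind == "integer":
            lo, hi = p.domain
            config[p.name] = rng.randint(int(lo), int(hi))
        else:  # "real"
            lo, hi = p.domain
            config[p.name] = rng.uniform(float(lo), float(hi))
    return config


# Hypothetical space echoing the parameter examples listed above.
space = [
    Param("local_search", "categorical", ["2-opt", "tabu_search"]),
    Param("tabu_list_length", "integer", (5, 50), condition=("local_search", "tabu_search")),
    Param("perturbation_strength", "real", (0.0, 1.0)),
]
rng = random.Random(42)
configs = [sample_configuration(space, rng) for _ in range(10)]
for c in configs[:3]:
    print(c)
```

Because `tabu_list_length` only exists when tabu search is chosen, configurations do not all have the same set of variables, which is one reason the tuning problem does not fit standard continuous optimizers directly.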
Solving the tuning problem: our approach ◮ sampling in parameter space ◮ budget allocation for ranking and selection under stochasticity: F-Race ◮ combining the budget allocator with sampling methods Open question: how to trade off the budget between sampling new points and evaluating already-sampled points.
Sampling in parameter space ◮ focus on numerical parameters ◮ usually low dimension, low budget ◮ sampling in established tuners: ad-hoc, factorial design, Kriging approximation ◮ our work studies state-of-the-art derivative-free optimizers: BOBYQA, CMA-ES, and MADS (Yuan et al. 2010, 2012a)
[Figure: average rank of the samplers CMAES, MADS, IRS, URS, and BOBYQA versus number of parameters; panels: MMAS (2 to 6 parameters) and cPSO (2 to 5 parameters); axes: Number of parameters, Average rank]
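As a point of reference for the simpler samplers mentioned above, here is a minimal Python sketch of a full factorial design and of uniform random sampling over a box of numerical parameters. It is not one of the optimizers studied (BOBYQA, CMA-ES, MADS); the function names and the parameter names `alpha` and `rho` with their bounds are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]


def factorial_design(bounds: Bounds, levels: int) -> List[Dict[str, float]]:
    """Full factorial design: `levels` equally spaced values per parameter (levels >= 2)."""
    names = list(bounds)
    axes = [
        [lo + (hi - lo) * t / (levels - 1) for t in range(levels)]
        for lo, hi in (bounds[name] for name in names)
    ]
    # Cartesian product of the per-parameter grids: levels ** len(bounds) points
    return [dict(zip(names, point)) for point in itertools.product(*axes)]


def uniform_random_design(bounds: Bounds, n: int, seed: int = 0) -> List[Dict[str, float]]:
    """n points drawn independently and uniformly from the box given by `bounds`."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()} for _ in range(n)]


# Hypothetical bounds for two numerical parameters.
bounds = {"alpha": (0.0, 5.0), "rho": (0.01, 1.0)}
grid = factorial_design(bounds, levels=3)
random_points = uniform_random_design(bounds, n=9)
print(len(grid), grid[0], grid[-1])
```

A full factorial design needs `levels ** d` points for `d` parameters, so its cost grows exponentially with the dimension, while random sampling lets the number of points be chosen freely but ignores the results of earlier evaluations.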
F-Race (Birattari et al. 2002) ◮ start with a set of initial candidates ◮ consider a stream of instances ◮ sequentially evaluate candidates ◮ discard candidates detected as statistically worse by the Friedman test
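The racing loop can be sketched in Python. This is a simplified sketch, not the reference F-Race implementation: after each instance it runs the Friedman test on the surviving candidates and, when the test is significant, discards only the candidate with the worst rank sum, whereas F-Race proper follows a significant Friedman test with pairwise post-hoc comparisons against the best candidate. The names `race`, `min_blocks`, and the toy cost function are hypothetical; the Friedman statistic is computed without tie correction and its p-value comes from the chi-square approximation with k - 1 degrees of freedom.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

CandT = TypeVar("CandT")  # candidate configuration type
InstT = TypeVar("InstT")  # instance type


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df (upper regularized gamma)."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j < df/2} h^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: start from df = 1 (erfc) and add the half-integer recurrence terms
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..k within one block (instance); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    r = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for m in range(start, end + 1):
            r[order[m]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return r


def race(candidates: Sequence[CandT], cost: Callable[[CandT, InstT], float],
         instances: Sequence[InstT], alpha: float = 0.05, min_blocks: int = 5) -> int:
    """Return the index of the surviving candidate with the lowest mean cost."""
    alive = list(range(len(candidates)))
    results = {j: [] for j in alive}  # results[j][b] = cost of candidate j on instance b
    for n, inst in enumerate(instances, start=1):
        for j in alive:  # all survivors see the same instance, forming one block
            results[j].append(cost(candidates[j], inst))
        if len(alive) == 1 or n < min_blocks:
            continue
        k = len(alive)
        rank_sum = [0.0] * k
        for b in range(n):
            for idx, r in enumerate(ranks([results[j][b] for j in alive])):
                rank_sum[idx] += r
        # Friedman statistic (no tie correction), compared to chi-square with k - 1 df
        t = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sum) - 3.0 * n * (k + 1)
        if chi2_sf(t, k - 1) < alpha:
            worst = max(range(k), key=lambda idx: rank_sum[idx])
            del alive[worst]  # discard the worst-ranked candidate
    return min(alive, key=lambda j: sum(results[j]) / len(results[j]))


rng = random.Random(1)


def noisy_cost(theta: float, instance: float) -> float:
    # Hypothetical cost: the instance shifts all candidates equally; theta = 3 is best.
    return (theta - 3.0) ** 2 + instance + rng.gauss(0.0, 0.5)


thetas = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
stream = [rng.uniform(0.0, 10.0) for _ in range(40)]
best = race(thetas, noisy_cost, stream)
print("selected theta:", thetas[best])
```

Ranking within each instance is what makes the test robust to instance difficulty: in the toy example the additive `instance` term shifts every candidate's cost on that instance by the same amount and therefore leaves the ranks unchanged.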