Benchmarking the PSA-CMA-ES on the BBOB Noiseless Testbed
Kouhei Nishida, Youhei Akimoto
Shinshu University, University of Tsukuba
CMA-ES
• It maintains a multivariate normal distribution $\mathcal{N}(m, \Sigma)$ with $\Sigma = \sigma^2 C$, where m is the mean vector, σ is the step-size, and C is the covariance matrix.
• Each iteration consists of four steps: Step 1 Sample (λ candidate solutions; λ: population size), Step 2 Rank, Step 3 Estimate, Step 4 Update.
• All of its hyper-parameters, e.g., the learning rates and the population size, have default values.
• The population size needs tuning if the objective function is noisy or multimodal [Hansen 2004].
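The default population size and recombination weights mentioned above can be sketched as follows. The formulas are the commonly used CMA-ES defaults (λ_def = 4 + ⌊3 ln n⌋, μ = ⌊λ/2⌋, positive log-decreasing weights); they are assumed here rather than taken from these slides, and the function name `default_hyperparameters` is hypothetical.

```python
import math

def default_hyperparameters(n, lam=None):
    """Return (lam, mu, weights, mu_w) for an n-dimensional problem.

    Assumed standard CMA-ES defaults (hypothetical helper):
    lam_def = 4 + floor(3 ln n), mu = floor(lam / 2),
    positive log-decreasing weights that sum to 1.
    """
    if lam is None:
        lam = 4 + int(math.floor(3 * math.log(n)))
    mu = lam // 2
    # Weights for the mu best-ranked samples; the rest get weight 0
    raw = [math.log((lam + 1) / 2) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Variance-effective selection mass: mu_w = 1 / sum(w_i^2)
    mu_w = 1.0 / sum(w * w for w in weights)
    return lam, mu, weights, mu_w
```

For example, `default_hyperparameters(10)` gives λ = 10 and μ = 5; passing `lam` explicitly lets the same weight formula be reused when the population size changes.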
CMA-ES: Population Size Tuning
Approaches to avoid tuning by users:
• To utilize a multi-run strategy with different population sizes
• To adapt the population size
BIPOP-CMA-ES
First run: CMA-ES with the default population size → unimodal functions
Additional runs:
• CMA-ES with an increased population size → well-structured multimodal or noisy functions
• CMA-ES with a relatively small step-size and population size → weakly-structured multimodal functions
CMA-ES: Population Size Tuning
The second approach, adapting the population size, is taken by PSA-CMA-ES [Nishida2018, Thursday 19, ENUM4].
PSA
• Based on the tendency of the parameter update, i.e., how consistently successive updates move the distribution parameters in the same direction
Key Observation
On multimodal functions and noisy functions, the parameter update has less tendency than on noiseless unimodal functions.
Population Size Adaptation (PSA)
In the parameter space of the sampling distribution, with $\theta = [m, \Sigma]$, one iteration (time $t \to t+1$) moves the distribution from $\mathcal{N}(m^{(t)}, \Sigma^{(t)})$ to $\mathcal{N}(m^{(t+1)}, \Sigma^{(t+1)})$; this move is the update step $\Delta\theta$.
[Figure: successive update steps Δθ in the parameter space on a noiseless unimodal function]
[Figure: successive update steps Δθ in the parameter space on multimodal functions and noisy functions]
PSA: Evolution Path
• It accumulates steps in the parameter space:
$$p_\theta^{(t+1)} \leftarrow (1-\beta)\,p_\theta^{(t)} + \sqrt{\beta(2-\beta)}\,\frac{\mathcal{I}_{\theta^{(t)}}^{1/2}\,\Delta\theta^{(t+1)}}{\mathbb{E}\big[\|\mathcal{I}_{\theta^{(t)}}^{1/2}\,\Delta\theta^{(t+1)}\|^2\big]^{1/2}}$$
β: cumulation factor
$\mathcal{I}_\theta$: Fisher information matrix under θ
$\mathbb{E}[\cdot]$: expectation under a random function $f(x) = \epsilon$
λ: population size
• The denominator is a normalization factor that absorbs the effect of
  • the parameterization of the sampling distribution
  • a change of the population size
• Under a random function, $\|p_\theta\|^2 \approx 1$; when λ is too large, $\|p_\theta\|^2 \gg 1$.
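A minimal sketch of the cumulation above, assuming the Fisher-whitened step $\mathcal{I}_{\theta^{(t)}}^{1/2}\Delta\theta^{(t+1)}$ has already been flattened into a vector and that its expected squared norm under a random function is supplied by the caller (how that expectation is obtained is not shown here). The helper names `update_path` and `demo` are hypothetical. The demo illustrates the key observation: independent random steps keep ‖p_θ‖² near 1, while steps that always point in the same direction drive it towards (2 − β)/β.

```python
import numpy as np

def update_path(p, whitened_step, expected_sq_norm, beta):
    """p <- (1 - beta) p + sqrt(beta (2 - beta)) * step / sqrt(E[||step||^2])."""
    coeff = np.sqrt(beta * (2.0 - beta))
    return (1.0 - beta) * p + coeff * whitened_step / np.sqrt(expected_sq_norm)

def demo(beta=0.1, dim=10, iters=5000, seed=0):
    """Return (mean ||p||^2 for random steps, final ||p||^2 for constant steps)."""
    rng = np.random.default_rng(seed)
    # Random steps: isotropic with E||s||^2 = 1, mimicking a random function
    p = np.zeros(dim)
    sq_norms = []
    for t in range(iters):
        s = rng.standard_normal(dim) / np.sqrt(dim)
        p = update_path(p, s, 1.0, beta)
        if t >= 100:  # discard the burn-in from p = 0
            sq_norms.append(p @ p)
    random_mean = float(np.mean(sq_norms))
    # Consistent steps: the same unit vector in every iteration
    u = np.zeros(dim)
    u[0] = 1.0
    p = np.zeros(dim)
    for _ in range(iters):
        p = update_path(p, u, 1.0, beta)
    consistent = float(p @ p)
    return random_mean, consistent
```

With β = 0.1 the consistent case settles at (2 − β)/β = 19, far above the threshold region around 1, which is what signals that λ could be reduced.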
PSA: Population Size Update
$$\lambda^{(t+1)} \leftarrow \lambda^{(t)} \exp\!\left(\beta\left(\gamma^{(t+1)} - \frac{\|p_\theta^{(t+1)}\|^2}{\alpha}\right)\right)$$
$$\gamma^{(t+1)} \leftarrow (1-\beta)^2\,\gamma^{(t)} + \beta(2-\beta)$$
α: threshold
$\gamma^{(t)}$: normalization factor, $\gamma^{(t)} \approx 1$ for $t \gg 1$
• $\|p_\theta\|^2 < \alpha$ ⇒ the population size increases
• $\|p_\theta\|^2 > \alpha$ ⇒ the population size decreases
→ The population size is adapted so that the parameter update has sufficient tendency.
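The two update rules can be written as one small function. This is a sketch: the helper name is hypothetical, λ is kept real-valued, γ is assumed to start at 0, and how λ is rounded to an integer or bounded is not specified on this slide, so it is left out. The numbers in the usage below are illustrative only, not the paper's settings.

```python
import math

def update_population_size(lam, gamma, p_sq_norm, alpha, beta):
    """One PSA population-size step (hypothetical helper).

    gamma <- (1 - beta)^2 * gamma + beta * (2 - beta)   (tends to 1)
    lam   <- lam * exp(beta * (gamma - ||p||^2 / alpha))
    Returns the new real-valued lam and gamma.
    """
    gamma = (1.0 - beta) ** 2 * gamma + beta * (2.0 - beta)
    lam = lam * math.exp(beta * (gamma - p_sq_norm / alpha))
    return lam, gamma
```

For instance, with γ ≈ 1, α = 1.4 and β = 0.1 (illustrative values), a short path with ‖p_θ‖² = 0.5 increases λ, whereas ‖p_θ‖² = 3.0 decreases it.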
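PSA: Step-size Correction
• Based on the quality gain analysis [Akimoto 2017], the optimal step-size depends on the population size.
• A practical step-size adaptation in the CMA-ES usually tracks the optimal value well [Krause 2017].
• This implies that the step-size should increase when the population size increases, and vice versa.
• Without a correction, the population size adaptation disturbs the step-size adaptation.
After updating the population size:
$$\sigma^{(t+1)} \leftarrow \sigma^{(t+1)} \cdot \frac{\sigma^*(\lambda^{(t+1)})}{\sigma^*(\lambda^{(t)})}$$
$$\sigma^*(\lambda) = \frac{c(\lambda)\cdot n\cdot \mu_w(\lambda)}{n - 1 + c(\lambda)^2\cdot \mu_w(\lambda)}, \qquad c(\lambda) = -\sum_{i=1}^{\lambda} w_i\,\mathbb{E}[\mathcal{N}_{i:\lambda}]$$
n: search-space dimension; $w_i$: recombination weights; $\mu_w(\lambda) = 1/\sum_i w_i^2$; $\mathbb{E}[\mathcal{N}_{i:\lambda}]$: expected value of the i-th smallest of λ independent standard normal samples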
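A sketch of the correction factor, assuming positive log-decreasing weights on the best μ = ⌊λ/2⌋ samples (w_i = 0 otherwise) and computing $\mathbb{E}[\mathcal{N}_{i:\lambda}]$ by trapezoidal integration of the order-statistic density; both choices are assumptions for illustration, and all function names are hypothetical. Requires λ ≥ 2.

```python
import math

def expected_order_stat(i, lam, lo=-10.0, hi=10.0, num=4001):
    """E[N_{i:lam}]: mean of the i-th smallest of lam i.i.d. standard normals.

    Trapezoidal rule on [lo, hi] applied to x times the density of the
    i-th order statistic, evaluated in log space to avoid overflow.
    """
    log_coef = math.lgamma(lam + 1) - math.lgamma(i) - math.lgamma(lam - i + 1)
    h = (hi - lo) / (num - 1)
    total = 0.0
    for k in range(num):
        x = lo + k * h
        cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))  # Phi(x)
        sf = 0.5 * math.erfc(x / math.sqrt(2.0))    # 1 - Phi(x)
        log_pdf = -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
        log_density = (log_coef + (i - 1) * math.log(cdf)
                       + (lam - i) * math.log(sf) + log_pdf)
        end_weight = 0.5 if k in (0, num - 1) else 1.0
        total += end_weight * x * math.exp(log_density)
    return total * h

def sigma_star(n, lam):
    """sigma*(lam) = c n mu_w / (n - 1 + c^2 mu_w), c = -sum_i w_i E[N_{i:lam}]."""
    mu = lam // 2
    raw = [math.log((lam + 1) / 2) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    weights = [w / total for w in raw]  # assumed weights, zero beyond mu
    mu_w = 1.0 / sum(w * w for w in weights)
    c = -sum(w * expected_order_stat(i, lam) for i, w in enumerate(weights, start=1))
    return c * n * mu_w / (n - 1 + c * c * mu_w)

def corrected_sigma(sigma_new, n, lam_old, lam_new):
    """Rescale the freshly adapted step-size after lam_old -> lam_new."""
    return sigma_new * sigma_star(n, lam_new) / sigma_star(n, lam_old)
```

Since σ* grows with λ under these weights, an increase of the population size also enlarges the step-size, as the slide argues.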
PSA-CMA-ES
1. An iteration of CMA-ES (Step 1 Sample, Step 2 Rank, Step 3 Estimate, Step 4 Update) moves the distribution from $\mathcal{N}(m^{(t)}, (\sigma^{(t)})^2 C^{(t)})$ to $\mathcal{N}(m^{(t+1)}, (\sigma^{(t+1)})^2 C^{(t+1)})$. This is a step in the parameter space:
$$\Delta\theta = [\Delta m, \Delta\Sigma], \quad \Delta m = m^{(t+1)} - m^{(t)}, \quad \Delta\Sigma = (\sigma^{(t+1)})^2 C^{(t+1)} - (\sigma^{(t)})^2 C^{(t)}$$
2. Update the evolution path and the population size:
$$p_\theta^{(t+1)} \leftarrow (1-\beta)\,p_\theta^{(t)} + \sqrt{\beta(2-\beta)}\,\frac{\mathcal{I}_{\theta^{(t)}}^{1/2}\,\Delta\theta^{(t+1)}}{\mathbb{E}\big[\|\mathcal{I}_{\theta^{(t)}}^{1/2}\,\Delta\theta^{(t+1)}\|^2\big]^{1/2}}$$
$$\lambda^{(t+1)} \leftarrow \lambda^{(t)} \exp\!\left(\beta\left(\gamma^{(t+1)} - \frac{\|p_\theta^{(t+1)}\|^2}{\alpha}\right)\right)$$
3. Correct the step-size:
$$\sigma^{(t+1)} \leftarrow \sigma^{(t+1)} \cdot \frac{\sigma^*(\lambda^{(t+1)})}{\sigma^*(\lambda^{(t)})}$$
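For a Gaussian, the squared Fisher norm of Δθ has the closed form $\Delta m^\top \Sigma^{-1} \Delta m + \tfrac{1}{2}\operatorname{tr}\big((\Sigma^{-1}\Delta\Sigma)^2\big)$ with $\Sigma = (\sigma^{(t)})^2 C^{(t)}$. The sketch below builds Δm and ΔΣ from two CMA-ES states as in step 1 and returns a vector whose squared norm equals this quantity, so it can be fed to the path update in step 2. Using the symmetric $\Sigma^{-1/2}$ from an eigendecomposition as the whitening map is an assumption, the function name is hypothetical, and the expectation in the denominator is not computed here.

```python
import numpy as np

def whitened_step(m_old, sigma_old, C_old, m_new, sigma_new, C_new):
    """Return v with ||v||^2 = dm^T S^-1 dm + 0.5 tr((S^-1 dS)^2),
    where S = sigma_old^2 C_old is the covariance before the update."""
    S = sigma_old ** 2 * C_old
    dm = m_new - m_old                     # Delta m
    dS = sigma_new ** 2 * C_new - S        # Delta Sigma
    # Symmetric inverse square root of S via eigendecomposition
    eigvals, eigvecs = np.linalg.eigh(S)
    S_inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    v_mean = S_inv_sqrt @ dm
    # ||A dS A||_F^2 = tr((S^-1 dS)^2) for A = S^-1/2; the 1/sqrt(2) gives the 1/2 factor
    v_cov = (S_inv_sqrt @ dS @ S_inv_sqrt).ravel() / np.sqrt(2.0)
    return np.concatenate([v_mean, v_cov])
```

The returned vector has n + n² entries; a more compact variant could keep only the upper triangle of the whitened ΔΣ with off-diagonal entries scaled by √2.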
Restart Strategy for PSA-CMA-ES
First run: CMA-ES with the default population size ($\sigma^{(0)} = 2$) → unimodal functions
Second run: PSA-CMA-ES ($\sigma^{(0)} = 2$) with maximum population size $\lambda_{\max} = 2^9 \cdot \lambda_{\mathrm{def}}$ → well-structured multimodal functions
Additional runs: PSA-CMA-ES with a relatively small step-size $\sigma^{(0)} = 2 \cdot 10^{-2\cdot\mathcal{U}[0,1]}$ → weakly-structured multimodal functions
Simple Restart
All runs: PSA-CMA-ES ($\sigma^{(0)} = 2$, $\lambda_{\max} = \infty$)
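The per-run settings listed above can be sketched as a lookup. Only the quantities stated on the slide are filled in: the maximum population size of the additional runs, the initial population size of the PSA runs, and how the evaluation budget is shared between runs are not given, so they are omitted. Both function names are hypothetical, and `random.random()` samples from [0, 1) rather than the closed interval.

```python
import math
import random

def restart_config(run_index, lam_def, rng):
    """Hypothetical helper: settings of the run with 0-based index run_index
    under the proposed restart strategy (PSAwRS)."""
    if run_index == 0:
        # First run: CMA-ES with the default population size
        return {"algorithm": "CMA-ES", "sigma0": 2.0, "lam": lam_def}
    if run_index == 1:
        # Second run: PSA-CMA-ES with lam capped at 2^9 * lam_def
        return {"algorithm": "PSA-CMA-ES", "sigma0": 2.0, "lam_max": 2 ** 9 * lam_def}
    # Additional runs: sigma0 = 2 * 10^(-2 U), U uniform on [0, 1)
    u = rng.random()
    return {"algorithm": "PSA-CMA-ES", "sigma0": 2.0 * 10.0 ** (-2.0 * u)}

def simple_restart_config():
    """Simple restart (PSA): every run uses sigma0 = 2 and no cap on lam."""
    return {"algorithm": "PSA-CMA-ES", "sigma0": 2.0, "lam_max": math.inf}
```

The additional runs therefore start with σ(0) between 0.02 and 2, log-uniformly distributed.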
Simulation
Common Setting
• Initialization: $m^{(0)} \sim \mathcal{U}[-4, 4)^D$, where D is the problem dimension
• Termination:
  • The target function value is reached
  • The number of evaluations exceeds $10^6 \cdot D$
  • One of the termination conditions in [Hansen 2009] is satisfied
Algorithm Variants
PSA: PSA-CMA-ES with the simple restart
PSAwRS: PSA-CMA-ES with the proposed restart strategy
BIPOP: BIPOP-CMA-ES [Hansen 2009]
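The initialization and the first two termination criteria can be sketched in a few lines. The function names are hypothetical, minimization is assumed (the target counts as reached when f_best ≤ f_target), and the additional termination conditions of [Hansen 2009] are not modelled.

```python
import numpy as np

def initial_mean(D, rng):
    """Initial mean m(0) drawn uniformly from [-4, 4)^D."""
    return rng.uniform(-4.0, 4.0, size=D)

def should_stop(evaluations, D, f_best, f_target):
    """True if the target is reached or more than 1e6 * D evaluations were used."""
    return f_best <= f_target or evaluations > 10**6 * D
```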
Overall Performance (f1-f24)
[Figure: results of BIPOP, PSAwRS, and PSA in 5D, 10D, 20D, and 40D]
Unimodal Functions
[Figure: f_best and λ versus the number of iterations for PSA, PSAwRS, and BIPOP (20D)]
[Figure: median λ versus dimension (2, 3, 5, 10, 20, 40) on f1 Sphere for PSA, PSAwRS, and BIPOP]
Well-structured Multimodal Functions
[Figure: f_best and λ versus the number of iterations for PSA, PSAwRS, and BIPOP (20D)]
Repetitive Multimodal Functions
[Figure: f_best and λ versus the number of iterations for PSA, PSAwRS, and BIPOP with σ(0) = 2 (20D)]
[Figure: f_best and λ versus the number of iterations for PSA, PSAwRS, and BIPOP with σ(0) = 2/100 (20D)]
Summary
• PSA-CMA-ES with the proposed restart strategy (PSAwRS) is comparable with BIPOP-CMA-ES.
On unimodal functions
• PSA-CMA-ES performs worse as the dimension increases.
On well-structured multimodal functions
• PSA-CMA-ES works better than BIPOP-CMA-ES.
On repetitive multimodal functions
• The initial step-size is important to avoid an inefficient increase of the population size.
Future Work
• To investigate the hyper-parameter setting