Comparison of Ordinal and Metric Gaussian Process Regression as Surrogate Models for CMA Evolution Strategy

Zbyněk Pitra (1, 2, 3), Lukáš Bajer (1, 4), Jakub Repický (1, 4), Martin Holeňa (1)

1 Institute of Computer Science, Czech Academy of Sciences
2 Faculty of Nuclear Sciences and Physical Engineering
3 National Institute of Mental Health
4 Faculty of Mathematics and Physics, Charles University
Prague, Czech Republic

GECCO 2017

Contents

1. DTS-CMA-ES
2. Surrogate models
   Metric Gaussian Processes
   Ordinal Gaussian Processes
3. Experimental results

DTS-CMA-ES

Initialize: standard CMA-ES initialization with the population doubled
while not terminate:
  1. CMA-ES sampling of the population: $x_i \sim \mathcal{N}(m, \sigma^2 C)$ for $i = 1, \dots, \lambda$
  2. train the first model $f_{M1}$ on the points evaluated so far with the original fitness
  3. get the mean $\hat{\mu}_i$ and variance $\hat{s}^2_i$ of all $x_i$ with the model $f_{M1}$
  4. select the most promising $\lceil \alpha\lambda \rceil$ points according to the model $f_{M1}$
  5. evaluate the chosen points with the original fitness $f$
  6. re-train the second model $f_{M2}$ with these new points
  7. predict the fitness of the points not evaluated with the original fitness using $f_{M2}$
  8. CMA-ES update of $m$, $\sigma$, $C$

[Figure: step 8, the CMA-ES update of $m, \sigma, C$ moves the sampling distribution from $(m_1, \sigma_1)$ to $(m_2, \sigma_2)$]
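
Steps 1 to 7 of the loop above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: `fit_gp` is a hypothetical stand-in surrogate (a GP with a squared-exponential kernel and a constant mean), `dts_generation` and its argument names are invented for this example, and the selection criterion in step 4 (predicted mean minus predicted standard deviation) is an assumed choice, since the slide does not say which criterion ranks the points. Step 8, the CMA-ES update itself, is left to the caller.

```python
import math
import numpy as np

def fit_gp(X, y, length_scale=1.0, jitter=1e-8):
    """Hypothetical surrogate: GP with squared-exponential kernel, constant mean."""
    def kern(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    y_mean = y.mean()  # constant mean function (assumption)
    C_inv = np.linalg.inv(kern(X, X) + jitter * np.eye(len(X)))

    def predict(Z):
        K = kern(Z, X)                                   # rows: k for each point of Z
        mean = y_mean + K @ C_inv @ (y - y_mean)
        var = 1.0 - np.einsum("ij,jk,ik->i", K, C_inv, K)
        return mean, np.maximum(var, 0.0)                # clip round-off negatives
    return predict

def dts_generation(f, m, sigma, C, lam, alpha, archive_X, archive_y, rng):
    """One doubly trained generation (steps 1 to 7); step 8 is left to the caller."""
    # 1. sample the population from N(m, sigma^2 C)
    pop = rng.multivariate_normal(m, sigma**2 * C, size=lam)
    # 2. train the first model on the points evaluated so far
    model1 = fit_gp(archive_X, archive_y)
    # 3. predicted mean and variance of every population point
    mean, var = model1(pop)
    # 4. pick ceil(alpha * lam) points; the criterion below is an assumption
    n_orig = math.ceil(alpha * lam)
    chosen = np.argsort(mean - np.sqrt(var))[:n_orig]
    # 5. evaluate the chosen points with the original fitness
    y_pop = np.empty(lam)
    y_pop[chosen] = [f(x) for x in pop[chosen]]
    archive_X = np.vstack([archive_X, pop[chosen]])
    archive_y = np.concatenate([archive_y, y_pop[chosen]])
    # 6. re-train the second model including the new points
    model2 = fit_gp(archive_X, archive_y)
    # 7. predict the fitness of the remaining points with the second model
    rest = np.setdiff1d(np.arange(lam), chosen)
    y_pop[rest] = model2(pop[rest])[0]
    return pop, y_pop, chosen, archive_X, archive_y

# Usage on a sphere function with a small random archive of evaluated points.
def sphere(x):
    return float(np.sum(x**2))

rng = np.random.default_rng(0)
archive_X = rng.normal(size=(10, 2))
archive_y = np.array([sphere(x) for x in archive_X])
pop, y_pop, chosen, archive_X, archive_y = dts_generation(
    sphere, np.zeros(2), 1.0, np.eye(2), lam=8, alpha=0.25,
    archive_X=archive_X, archive_y=archive_y, rng=rng)
```

The returned `y_pop` mixes original fitness values (at the indices in `chosen`) with second-model predictions for the rest, which is the vector a CMA-ES update would then rank.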

Gaussian Process

A GP is a stochastic process in which any finite collection of random variables has a joint Gaussian distribution:

\[ f_{GP}(x) \sim \mathcal{GP}\big(\mu(x), k(x_1, x_2)\big) \]

It is defined by the mean function $\mu(x)$ (usually constant), the covariance function $k(x_1, x_2)$, and their (hyper)parameters.

A GP can express the uncertainty of its prediction at a new point $x$: it gives a probability distribution of the output value.

Gaussian Process

Given a set of $N$ training points $X_N = (x_1 \dots x_N)$, $x_i \in \mathbb{R}^d$, and the corresponding measured values $y_N = (y_1, \dots, y_N)^\top$ of a function $f$ being approximated,

\[ y_i = f(x_i), \quad i = 1, \dots, N, \]

the GP treats the vector of these function values as a sample from an $N$-variate Gaussian distribution:

\[ y_N \sim \mathcal{N}(0, C_N) \]
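
To make the statement $y_N \sim \mathcal{N}(0, C_N)$ concrete, the sketch below builds a covariance matrix for a handful of points and draws one vector of function values from it. The squared-exponential kernel, its parameters, and the name `covariance_matrix` are assumptions made for this example; the slides do not fix a particular covariance function at this point.

```python
import numpy as np

def covariance_matrix(X, length_scale=1.0, signal_var=1.0):
    """C_N with entries k(x_i, x_j) for an assumed squared-exponential kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
X_N = rng.uniform(-2.0, 2.0, size=(5, 2))   # N = 5 training points in R^2
C_N = covariance_matrix(X_N)

# One draw of y_N from the zero-mean N-variate Gaussian; the small diagonal
# jitter only guards against round-off and is not part of the model.
y_N = rng.multivariate_normal(np.zeros(len(X_N)), C_N + 1e-10 * np.eye(len(X_N)))
```

Points that lie close together get a covariance near `signal_var`, so their sampled values tend to be similar; distant points are nearly uncorrelated.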

Gaussian Process prediction

When considering a new point $(x^*, y^*)$, the probability density of its $f$-value is a 1D Gaussian:

\[ p(y^* \mid X_N, x^*, y_N) = \mathcal{N}\big(\hat{\mu}_{N+1}, \hat{s}^2_{N+1}\big) \]

with the mean and variance given by

\[ \hat{\mu}_{N+1} = k^\top C_N^{-1} y_N, \qquad \hat{s}^2_{N+1} = \kappa - k^\top C_N^{-1} k, \]

where
  $C_N$ is the GP covariance matrix: the matrix of the covariance function's values $k(x_i, x_j)$ for each pair $x_i, x_j$
  $k$ is the vector of the covariance function's values $k(x^*, x_i)$ between the new point $x^*$ and each $x_i \in X_N$
  $\kappa$ is the variance of the new point itself, $k(x^*, x^*)$
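
The two prediction formulas translate almost directly into NumPy. The following is a minimal sketch assuming a zero-mean GP with a squared-exponential kernel of unit signal variance; the kernel choice, the function names `sq_exp` and `gp_predict`, and the diagonal jitter are assumptions for illustration and are not taken from the slides. A Cholesky factorization replaces the explicit inverse $C_N^{-1}$, which is a common numerical choice rather than something the formulas require.

```python
import numpy as np

def sq_exp(A, B, length_scale=1.0):
    """Assumed squared-exponential covariance function k(a, b)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_N, y_N, x_star, jitter=1e-10):
    """Predictive mean and variance at x_star for a zero-mean GP."""
    C_N = sq_exp(X_N, X_N) + jitter * np.eye(len(X_N))       # covariance matrix
    k = sq_exp(X_N, x_star[None, :])[:, 0]                   # k(x*, x_i)
    kappa = sq_exp(x_star[None, :], x_star[None, :])[0, 0]   # k(x*, x*)
    L = np.linalg.cholesky(C_N)                              # C_N = L L^T
    C_inv_y = np.linalg.solve(L.T, np.linalg.solve(L, y_N))
    C_inv_k = np.linalg.solve(L.T, np.linalg.solve(L, k))
    mu = k @ C_inv_y             # mean: k^T C_N^{-1} y_N
    s2 = kappa - k @ C_inv_k     # variance: kappa - k^T C_N^{-1} k
    return mu, s2

X_N = np.array([[0.0], [1.0], [2.0]])
y_N = np.sin(X_N[:, 0])
mu_train, s2_train = gp_predict(X_N, y_N, np.array([1.0]))   # at a training point
mu_far, s2_far = gp_predict(X_N, y_N, np.array([50.0]))      # far from all data
```

At a training point the prediction reproduces the measured value with variance close to zero, while far from all training points $k$ vanishes, so the mean falls back to the prior mean of zero and the variance to $\kappa$.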