Machine learning and black-box expensive optimization




  1. Machine learning and black-box expensive optimization. Sébastien Verel, Laboratoire d'Informatique, Signal et Image de la Côte d'Opale (LISIC), Université du Littoral Côte d'Opale, Calais, France. http://www-lisic.univ-littoral.fr/~verel/ June 18th, 2018. Outline: Introduction; Learning for optimization; Optimization for learning.


  7. AI: machine learning, optimization, perception, etc.
Learning: minimize an error function. M_θ: model to learn on data. Search θ⋆ = arg min_θ Error(M_θ, data). Depending on the model dimension, variables, error function, etc., there is a huge number of optimization algorithms.
Optimization: learn to design an algorithm that finds good solutions. A_θ: search algorithm for problems (X, f). Learn A_θ such that A_θ(X, f) = arg min_{x ∈ X} f(x). Depending on the class of algorithms, search spaces, functions, etc., there is a huge number of learning algorithms.
Artificial: from paper to computer!
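The learning-as-optimization view (θ⋆ = arg min_θ Error(M_θ, data)) can be sketched with a deliberately naive example: a hypothetical linear model M_θ fitted by pure random search over θ. Both the model and the search method are stand-ins for illustration, not techniques from the slides.

```python
import random

# Toy data generated by a "true" model y = 2*x + 1 (hypothetical example).
data = [(x, 2.0 * x + 1.0) for x in range(10)]

def error(t, data):
    """Error(M_theta, data): mean squared error of the linear model
    M_theta(x) = t[0] * x + t[1]."""
    return sum((t[0] * x + t[1] - y) ** 2 for x, y in data) / len(data)

def random_search(data, n_iter=5000, seed=42):
    """Search theta* = arg min_theta Error(M_theta, data) by random search."""
    rng = random.Random(seed)
    best = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    best_err = error(best, data)
    for _ in range(n_iter):
        cand = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        e = error(cand, data)
        if e < best_err:
            best, best_err = cand, e
    return best, best_err

theta, err = random_search(data)
print(theta, err)  # theta close to (2, 1)
```

Any of the "huge number of optimization algorithms" mentioned above could replace the random search here; it is only the simplest choice.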

  8. Black-box (expensive) optimization: x → [black box] → f(x). No information on the definition of the objective function f. The objective function can be irregular, non-continuous, non-differentiable, etc., and is given by a computation or an (expensive) simulation. A few examples from the team:
• Mobility simulation (Florian Leprêtre),
• Plant biology, plant growth (Amaury Dubois),
• Logistics simulation (Brahim Aboutaib),
• Cellular automata,
• Nuclear power plants (Valentin Drouet).
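A minimal sketch of the black-box setting: the solver may only query x → f(x), never inspect the definition of f. Here the "black box" is a toy onemax function and the solver a standard (1+1)-EA; both are assumptions chosen for illustration, not the simulators listed above.

```python
import random

def black_box(x):
    """Stands in for an expensive simulator: the optimizer may only
    call it, never inspect it (here a toy onemax, an assumption)."""
    return -sum(x)  # minimization: optimum is the all-ones string

def one_plus_one_ea(f, n=20, budget=2000, seed=1):
    """(1+1)-EA: flip each bit with prob. 1/n, keep the child if not worse.
    Uses only black-box evaluations of f."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for _ in range(budget):
        y = [b ^ (rng.random() < 1.0 / n) for b in x]  # per-bit mutation
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
    return x, fx

x, fx = one_plus_one_ea(black_box)
print(fx)  # near the optimum -20 for this budget
```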

  9. Real-world black-box expensive optimization. PhDs of Mathieu Muniglia (2014-2017) and Valentin Drouet (2017-2020), CEA, Paris. (73, . . . , 8) → [multi-physics simulator] → ∆z_P. Expensive optimization: parallel computing and a surrogate model.
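The first of the two levers above, parallel computing, amounts to evaluating several candidate solutions on the simulator at once. A sketch, where the "simulator" is a hypothetical stand-in whose expense is mimicked by a short sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulator(x):
    """Stand-in for an expensive multi-physics simulation (assumption):
    each call costs wall-clock time, mimicked here by a sleep."""
    time.sleep(0.05)
    return sum(xi * xi for xi in x)  # toy objective value

candidates = [(i, i + 1) for i in range(8)]

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    # Evaluate the whole batch concurrently instead of one by one.
    values = list(pool.map(simulator, candidates))
elapsed = time.time() - start
print(values, elapsed)  # ~0.05 s instead of ~0.4 s sequentially
```

For a real CPU-bound simulator one would use processes or a cluster scheduler rather than threads; the batch-evaluation pattern is the same.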


  12. Adaptive distributed optimization algorithms. Christopher Jankee, Bilel Derbel, Cyril Fonlupt. Portfolio of algorithms: control of the algorithm during optimization.
How to select an algorithm? Design reinforcement-learning methods for distributed computing (ε-greedy, adaptive pursuit, UCB, ...).
How to compute a reward? Aggregation function of local rewards (mean, max, etc.) for a global selection.
Methodology: use designed benchmark functions with designed properties and experimental analysis.
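The selection question can be sketched with the simplest of the listed methods, ε-greedy, over a two-operator portfolio. The operators and their reward distributions are synthetic stand-ins for "fitness improvement per application"; the distributed aggregation step is omitted.

```python
import random

def epsilon_greedy_portfolio(operators, steps=200, eps=0.1, seed=3):
    """ε-greedy selection over a portfolio: mostly pick the operator with
    the best average reward so far, explore uniformly with prob. ε."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in operators}
    counts = {name: 0 for name in operators}
    for _ in range(steps):
        if rng.random() < eps or min(counts.values()) == 0:
            name = rng.choice(list(operators))  # explore
        else:
            name = max(operators, key=lambda a: totals[a] / counts[a])  # exploit
        reward = operators[name](rng)
        totals[name] += reward
        counts[name] += 1
    return counts

# Hypothetical portfolio: operator "b" yields better improvements on average.
portfolio = {
    "a": lambda rng: rng.gauss(0.2, 0.1),
    "b": lambda rng: rng.gauss(0.5, 0.1),
}
counts = epsilon_greedy_portfolio(portfolio)
print(counts)  # "b" should be selected far more often
```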

  13. Features to learn: multi-objective fitness landscape. K. Tanaka, H. Aguirre (Univ. Shinshu), A. Liefooghe, B. Derbel (Univ. Lille), 2010-2018. Fitness landscape: (X, f, N): search space, objective function, neighborhood relation.
[Figure: objective-space plots of conflicting, independent, and correlated objectives; heatmap of Kendall's tau between landscape features (f_cor_rws, rho, #lsupp_avg_rws, hv_avg_rws, etc.).]
Performance prediction for GSEMO (cross-validation):

feature set        MAE       MSE       R²        rank
all                0.007781  0.000118  0.951609  1
enumeration        0.008411  0.000142  0.943046  2
sampling all       0.009113  0.000161  0.932975  3
sampling rws       0.009284  0.000167  0.930728  4
sampling aws       0.010241  0.000195  0.917563  5
{r, m, n, k/n}     0.010609  0.000215  0.911350  6
{r, m, n}          0.026974  0.001123  0.518505  7
{m, n}             0.032150  0.001545  0.340715  8
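One of the landscape features above, the correlation between objectives, can be estimated from a sample of solutions. A sketch with a hypothetical bi-objective function on bit strings and a simple Kendall's tau (ties are simply discarded, a simplification over the tau-b used in practice):

```python
import random
from itertools import combinations

def kendall_tau(u, v):
    """Kendall's tau rank correlation between two value lists
    (tied pairs are dropped, a simplification)."""
    conc = disc = 0
    for i, j in combinations(range(len(u)), 2):
        s = (u[i] - u[j]) * (v[i] - v[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (conc + disc)

# Hypothetical bi-objective function: f1 counts all ones,
# f2 counts ones in the first half, so the objectives overlap.
def f1(x): return sum(x)
def f2(x): return sum(x[: len(x) // 2])

rng = random.Random(0)
sample = [[rng.randint(0, 1) for _ in range(16)] for _ in range(100)]
tau = kendall_tau([f1(x) for x in sample], [f2(x) for x in sample])
print(tau)  # positive: the two objectives are correlated
```

Such sampled features (the *_rws/*_aws columns in the table) are then fed to a regression model to predict algorithm performance.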

  14. Learning/tuning parameters according to features.


  16. Surrogate models for pseudo-boolean functions. Goal: replace/learn the (expensive) objective function with a (cheap) meta-model during the optimization process. Continuous optimization: neural networks, Gaussian processes (kriging); EGO: sample the next solution with maximal expected improvement. GP: random variables which have a joint Gaussian distribution. Mean: m(y(x)) = μ. Covariance: cov(y(x), y(x′)) = exp(−θ d(x, x′)^p). From: Rasmussen, Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
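A sketch of the kriging predictor using the covariance from the slide, cov(y(x), y(x′)) = exp(−θ d(x, x′)^p), with a zero prior mean: μ(x⋆) = k⋆ᵀ K⁻¹ y. The 1-D training points and hyperparameters are assumptions, and a tiny hand-rolled solver keeps it dependency-free.

```python
import math

def kernel(x, xp, theta=1.0, p=2):
    """Covariance from the slide: exp(-theta * d(x, x')^p)."""
    return math.exp(-theta * abs(x - xp) ** p)

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting (tiny systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [m[r][k] - f * m[c][k] for k in range(n + 1)]
    return [m[i][n] / m[i][i] for i in range(n)]

def gp_mean(xs, ys, xstar):
    """Zero-mean GP posterior mean: mu(x*) = k*^T K^{-1} y."""
    K = [[kernel(a, b) + (1e-9 if a == b else 0.0) for b in xs] for a in xs]
    alpha = solve(K, ys)
    return sum(kernel(xstar, xi) * ai for xi, ai in zip(xs, alpha))

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]  # samples of f(x) = x^2
print(gp_mean(xs, ys, 1.0))  # interpolates the training point: close to 1.0
print(gp_mean(xs, ys, 1.5))  # prediction between the samples
```

EGO would wrap this predictor (plus the posterior variance) in an expected-improvement criterion to choose the next expensive evaluation.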

  17. Surrogate models for pseudo-boolean functions. Goal: replace/learn the (expensive) objective function with a (cheap) meta-model during the optimization process. Continuous optimization: neural networks, Gaussian processes (kriging); EGO: sample the next solution with maximal expected improvement.
Proposition. Walsh function basis: ∀x ∈ {0, 1}^n, ϕ_k(x) = (−1)^(Σ_{j=0}^{n−1} k_j x_j), and f(x) = Σ_{k=0}^{2^n−1} w_k ϕ_k(x) with w_k = (1/2^n) Σ_{x ∈ {0,1}^n} f(x) ϕ_k(x).
Surrogate model: f̂(x) = Σ_{k : o(ϕ_k) ≤ d} ŵ_k ϕ_k(x), with the coefficients estimated by LARS.
[Figure: mean absolute error of fitness vs. sample size (0 to 500), comparing the kriging and Walsh surrogates.]
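The Walsh decomposition can be checked directly on a tiny example. The sketch below computes the exact coefficients w_k by full enumeration (feasible only for small n; the surrogate on the slide instead truncates to order ≤ d and estimates the coefficients with LARS from a sample). The toy function f is an assumption.

```python
from itertools import product

def phi(k, x):
    """Walsh basis function: phi_k(x) = (-1)^(sum_j k_j x_j)."""
    return -1 if sum(kj * xj for kj, xj in zip(k, x)) % 2 else 1

def walsh_coeffs(f, n):
    """Exact coefficients w_k = 2^-n * sum_x f(x) phi_k(x) (small n only)."""
    pts = list(product([0, 1], repeat=n))
    return {k: sum(f(x) * phi(k, x) for x in pts) / 2 ** n for k in pts}

# Toy pseudo-boolean function (assumption): f(x) = x0 + 2*x1*x2.
def f(x): return x[0] + 2 * x[1] * x[2]

n = 3
w = walsh_coeffs(f, n)

# Reconstruction f(x) = sum_k w_k phi_k(x) is exact:
def f_hat(x): return sum(wk * phi(k, x) for k, wk in w.items())
assert all(abs(f(x) - f_hat(x)) < 1e-12 for x in product([0, 1], repeat=n))

# Sparsity: only the low-order coefficients are non-zero here, which is
# what makes the truncated surrogate f_hat with o(phi_k) <= d work.
print({k: v for k, v in w.items() if abs(v) > 1e-12})
```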
