Machine learning and black-box expensive optimization




  1. Machine learning and black-box expensive optimization. Sébastien Verel, Laboratoire d'Informatique, Signal et Image de la Côte d'Opale (LISIC), Université du Littoral Côte d'Opale, Calais, France. http://www-lisic.univ-littoral.fr/~verel/ June 18th, 2018. Outline: Introduction; Learning for optimization; Optimization for learning.


  7. AI: machine learning, optimization, perception, etc.
Learning: minimize an error function. M_θ: model to learn on data. Search θ⋆ = arg min_θ Error(M_θ, data). Depending on the model dimension, variables, error function, etc., there is a huge number of optimization algorithms.
Optimization: learn to design an algorithm that finds good solutions. A_θ: search algorithm for problems (X, f). Learn A_θ such that A_θ(X, f) = arg min_{x ∈ X} f(x). Depending on the class of algorithms, search spaces, functions, etc., there is a huge number of learning algorithms.
Artificial: from paper to computer!
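The learning-as-optimization view (θ⋆ = arg min_θ Error(M_θ, data)) can be sketched with a deliberately naive example: a hypothetical linear model M_θ fitted by pure random search over θ. Both the model and the search method are stand-ins for illustration, not techniques from the slides.

```python
import random

# Toy data generated by a "true" model y = 2*x + 1 (hypothetical example).
data = [(x, 2.0 * x + 1.0) for x in range(10)]

def error(t, data):
    """Error(M_theta, data): mean squared error of the linear model
    M_theta(x) = t[0] * x + t[1]."""
    return sum((t[0] * x + t[1] - y) ** 2 for x, y in data) / len(data)

def random_search(data, n_iter=5000, seed=42):
    """Search theta* = arg min_theta Error(M_theta, data) by random search."""
    rng = random.Random(seed)
    best = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    best_err = error(best, data)
    for _ in range(n_iter):
        cand = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        e = error(cand, data)
        if e < best_err:
            best, best_err = cand, e
    return best, best_err

theta, err = random_search(data)
print(theta, err)  # theta close to (2, 1)
```

Any of the "huge number of optimization algorithms" mentioned above could replace the random search here; it is only the simplest choice.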

  8. Black-box (expensive) optimization: x → [black box] → f(x). No information on the definition of the objective function f. The objective function can be irregular, non-continuous, non-differentiable, etc., and is given by a computation or an (expensive) simulation. A few examples from the team:
• Mobility simulation (Florian Leprêtre),
• Plant biology, plant growth (Amaury Dubois),
• Logistics simulation (Brahim Aboutaib),
• Cellular automata,
• Nuclear power plants (Valentin Drouet).
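A minimal sketch of the black-box setting: the solver may only query x → f(x), never inspect the definition of f. Here the "black box" is a toy onemax function and the solver a standard (1+1)-EA; both are assumptions chosen for illustration, not the simulators listed above.

```python
import random

def black_box(x):
    """Stands in for an expensive simulator: the optimizer may only
    call it, never inspect it (here a toy onemax, an assumption)."""
    return -sum(x)  # minimization: optimum is the all-ones string

def one_plus_one_ea(f, n=20, budget=2000, seed=1):
    """(1+1)-EA: flip each bit with prob. 1/n, keep the child if not worse.
    Uses only black-box evaluations of f."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for _ in range(budget):
        y = [b ^ (rng.random() < 1.0 / n) for b in x]  # per-bit mutation
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
    return x, fx

x, fx = one_plus_one_ea(black_box)
print(fx)  # near the optimum -20 for this budget
```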

  9. Real-world black-box expensive optimization. PhDs of Mathieu Muniglia (2014-2017) and Valentin Drouet (2017-2020), CEA, Paris. (73, . . . , 8) → [multi-physics simulator] → ∆z_P. Expensive optimization: parallel computing and a surrogate model.
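The first of the two levers above, parallel computing, amounts to evaluating several candidate solutions on the simulator at once. A sketch, where the "simulator" is a hypothetical stand-in whose expense is mimicked by a short sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulator(x):
    """Stand-in for an expensive multi-physics simulation (assumption):
    each call costs wall-clock time, mimicked here by a sleep."""
    time.sleep(0.05)
    return sum(xi * xi for xi in x)  # toy objective value

candidates = [(i, i + 1) for i in range(8)]

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    # Evaluate the whole batch concurrently instead of one by one.
    values = list(pool.map(simulator, candidates))
elapsed = time.time() - start
print(values, elapsed)  # ~0.05 s instead of ~0.4 s sequentially
```

For a real CPU-bound simulator one would use processes or a cluster scheduler rather than threads; the batch-evaluation pattern is the same.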


  12. Adaptive distributed optimization algorithms. Christopher Jankee, Bilel Derbel, Cyril Fonlupt. Portfolio of algorithms: control of the algorithm during optimization.
How to select an algorithm? Design reinforcement-learning methods for distributed computing (ε-greedy, adaptive pursuit, UCB, ...).
How to compute a reward? Aggregation function of local rewards (mean, max, etc.) for a global selection.
Methodology: use designed benchmark functions with designed properties and experimental analysis.
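The selection question can be sketched with the simplest of the listed methods, ε-greedy, over a two-operator portfolio. The operators and their reward distributions are synthetic stand-ins for "fitness improvement per application"; the distributed aggregation step is omitted.

```python
import random

def epsilon_greedy_portfolio(operators, steps=200, eps=0.1, seed=3):
    """ε-greedy selection over a portfolio: mostly pick the operator with
    the best average reward so far, explore uniformly with prob. ε."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in operators}
    counts = {name: 0 for name in operators}
    for _ in range(steps):
        if rng.random() < eps or min(counts.values()) == 0:
            name = rng.choice(list(operators))  # explore
        else:
            name = max(operators, key=lambda a: totals[a] / counts[a])  # exploit
        reward = operators[name](rng)
        totals[name] += reward
        counts[name] += 1
    return counts

# Hypothetical portfolio: operator "b" yields better improvements on average.
portfolio = {
    "a": lambda rng: rng.gauss(0.2, 0.1),
    "b": lambda rng: rng.gauss(0.5, 0.1),
}
counts = epsilon_greedy_portfolio(portfolio)
print(counts)  # "b" should be selected far more often
```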

  13. Features to learn: multi-objective fitness landscape. K. Tanaka, H. Aguirre (Univ. Shinshu), A. Liefooghe, B. Derbel (Univ. Lille), 2010-2018. Fitness landscape: (X, f, N): search space, objective function, neighborhood relation.
[Figure: objective-space plots of conflicting, independent, and correlated objectives; heatmap of Kendall's tau between landscape features (f_cor_rws, rho, #lsupp_avg_rws, hv_avg_rws, etc.).]
Performance prediction for GSEMO (cross-validation):

feature set        MAE       MSE       R²        rank
all                0.007781  0.000118  0.951609  1
enumeration        0.008411  0.000142  0.943046  2
sampling all       0.009113  0.000161  0.932975  3
sampling rws       0.009284  0.000167  0.930728  4
sampling aws       0.010241  0.000195  0.917563  5
{r, m, n, k/n}     0.010609  0.000215  0.911350  6
{r, m, n}          0.026974  0.001123  0.518505  7
{m, n}             0.032150  0.001545  0.340715  8
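One of the landscape features above, the correlation between objectives, can be estimated from a sample of solutions. A sketch with a hypothetical bi-objective function on bit strings and a simple Kendall's tau (ties are simply discarded, a simplification over the tau-b used in practice):

```python
import random
from itertools import combinations

def kendall_tau(u, v):
    """Kendall's tau rank correlation between two value lists
    (tied pairs are dropped, a simplification)."""
    conc = disc = 0
    for i, j in combinations(range(len(u)), 2):
        s = (u[i] - u[j]) * (v[i] - v[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (conc + disc)

# Hypothetical bi-objective function: f1 counts all ones,
# f2 counts ones in the first half, so the objectives overlap.
def f1(x): return sum(x)
def f2(x): return sum(x[: len(x) // 2])

rng = random.Random(0)
sample = [[rng.randint(0, 1) for _ in range(16)] for _ in range(100)]
tau = kendall_tau([f1(x) for x in sample], [f2(x) for x in sample])
print(tau)  # positive: the two objectives are correlated
```

Such sampled features (the *_rws/*_aws columns in the table) are then fed to a regression model to predict algorithm performance.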

  14. Learning/tuning parameters according to features.


  16. Surrogate models for pseudo-boolean functions. Goal: replace/learn the (expensive) objective function with a (cheap) meta-model during the optimization process. Continuous optimization: neural networks, Gaussian processes (kriging); EGO: sample the next solution with maximal expected improvement. GP: random variables which have a joint Gaussian distribution. Mean: m(y(x)) = μ. Covariance: cov(y(x), y(x′)) = exp(−θ d(x, x′)^p). From: Rasmussen, Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
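A sketch of the kriging predictor using the covariance from the slide, cov(y(x), y(x′)) = exp(−θ d(x, x′)^p), with a zero prior mean: μ(x⋆) = k⋆ᵀ K⁻¹ y. The 1-D training points and hyperparameters are assumptions, and a tiny hand-rolled solver keeps it dependency-free.

```python
import math

def kernel(x, xp, theta=1.0, p=2):
    """Covariance from the slide: exp(-theta * d(x, x')^p)."""
    return math.exp(-theta * abs(x - xp) ** p)

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting (tiny systems only)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [m[r][k] - f * m[c][k] for k in range(n + 1)]
    return [m[i][n] / m[i][i] for i in range(n)]

def gp_mean(xs, ys, xstar):
    """Zero-mean GP posterior mean: mu(x*) = k*^T K^{-1} y."""
    K = [[kernel(a, b) + (1e-9 if a == b else 0.0) for b in xs] for a in xs]
    alpha = solve(K, ys)
    return sum(kernel(xstar, xi) * ai for xi, ai in zip(xs, alpha))

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]  # samples of f(x) = x^2
print(gp_mean(xs, ys, 1.0))  # interpolates the training point: close to 1.0
print(gp_mean(xs, ys, 1.5))  # prediction between the samples
```

EGO would wrap this predictor (plus the posterior variance) in an expected-improvement criterion to choose the next expensive evaluation.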

  17. Surrogate models for pseudo-boolean functions. Goal: replace/learn the (expensive) objective function with a (cheap) meta-model during the optimization process. Continuous optimization: neural networks, Gaussian processes (kriging); EGO: sample the next solution with maximal expected improvement.
Proposition. Walsh function basis: ∀x ∈ {0, 1}^n, ϕ_k(x) = (−1)^(Σ_{j=0}^{n−1} k_j x_j), and f(x) = Σ_{k=0}^{2^n−1} w_k ϕ_k(x) with w_k = (1/2^n) Σ_{x ∈ {0,1}^n} f(x) ϕ_k(x).
Surrogate model: f̂(x) = Σ_{k : o(ϕ_k) ≤ d} ŵ_k ϕ_k(x), with the coefficients estimated by LARS.
[Figure: mean absolute error of fitness vs. sample size (0 to 500), comparing the kriging and Walsh surrogates.]
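The Walsh decomposition can be checked directly on a tiny example. The sketch below computes the exact coefficients w_k by full enumeration (feasible only for small n; the surrogate on the slide instead truncates to order ≤ d and estimates the coefficients with LARS from a sample). The toy function f is an assumption.

```python
from itertools import product

def phi(k, x):
    """Walsh basis function: phi_k(x) = (-1)^(sum_j k_j x_j)."""
    return -1 if sum(kj * xj for kj, xj in zip(k, x)) % 2 else 1

def walsh_coeffs(f, n):
    """Exact coefficients w_k = 2^-n * sum_x f(x) phi_k(x) (small n only)."""
    pts = list(product([0, 1], repeat=n))
    return {k: sum(f(x) * phi(k, x) for x in pts) / 2 ** n for k in pts}

# Toy pseudo-boolean function (assumption): f(x) = x0 + 2*x1*x2.
def f(x): return x[0] + 2 * x[1] * x[2]

n = 3
w = walsh_coeffs(f, n)

# Reconstruction f(x) = sum_k w_k phi_k(x) is exact:
def f_hat(x): return sum(wk * phi(k, x) for k, wk in w.items())
assert all(abs(f(x) - f_hat(x)) < 1e-12 for x in product([0, 1], repeat=n))

# Sparsity: only the low-order coefficients are non-zero here, which is
# what makes the truncated surrogate f_hat with o(phi_k) <= d work.
print({k: v for k, v in w.items() if abs(v) > 1e-12})
```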
