Adaptive Operator Selection with Rank-based Multi-Armed Bandits (PowerPoint presentation)



  1. Adaptive Operator Selection with Rank-based Multi-Armed Bandits. Álvaro Fialho, Marc Schoenauer & Michèle Sebag. 26th COW, April 22, 2013

  2. Outline
     1 Context & Motivation
     2 Operator Selection
     3 Credit Assignment
     4 Empirical Validation
     5 Conclusions & Further Work
     (Slide footer, repeated throughout the deck: Fialho, Schoenauer, Sebag, "Rank-based Adaptive Operator Selection")

  3. Section: Context & Motivation (subsections: Evolutionary Algorithms; Adaptive Operator Selection)

  4. Evolutionary Algorithms
     Stochastic optimization algorithms (Darwinian paradigm).
     Bottleneck: parameter setting, including
       - population size and number of offspring,
       - selection and replacement methods (and their parameters),
       - variation operators (application rate, internal parameters).
     Goal: automatic setting ("Crossing the Chasm") [Moore, 1991]

  5. Parameter Setting of Variation Operators
     Difficult to predict the performance: the choices are problem-dependent and
     inter-dependent. Off-line tuning → best static strategy (expensive).
     [Figure: performance of the 1-Bit, 3-Bit, 5-Bit and 1/n BitFlip operators on
      OneMax, plotted against the fitness of the parent; sample with a (1+50)-EA.]
     Operator performance also depends on the fitness of the parents and on the
     population fitness distribution.

  6. Parameter Setting of Variation Operators (same slide, with the conclusion)
     ⇒ The operator choice should be adapted on-line, while solving the problem.

  7. Adaptive Operator Selection: Position of the Problem
     Given a set of K variation operators, select on-line the operator to be
     applied next, based on their recent effects.
     [Diagram: the EA applies the operator chosen by the Operator Selection module
      (op1 ... opK); the impact of each application is evaluated and turned into a
      credit (reward) by the Credit Assignment module, which updates the operator
      qualities used for the next selection.]
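The loop in the diagram can be sketched as follows. This is a minimal sketch, not the authors' method: the epsilon-greedy selector and the mean-reward credit are placeholder assumptions standing in for the bandit-based selection and rank-based credit the slides develop later.

```python
import random

def aos_loop(apply_ops, n_steps, epsilon=0.1, seed=0):
    """Sketch of the AOS loop: select an operator, apply it, evaluate
    its impact, turn the impact into credit, repeat.
    apply_ops: one callable per operator, returning a fitness improvement.
    Epsilon-greedy selection and mean-reward credit are placeholders."""
    rng = random.Random(seed)
    k = len(apply_ops)
    counts = [0] * k
    credit = [0.0] * k                 # empirical mean reward per operator
    for _ in range(n_steps):
        if 0 in counts or rng.random() < epsilon:
            i = rng.randrange(k)                          # explore
        else:
            i = max(range(k), key=lambda j: credit[j])    # exploit
        reward = apply_ops[i]()        # operator application + impact evaluation
        counts[i] += 1
        credit[i] += (reward - credit[i]) / counts[i]     # credit assignment
    return counts, credit
```

With two dummy operators yielding constant improvements 0.2 and 0.8, the loop quickly concentrates its applications on the better one.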

  8. Section: Operator Selection (subsections: A Multi-Armed Bandit problem; Operator Selection: Discussion)

  9. A (kind of) Multi-Armed Bandit problem
     The basic Multi-Armed Bandit problem: given K arms (≡ operators), at time t
     the gambler plays arm j and gets
         r_{j,t} = 1 with (unknown) probability p_j,
         r_{j,t} = 0 with probability 1 − p_j.
     Goal: maximize the cumulative reward, i.e. minimize the regret
         L(T) = Σ_{t=1..T} (r*_t − r_t)
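The regret definition above can be made concrete with a small simulation; the function below is an illustrative sketch (not from the slides) that measures the regret in expectation, i.e. (max_j p_j − p_chosen) per step, for an arbitrary policy.

```python
import random

def play_bandit(probs, policy, horizon, seed=0):
    """Simulate the basic MAB: K Bernoulli arms with success
    probabilities `probs`; `policy(t)` returns the arm to play at time t.
    Returns the cumulative regret L(T) = sum_t (r*_t - r_t), measured
    in expectation: (max_j p_j - p_chosen) accumulated per step."""
    rng = random.Random(seed)
    best = max(probs)
    regret = 0.0
    for t in range(horizon):
        j = policy(t)
        regret += best - probs[j]                  # expected per-step regret
        _ = 1 if rng.random() < probs[j] else 0    # observed reward r_{j,t}
    return regret
```

Always playing the best arm gives zero regret; always playing the worst accumulates (p* − p) per step.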

  10. The Upper Confidence Bound (UCB) MAB algorithm
      Asymptotic optimality guarantees in the static context [Auer et al., 2002]:
      optimal regret L(T) = O(log T).
      At time t, choose the arm i maximizing
          score_{i,t} = q̂_{i,t} + sqrt( 2 log(Σ_k n_{k,t}) / n_{i,t} )
      (exploitation term q̂_{i,t}, exploration term under the square root), with
          n_{i,t+1} = n_{i,t} + 1                                            (# times arm i was played)
          q̂_{i,t+1} = (1 − 1/n_{i,t+1}) · q̂_{i,t} + (1/n_{i,t+1}) · r_{i,t}   (empirical quality)
      Efficiency comes from the optimal EvE balance: the interval between
      exploration trials increases exponentially w.r.t. the number of time steps.

  11. Operator Selection with UCB: shortcomings
      Exploration vs. Exploitation (EvE) balance: in UCB theory, rewards ∈ {0, 1},
      while fitness-based rewards lie in [a, b]; UCB's EvE balance is broken, so a
      scaling factor C is needed:
          score_{i,t} = q̂_{i,t} + C · sqrt( 2 log(Σ_k n_{k,t}) / n_{i,t} )
      Dynamic setting (the best arm/operator changes along evolution): adjusting
      the q̂'s after a change takes a long time. Use a change-detection test, e.g.
      Page-Hinkley [Hinkley, 1969]: upon detection of a change, restart the MAB.
      ⇒ DMAB = UCB + Scaling + Page-Hinkley
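A sketch of the Page-Hinkley test used in DMAB, assuming the standard formulation for detecting a drop in the mean reward stream (`delta` is the tolerated drift, `gamma` the detection threshold; both values here are illustrative, not the slides' settings):

```python
class PageHinkley:
    """Page-Hinkley change-detection on a reward stream; on detection
    the statistics are reset (in DMAB, the whole MAB is restarted)."""
    def __init__(self, delta=0.005, gamma=1.0):
        self.delta, self.gamma = delta, gamma
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # m_t = sum_l (x_l - mean_l + delta)
        self.max_cum = 0.0  # M_t = max_l m_l

    def step(self, x):
        """Feed one reward; return True iff a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.max_cum = max(self.max_cum, self.cum)
        if self.max_cum - self.cum > self.gamma:
            self.reset()    # restart statistics (and, in DMAB, the MAB)
            return True
        return False
```

A steady stream of rewards raises no alarm; an abrupt drop in the mean is flagged within a few steps, which matches the remark below that the test works best when changes are abrupt.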

  12. Operator Selection: Discussion
      MAB = UCB + Scaling: optimal EvE balance, but in a static setting, while AOS
      is dynamic.
      DMAB = MAB + Page-Hinkley change detection: won the Pascal challenge on
      on-line EvE trade-off [Hartland et al., 2007]; used in the AOS context
      [GECCO'08].
      Two hyper-parameters: scaling factor C and Page-Hinkley threshold γ.
      Very efficient, but very sensitive to the hyper-parameter setting, and
      change detection works only when changes are abrupt.
      An alternative: a "more dynamic" reward.

  13. Section: Credit Assignment (subsections: Fitness-based Rewards; Area-Under-the-Curve (AUC); Rank-based AUC with MAB)

  14. Fitness-based Rewards
      Impact of an operator application? Most common: the fitness improvement ΔF;
      for multi-modal problems, diversity is also important [CEC'09].
      From impact to credit (or reward):
        - instantaneous (ΔF of the last application): likely to be unstable,
        - average over the last W applications,
        - extreme value over the last W applications [PPSN'08]: rare extreme
          events are more important than the average (e.g. rogue waves, epidemic
          propagation).
      Issue: high sensitivity to scaling parameters, which are likely to be
      dynamic, too.
      Higher robustness: credit assignment based on ranks.
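The window-based credit schemes in the list above can be sketched as follows; this is an illustrative reading (clamping negative impacts to zero is an assumption, not stated on the slide):

```python
from collections import deque

def make_credit(window=50, mode="extreme"):
    """Credit over the last `window` fitness improvements of one operator:
    'average' returns their mean, 'extreme' their maximum, reflecting the
    idea that rare large improvements matter more than the average."""
    history = deque(maxlen=window)   # old entries fall off automatically
    def credit(delta_f):
        history.append(max(delta_f, 0.0))   # assumed: no credit for worsening
        if mode == "extreme":
            return max(history)
        return sum(history) / len(history)
    return credit
```

With a window of 3, a single large improvement dominates the extreme-value credit until it slides out of the window, which is exactly the instability-vs-reactivity trade-off the slide alludes to.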

  15. Area-Under-the-Curve (AUC)
      Area under the ROC curve in ML: evaluation of binary classifiers
      [Fawcett, 2006] over a ranked list [ + + - - + + + - - - - ... ];
      performance: fraction of misclassifications; equivalent to the
      Mann-Whitney-Wilcoxon test, Pr( rank(n+) > rank(n−) ).
      Area under the ROC curve in AOS [GECCO'10]: one operator versus the others,
      over the ranked list of fitness improvements
      [ op1, op2, op1, op1, op1, op2, op2, ... ].
      Fitness improvements are ranked; the size of each curve segment is the
      assigned rank-value.
      [Figure: ROC-style curve, operator under assessment on the vertical axis,
       other operators on the horizontal axis.]

  16. Rank-Based AUC (operator 2, step 0)
      Ranked fitness improvements (R = rank, ΔF = improvement, Op = operator):

      R    ΔF    Op
      1    5.0   2
      2    4.7   2
      3    4.2   1
      4    3.5   1
      5    3.4   2
      6    3.3   2
      7    3.1   2
      8    3.0   2
      9    2.9   2
      10   2.8   2
      11   2.5   3
      12   2.0   1
      13   1.5   0
      14   1.0   3
      15   0.8   0

  17.–20. Rank-Based AUC (operator 2, steps 1–4): the same ranked table; each step
      adds the next ranked improvement to operator 2's ROC-style curve (a vertical
      segment when the improvement was produced by operator 2, a horizontal one
      otherwise).
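The up/right walk illustrated on these slides can be sketched with unit step sizes; note this is a simplification of the GECCO'10 method, which assigns decaying rank-based sizes to the curve segments:

```python
def rank_based_auc(ranked_ops, op):
    """Unit-step rank-based AUC credit: walk the improvements from best
    to worst; go up when the item was produced by `op`, right otherwise;
    return the normalized area under the curve (1.0 means `op` produced
    all of the best improvements)."""
    height = 0
    area = 0
    n_up = sum(1 for o in ranked_ops if o == op)
    n_right = len(ranked_ops) - n_up
    for o in ranked_ops:
        if o == op:
            height += 1       # vertical step
        else:
            area += height    # horizontal step accumulates current height
    return area / (n_up * n_right) if n_up and n_right else 0.0
```

On the table of slide 16 (operator column read from rank 1 to rank 15), the unit-step AUC for operator 2 comes out to 44/56 ≈ 0.79, reflecting that operator 2 produced most of the top-ranked improvements.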
