Adaptive Operator Selection for Optimization

Álvaro Fialho
Advisors: Marc Schoenauer & Michèle Sebag

Ph.D. Defense
École Doctorale d'Informatique, Université Paris-Sud, Orsay, France
December 22, 2010
Outline
1. Context & Motivation
2. Operator Selection
3. Credit Assignment
4. Empirical Validation
5. Conclusions & Further Work

Álvaro Fialho – Ph.D. Defense – December 22, 2010 – Adaptive Operator Selection for Optimization
Context & Motivation
- Evolutionary Algorithms
- Parameter Setting in EAs
- Parameter Setting of Variation Operators
- Adaptive Operator Selection
Evolutionary Algorithms
Stochastic optimization algorithms (Darwinian paradigm).
Bottleneck: parameter setting
- Population size and number of offspring generated
- Parameters of selection and replacement methods
- Parameters of variation operators (application rate, etc.)
Goal: automatic parameter setting ("Crossing the Chasm")
Parameter Setting in EAs
[Figure: taxonomy of parameter setting in EAs, from Eiben et al., 2007]
Parameter Setting of Variation Operators
- Difficult to predict the performance: problem-dependent and inter-dependent choices
- Off-line tuning can find the best static strategy, but it is expensive
[Figure: performance of four operators (1-Bit, 3-Bit, 5-Bit, 1/n BitFlip) on OneMax, as a function of the fitness of the parent; sample figure with a (1+50)-EA]
Performance depends also on:
- Fitness of the parents
- Population fitness distribution
⇒ Operators should be adapted on-line, while solving the problem
Adaptive Operator Selection
Position of the problem:
- Given a set of K variation operators
- Select on-line the operator to be applied next
- Based on their recent performance
[Diagram: AOS loop — the EA applies the selected operator; the impact of the application is evaluated and turned into a credit (reward) by the Credit Assignment module, which maintains a quality estimate per operator (op1 … opK); the Operator Selection module uses these qualities to choose the next operator.]
Operator Selection
- Related Work
- Discussion on Operator Selection
- A (kind of) Multi-Armed Bandit problem
- Dynamic Multi-Armed Bandit (DMAB)
- Sliding Multi-Armed Bandit (SLMAB)
- Contributions to Operator Selection: Summary
Operator Selection – Related Work

Empirical quality: $\hat{q}_{j,t+1} = (1 - \alpha) \cdot \hat{q}_{j,t} + \alpha \cdot r_{j,t}$

Probability Matching (PM) [Goldberg, 1990]: $s_i$ proportional to $\hat{q}_i$

  $s_{i,t+1} = p_{\min} + (1 - K \cdot p_{\min}) \cdot \dfrac{\hat{q}_{i,t+1}}{\sum_{j=1}^{K} \hat{q}_{j,t+1}}$

Adaptive Pursuit (AP) [Thierens, 2005]: $s_{i^*}$ is pushed to $p_{\max}$; the others to $p_{\min}$

  $i^* = \operatorname{argmax}\{\hat{q}_{i,t},\ i = 1 \dots K\}$
  $s_{i^*,t+1} = s_{i^*,t} + \beta \cdot (p_{\max} - s_{i^*,t})$
  $s_{i,t+1} = s_{i,t} + \beta \cdot (p_{\min} - s_{i,t})$, for $i \neq i^*$
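The two update rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis code: `K`, `alpha`, `beta` and `p_min` are assumed hyper-parameter values, and `p_max = 1 - (K-1)*p_min` follows AP's usual convention so that the probabilities sum to 1.

```python
# Hedged sketch of Probability Matching (PM) and Adaptive Pursuit (AP).
# Hyper-parameter values below are illustrative assumptions.
K = 4
alpha, beta = 0.3, 0.3
p_min = 0.1
p_max = 1.0 - (K - 1) * p_min  # AP convention: probabilities sum to 1

def update_quality(q, op, reward, alpha=alpha):
    """Empirical quality: q_{j,t+1} = (1-alpha)*q_{j,t} + alpha*r_{j,t}."""
    q[op] = (1.0 - alpha) * q[op] + alpha * reward
    return q

def pm_probs(q):
    """PM: s_i = p_min + (1 - K*p_min) * q_i / sum_j q_j."""
    total = sum(q) or 1.0  # guard against division by zero before any reward
    return [p_min + (1.0 - K * p_min) * qi / total for qi in q]

def ap_update(s, q):
    """AP: push the current best towards p_max, the others towards p_min."""
    best = max(range(K), key=lambda i: q[i])
    for i in range(K):
        target = p_max if i == best else p_min
        s[i] = s[i] + beta * (target - s[i])
    return s
```

Note the qualitative difference: PM keeps all probabilities proportional to quality (winner-take-some), while AP greedily pursues the single current best.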
Discussion on Operator Selection
Exploration versus Exploitation (EvE)
- In the operator search space, not the problem search space
- Acquire new information (use other operators) vs. capitalize on the available knowledge (use the current best)
Probability-based methods (PM and AP)
- Conservative approach: fixed $p_{\min}$
- Entails over-exploration when there are many operators
EvE balance ⇒ Game Theory: Multi-Armed Bandits
- The level of exploration should depend on the confidence about the knowledge, i.e., $p_{\min}$ should be "dynamic"
A (kind of) Multi-Armed Bandit problem
Original Multi-Armed Bandits (Machine Learning)
- Given K arms (≡ operators)
- At time t, the gambler plays arm j and gets
  $r_{j,t} = 1$ with (unknown) probability $p_j$; $r_{j,t} = 0$ otherwise
- Goal: maximize the cumulative reward ≡ minimize the regret
  $L(T) = \sum_{t=1}^{T} (r^*_t - r_t)$
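The regret definition above is simple enough to state as a one-line helper; this is a hypothetical illustration (`best_rewards` is the reward sequence of the best arm, which is unknown in practice and only available in simulation).

```python
# Regret L(T) = sum_{t=1}^T (r*_t - r_t): the cumulative loss of the
# gambler's choices against the (unknown) best arm. Illustrative helper.
def regret(best_rewards, received_rewards):
    return sum(rs - r for rs, r in zip(best_rewards, received_rewards))
```

For example, missing the best arm on 2 of 4 Bernoulli trials gives a regret of 2.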
The Upper Confidence Bound (UCB) MAB algorithm
Asymptotic optimality guarantees in the static context [Auer et al., 2002]: optimal regret $L(T) = O(\log T)$.
At time t, choose the arm i maximizing:

  $\text{score}_{i,t} = \underbrace{\hat{q}_{i,t}}_{\text{exploitation}} + \underbrace{\sqrt{\dfrac{2 \log \sum_k n_{k,t}}{n_{i,t}}}}_{\text{exploration}}$

with $n_{i,t+1} = n_{i,t} + 1$ (number of times arm i was played)
and $\hat{q}_{i,t+1} = \left(1 - \frac{1}{n_{i,t+1}}\right) \cdot \hat{q}_{i,t} + \frac{1}{n_{i,t+1}} \cdot r_{i,t}$ (empirical quality)

Efficiency comes from the optimal EvE balance: the interval between exploration trials increases exponentially w.r.t. the number of time steps.
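A minimal sketch of the UCB1 selection and incremental update described above (variable names are illustrative; unplayed arms are tried first, a standard convention):

```python
import math

def ucb_select(q_hat, n):
    """Pick the arm maximizing q_i + sqrt(2*log(sum_k n_k) / n_i).
    Arms never played (n_i == 0) are tried first."""
    for i, ni in enumerate(n):
        if ni == 0:
            return i
    total = sum(n)
    return max(range(len(n)),
               key=lambda i: q_hat[i] + math.sqrt(2.0 * math.log(total) / n[i]))

def ucb_update(q_hat, n, arm, reward):
    """n_i += 1; q_i <- (1 - 1/n_i)*q_i + (1/n_i)*r (running average)."""
    n[arm] += 1
    q_hat[arm] += (reward - q_hat[arm]) / n[arm]
```

After both arms have been tried once, the arm with the higher empirical quality wins the next round, since both confidence terms are then equal.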
Operator Selection with UCB: shortcomings
Exploration vs. Exploitation (EvE) balance
- Original MAB: rewards ∈ {0, 1}; AOS: rewards ∈ [a, b] (e.g., fitness improvement)
- UCB's EvE balance is broken; scaling is needed:

  $\text{score}_{i,t} = \hat{q}_{i,t} + C \sqrt{\dfrac{2 \log \sum_k n_{k,t}}{n_{i,t}}}$

Dynamics
- When op$_i$ is not the best anymore...

  $\hat{q}_{i,t+1} = \left(1 - \frac{1}{n_{i,t+1}}\right) \cdot \hat{q}_{i,t} + \frac{1}{n_{i,t+1}} \cdot r_{i,t}$

- The weight of r is inversely proportional to n: adjusting the $\hat{q}$'s after a change takes a long time
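The scaled score only adds one factor to the UCB formula; a sketch, where the default value of the scaling factor `C` is an illustrative assumption (in practice C is tuned off-line to match the reward range):

```python
import math

def scaled_ucb_score(q_hat_i, n_i, n_total, C=0.5):
    """UCB score with scaling factor C, restoring the EvE balance when
    rewards live in [a, b] instead of {0, 1}. C=0.5 is an arbitrary default."""
    return q_hat_i + C * math.sqrt(2.0 * math.log(n_total) / n_i)
```

With C = 0 the rule degenerates to pure greedy exploitation; larger C buys more exploration, which is why the right value depends on the (problem-specific) reward scale.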
Dynamic Multi-Armed Bandit (DMAB)
Rationale: no need for exploration in stationary situations ⇒ upon the detection of a change, restart the MAB.
How to detect a change in a distribution? The Page-Hinkley statistical test [Page, 1954]:
1. $\bar{r}_t = \frac{1}{t} \sum_{i=1}^{t} r_i$
2. $m_t = \sum_{i=1}^{t} (r_i - \bar{r}_i + \delta)$
3. $M_t = \max\{|m_i|,\ i = 1 \dots t\}$
4. Return $(M_t - |m_t| > \gamma)$

DMAB = UCB + Scaling + Page-Hinkley
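The four steps of the Page-Hinkley test above can be maintained incrementally, one reward at a time. A minimal sketch, assuming illustrative values for the tolerance δ and the detection threshold γ (both are hyper-parameters in DMAB):

```python
class PageHinkley:
    """Page-Hinkley change-detection test, as used in DMAB.
    delta tolerates small fluctuations; gamma is the detection threshold."""

    def __init__(self, delta=0.05, gamma=5.0):
        self.delta, self.gamma = delta, gamma
        self.t, self.mean, self.m, self.M = 0, 0.0, 0.0, 0.0

    def step(self, r):
        """Feed one reward; return True if a change is detected."""
        self.t += 1
        self.mean += (r - self.mean) / self.t   # running mean: r_bar_t
        self.m += r - self.mean + self.delta    # m_t = sum(r_i - r_bar_i + delta)
        self.M = max(self.M, abs(self.m))       # M_t = max |m_i|
        return self.M - abs(self.m) > self.gamma
```

On a stationary reward stream the deviation |m_t| tracks its own maximum M_t, so nothing fires; after an abrupt drop in rewards, m_t moves away from the recorded maximum and the gap M_t − |m_t| crosses γ.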