
Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed

Benjamin Bodner, Brown University, Providence, RI, USA. BBOB Workshop, GECCO 2019, Prague. 4/1/2020.


  1. Benchmarking the ATM Algorithm on the BBOB 2009 Noiseless Function Testbed. Benjamin Bodner, Brown University, Providence, RI, USA. BBOB Workshop, GECCO 2019, Prague. 4/1/2020.

  2. Content. 01 Introduction: Motivation, Intuition. 02 Main Components: Parameters & main equations, Parameter adaptation, Resource allocation. 03 Results: BBOB Noiseless, BBOB Large-scale, Internal runtime. 04 Summary: Recent progress, Goals moving forward, Conclusions.

  3. Motivation. • Growing need for optimization methods for very high-dimensional settings • Problems commonly have 10^5-10^8 optimizable variables [Devlin et al. 2019]. Diagram labels: Deep Learning; Physical Sciences; Optimization Algorithms. Image from GOMC: https://gomc-wsu.github.io/Manual/index.html Image from: https://towardsdatascience.com/why-deep-learning-is-needed-over-traditional-machine-learning-1b6a99177063

  4. Motivation. Gradient-based optimization methods can create many difficulties in deep learning: noise, vanishing gradients [Shalev-Shwartz et al. 2017], getting stuck in local minima. Current ways of mitigating these issues (architecture design, hyperparameter tuning, regularization) do not always work [Sutskever 2013]. Image from: https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3 Image from [He et al. 2015]. Image from: Srivastava, Nitish, et al. 2014.

  5. Motivation. Physical sciences: interacting particles, protein folding. • Functions are non-convex • Notoriously have large numbers of local minima [Nichita 2002] • Simulated annealing and quasi-Newton methods can be slow • Do not always converge to the global minimum [Hao et al. 2015]. Image from GOMC: https://gomc-wsu.github.io/Manual/index.html Image from: https://en.wikibooks.org/wiki/Structural_Biochemistry/Proteins/Protein_Folding_Problem Image by Thomas Splettstoesser: https://www.behance.net/gallery/10952399/Protein-Folding-Funnel

  6. Motivation. Characteristics intentionally designed into the BBOB function testbeds. Existing algorithms have been highly successful in these settings [BIPOP CMA-ES, Hansen 2009]. Covariance matrices and Hessians limit their scalability: key components and operations are usually of order D^2. Images from: Finck, Hansen, Ros, Auger 2015.

  7. Proposal. Eliminate the use of D^2 objects and operations. Adaptive Two Mode (ATM) Algorithm: a black-box optimization algorithm which only maintains objects and executes operations of order D.

  8. The Adaptive Two Mode Algorithm. Uses a combination of two kinds of search distributions / “modes”, an isotropic distribution and a directional distribution, for exploration and exploitation. • The two modes complement each other • ATM uses a set of rules to control the amplitudes and interactions between the modes

  9. ATM Algorithm. Legend: regular sample; best sample; best sample from last step. Step 1: Start from isotropic distribution. Step 2: If sample leads to improvement: suggest samples in that direction. Step 3: If new samples also lead to improvement: sample in same direction at exponentially increasing amplitude. Step 4: Once no more “good” samples are found: start over with the isotropic search (using an evolutionary strategy). Repeat.
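The four-step loop on this slide can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the Gaussian isotropic sampler, the fixed `radius`, the doubling factor `growth`, the cap `max_dir_steps`, and the name `two_mode_minimize` are all assumptions introduced here. Between iterations it keeps only the current best point and one direction vector, so stored state is of order D.

```python
import numpy as np

def two_mode_minimize(f, x0, radius=1.0, n_samples=10, n_iters=200,
                      growth=2.0, max_dir_steps=50, seed=0):
    """Hypothetical two-mode search: isotropic sampling plus directional steps."""
    rng = np.random.default_rng(seed)
    x_best = np.asarray(x0, dtype=float)
    f_best = f(x_best)
    for _ in range(n_iters):
        # Step 1: sample around the best point from an isotropic distribution
        # (a Gaussian is assumed here; the slides do not name the distribution).
        candidates = x_best + radius * rng.standard_normal((n_samples, x_best.size))
        values = np.array([f(c) for c in candidates])
        i = int(np.argmin(values))
        if values[i] >= f_best:
            continue  # no improvement: stay in the isotropic mode
        # Step 2: an improving sample defines the search direction.
        direction = candidates[i] - x_best
        x_best, f_best = candidates[i], float(values[i])
        # Step 3: keep stepping along that direction, growing the amplitude
        # exponentially for as long as each step improves.
        amplitude = 1.0
        for _ in range(max_dir_steps):
            trial = x_best + amplitude * direction
            f_trial = f(trial)
            if f_trial >= f_best:
                break  # Step 4: no more good samples, back to isotropic search
            x_best, f_best = trial, f_trial
            amplitude *= growth
    return x_best, f_best
```

For example, `two_mode_minimize(lambda x: float(np.sum(x ** 2)), np.full(5, 3.0))` returns a point whose value is no worse than the starting value, because a step is accepted only when it improves the objective.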

  10. Parameters of the Algorithm. There are (currently) 8 parameters which play several roles in the ATM algorithm: • Controlling the growth factors of the modes: if $(X_{best}^{t} - X_{best}^{t-1})^2 > \Delta X_{min}^2$: do d += 1, r = 0; else: do r += 1, d = 0 • Controlling the amplitudes of the modes: $R = R_{max} \exp\left(G_r \left[\sin\left(\operatorname{mod}\left(\frac{\pi r}{2 T_r}, \frac{\pi}{2}\right)\right) - 1\right]\right)$, $D = R_{max} \exp\left(G_d (d - D_d)\right)$
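The counter rule on this slide reads: if the best point moved by more than the threshold ΔX_min since the last step, increment the directional counter d and reset the isotropic counter r; otherwise increment r and reset d. A minimal sketch of that rule follows. The function name, the argument names, and the use of the squared Euclidean distance for the squared difference are assumptions made for illustration.

```python
import numpy as np

def update_mode_counters(x_best_now, x_best_prev, d, r, dx_min):
    """Return the updated (d, r) counters after one step (hypothetical helper)."""
    diff = np.asarray(x_best_now, dtype=float) - np.asarray(x_best_prev, dtype=float)
    moved_sq = float(np.sum(diff ** 2))  # squared distance between successive best points
    if moved_sq > dx_min ** 2:
        return d + 1, 0   # progress was made: advance directional mode, reset isotropic
    return 0, r + 1       # no significant move: advance isotropic mode, reset directional
```

The two counters are mutually exclusive by construction: at most one of them is non-zero after any update.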

  11. Parameters of the Algorithm. • Controlling the search distribution in different axes: $\mathbf{S} = \beta \mathbf{S} + (1 - \beta)\, \mathrm{mean}\left[\left(\frac{\mathbf{O} - O_{Gbest}}{\mathbf{X} - \mathbf{X}_{Gbest}^{t}}\right)^2\right]$, $\mathbf{A} = \frac{\alpha}{\mathbf{S} + \alpha}$

  12. Online Parameter Tuning. Different functions + changing characteristics at different stages: need for online parameter tuning. How to do this? • 4 intertwined parameter sets • Parameter sets are optimized by another Two-Mode algorithm • Objective function designed to reflect the “success” at the task of minimizing the true objective function: $O_P = \frac{\mathrm{mean}(\Delta O_{Pbest}) + \min(\Delta O_{Pbest})}{2}$, where $\Delta O_{Pbest}$ = best change in the true objective function, found by the parameter set.
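The objective used to score a parameter set is the average of the mean and the minimum of ΔO_Pbest, the best changes in the true objective found by that set. A small sketch, assuming the values are collected over some window of recent steps (the slide does not say what the mean and min range over) and using a hypothetical function name:

```python
import numpy as np

def parameter_set_objective(delta_o_best):
    """O_P = (mean(dO_Pbest) + min(dO_Pbest)) / 2 for one parameter set."""
    delta = np.asarray(delta_o_best, dtype=float)
    return 0.5 * (delta.mean() + delta.min())
```

Since the true objective is minimized, improvements are negative changes, so a lower O_P indicates a more successful parameter set. Including the minimum rewards a set for its single best improvement as well as its average behaviour.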

  13. Problem with Online Tuning. New parameter sets + changing local search space: good chance for unsuitable sets. Proposal: fewer resources to “bad” parameter sets, more resources to better ones. Plot: resources allocated to parameter set vs. performance of parameter set.

  14. Parallel Optimization with Resource Allocation. • Given a fixed number of samples $N_{tot}$, distributed among $m$ parameter sets. • Change the allocation of samples to reflect their performance: $\mathbf{N}^{t+1} = \mathbf{N}^{t} - \mathbf{K} \mathbf{M}^{-1} \Delta\mathbf{O}_{Pbest}^{t} - \mathbf{K}_0 \mathbf{M}^{-1} (\mathbf{N}^{t} - \mathbf{N}^{0})$, where $\mathbf{N}^{t}$ = resource allocation vector at iteration $t$.

  15. Parallel Optimization with Resource Allocation - Choice of Matrices. $\mathbf{N}^{t+1} = \mathbf{N}^{t} - \mathbf{K} \mathbf{M}^{-1} \Delta\mathbf{O}_{Pbest}^{t} - \mathbf{K}_0 \mathbf{M}^{-1} (\mathbf{N}^{t} - \mathbf{N}^{0})$, with
     $$\mathbf{K} = \begin{pmatrix} (m-1)K & -K & \cdots & -K \\ -K & (m-1)K & \cdots & -K \\ \vdots & \vdots & \ddots & \vdots \\ -K & -K & \cdots & (m-1)K \end{pmatrix}, \quad \mathbf{K}_0 = k_0 I, \quad \mathbf{M} = \mu I$$
     • Conserves the total number of samples • Merit-based allocation system
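The allocation update from slides 14 and 15 can be checked numerically. The sketch below assumes the matrix K has (m-1)·k on the diagonal and -k elsewhere, K_0 = k_0·I and M = μ·I, with the second term pulling the allocation back toward its initial value. The function name, the default constants, and the real-valued (unrounded, unclipped) allocations are assumptions for illustration only.

```python
import numpy as np

def allocation_step(n_t, n_0, delta_o, k=1.0, k0=0.1, mu=1.0):
    """One update N^{t+1} = N^t - K M^-1 dO - K_0 M^-1 (N^t - N^0)."""
    n_t = np.asarray(n_t, dtype=float)
    n_0 = np.asarray(n_0, dtype=float)
    delta_o = np.asarray(delta_o, dtype=float)
    m = n_t.size
    # K = k * (m*I - ones): (m-1)*k on the diagonal, -k off the diagonal.
    # Every column sums to zero, so K @ v never changes the total allocation.
    K = k * (m * np.eye(m) - np.ones((m, m)))
    return n_t - (K @ delta_o) / mu - (k0 / mu) * (n_t - n_0)

n0 = np.array([25.0, 25.0, 25.0, 25.0])
n1 = allocation_step(n0, n0, delta_o=[-4.0, -1.0, 0.0, -3.0])
print(n1, n1.sum())  # total stays at 100
```

Here the first parameter set has the most negative change in the objective (the largest improvement), so it gains samples, while the third set, which made no progress, loses the most. The matrix K is m by m, so its size depends on the number of parameter sets rather than on the problem dimension D.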

  16. Information Flow Throughout ATM Components. Diagram labels: Resource allocation; Parameter Set 1, Parameter Set 2, Parameter Set 3, Parameter Set 4; Suggestions for samples; Evaluate samples; Values of objective function; Repeat.

  17. ATM Optimization Process. Panels: Sum of Different Powers (f14); Rotated Ellipse (f10); Sharp Ridge (f13).

  18. Results on BBOB Testbed - Overview. Succeeds at solving: • 23/24 in 2D • 8/24 in 40D. One of the best optimizers for the separable functions subset (f1-5) + large budget. Underperforms on non-separable functions • Especially if ill-conditioned and/or noisy

  19. Results on BBOB Testbed - Successes • Very effective at optimizing separable functions • Capable at optimizing functions with “large” convex regions around the global minimum (“large” = comparable to $R_{max}$)

  20. Results on BBOB Testbed - Underperformance • Poor performance on rotated and ill-conditioned functions • Poor performance on rotated and noisy/multimodal functions

  21. Results from BBOB Large-scale. Budget = 3000D.
