

  1. How to update the different parameters m, σ, C? 1. Adapting the mean m, 2. Adapting the step-size σ, 3. Adapting the covariance matrix C

  2. Why Step-size Adaptation? Assume a (1+1)-ES algorithm with fixed step-size σ (and C = I_d) optimizing the function f(x) = ∑_{i=1}^n x_i² = ∥x∥².
     Initialize m, σ
     While (stopping criterion not met):
       sample new solution: x ← m + σ𝒩(0, I_d)
       if f(x) ≤ f(m): m ← x
     What will happen if you look at the convergence of f(m)?
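The loop above is easy to run as-is. Here is a minimal Python sketch (function and variable names are mine, not from the slides); with a constant σ the value f(m) drops quickly at first and then progress slows dramatically, which is what the following plots illustrate.

```python
import numpy as np

def one_plus_one_es_fixed_sigma(f, m, sigma, iterations=2000, seed=1):
    """(1+1)-ES with a fixed step-size sigma and C = I_d, as on the slide."""
    rng = np.random.default_rng(seed)
    history = []
    for _ in range(iterations):
        x = m + sigma * rng.standard_normal(m.shape)  # x <- m + sigma * N(0, I_d)
        if f(x) <= f(m):                              # plus-selection: keep the better point
            m = x
        history.append(f(m))
    return m, history

sphere = lambda x: float(np.sum(x**2))  # f(x) = sum_i x_i^2 = ||x||^2
m_final, history = one_plus_one_es_fixed_sigma(sphere, m=np.ones(10), sigma=1e-3)
```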

  3. Why Step-size Adaptation? Red curve: (1+1)-ES with optimal step-size (see later); green curve: (1+1)-ES with constant step-size (σ = 10⁻³).

  4. Why Step-size Adaptation? We need step-size adaptation to approach the optimum fast (converge linearly). Red curve: (1+1)-ES with optimal step-size (see later); green curve: (1+1)-ES with constant step-size (σ = 10⁻³).

  5. Methods for Step-size Adaptation
     - 1/5th success rule, typically applied with “+” selection [Rechenberg, 73][Schumer and Steiglitz, 78][Devroye, 72][Schwefel, 81]
     - σ-self-adaptation, applied with “,” selection: a random variation is applied to the step-size, and the better one, according to the objective function value, is selected
     - path-length control, or cumulative step-size adaptation (CSA), applied with “,” selection [Ostermeier et al. 94][Hansen, Ostermeier, 2001]
     - two-point adaptation (TPA), applied with “,” selection [Hansen 2008]: test two solutions in the direction of the mean shift and increase or decrease the step-size accordingly

  6. Step-size control: 1/5th Success Rule

  7. Step-size control: 1/5th Success Rule

  8. Step-size control: 1/5th Success Rule. Probability of success per iteration: p_s = #{candidate solutions better than m, i.e. with f(x) ≤ f(m)} / #{candidate solutions}
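One common way to implement the rule (a sketch; the constants are typical choices, not taken from these slides) is to multiply σ by a factor α > 1 on success and by α^(−1/4) on failure, so that σ is stationary exactly when the success probability p_s is 1/5:

```python
import numpy as np

def one_plus_one_es_one_fifth(f, m, sigma, iterations=2000, seed=1):
    """(1+1)-ES with a simple 1/5th success rule: one success per four
    failures (p_s = 1/5) leaves sigma unchanged, since alpha * alpha**(-0.25*4) = 1."""
    rng = np.random.default_rng(seed)
    d = len(m)
    alpha = np.exp(1.0 / d)         # adaptation speed ~ 1/dimension (typical choice)
    history = []
    for _ in range(iterations):
        x = m + sigma * rng.standard_normal(d)
        if f(x) <= f(m):
            m = x
            sigma *= alpha          # success: increase the step-size
        else:
            sigma *= alpha**-0.25   # failure: decrease the step-size
        history.append((f(m), sigma))
    return m, sigma, history
```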

  9. (1+1)-ES with One-fifth Success Rule - Convergence

  10. Path Length Control - Cumulative Step-size Adaptation (CSA): the step-size adaptation used in the (μ/μ_w, λ)-ES algorithm framework (in CMA-ES in particular). Main Idea: compare the length of the evolution path (a cumulated record of recent mean shifts) with its expected length under random selection; increase σ when the path is longer than expected (consecutive steps are correlated) and decrease σ when it is shorter (consecutive steps cancel each other out).

  11. CSA-ES: The Equations
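The equations on this slide did not survive extraction. As a substitute, here is a sketch of the standard CSA update from Hansen & Ostermeier (2001), restricted to C = I_d for readability (so the usual C^{-1/2} factor is the identity and is omitted); the helper's name and default constants are typical illustrative choices, not necessarily those of the slides.

```python
import numpy as np

def csa_update(p_sigma, sigma, m_old, m_new, mu_eff, d,
               c_sigma=None, d_sigma=None):
    """One cumulative step-size adaptation (CSA) step, assuming C = I_d."""
    if c_sigma is None:
        c_sigma = (mu_eff + 2) / (d + mu_eff + 5)          # cumulation rate
    if d_sigma is None:
        d_sigma = 1 + 2 * max(0.0, np.sqrt((mu_eff - 1) / (d + 1)) - 1) + c_sigma
    # evolution path: exponentially fading record of normalized mean shifts
    p_sigma = (1 - c_sigma) * p_sigma \
        + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * (m_new - m_old) / sigma
    # expected length of a N(0, I_d) vector (standard approximation)
    chi_d = np.sqrt(d) * (1 - 1 / (4 * d) + 1 / (21 * d**2))
    # path longer than random -> increase sigma; shorter -> decrease sigma
    sigma = sigma * np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_d - 1))
    return p_sigma, sigma
```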

  12. Convergence of (μ/μ_w, λ)-CSA-ES: 2×11 runs

  13. Convergence of (μ/μ_w, λ)-CSA-ES. Note: initial step-size taken too small (σ₀ = 10⁻²) to illustrate the step-size adaptation.

  14. Convergence of (μ/μ_w, λ)-CSA-ES

  15. Optimal Step-size - Lower Bound for Convergence Rates. In the previous slides we have displayed some runs with “optimal” step-size. Optimal step-size refers to a step-size proportional to the distance to the optimum: σ_t = σ∥x − x⋆∥, where x⋆ is the optimum of the optimized function (with σ properly chosen). The associated algorithm is not a real algorithm (it needs to know the distance to the optimum), but it gives bounds on convergence rates and allows one to compute many important quantities. The goal for a step-size adaptive algorithm is to achieve convergence rates close to those obtained with the optimal step-size.
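A short sketch of this artificial optimal-step-size algorithm on the sphere (names and defaults are mine): since σ_t = σ∥X_t∥ uses the true distance to the optimum x⋆ = 0, the log of that distance decreases linearly with t.

```python
import numpy as np

def one_plus_one_es_optimal_sigma(sigma, d=10, iterations=2000, seed=1):
    """(1+1)-ES on f(x) = ||x||^2 with sigma_t = sigma * ||x - x*||, x* = 0.
    Not a real algorithm (it peeks at the distance to the optimum), but it
    realizes log-linear convergence when sigma is chosen well."""
    rng = np.random.default_rng(seed)
    x = np.ones(d)
    log_norms = []
    for _ in range(iterations):
        step = sigma * np.linalg.norm(x)      # step-size proportional to distance
        y = x + step * rng.standard_normal(d)
        if np.sum(y**2) <= np.sum(x**2):      # plus-selection on the sphere
            x = y
        log_norms.append(np.log(np.linalg.norm(x)))
    return log_norms                          # decreases linearly in t
```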

  16. We will formalize this in the context of the (1+1)-ES. Similar results can be obtained for other algorithm frameworks.

  17. Optimal Step-size - Bound on Convergence Rate - (1+1)-ES. Consider a (1+1)-ES algorithm with any step-size adaptation mechanism:
      X_{t+1} = X_t + σ_t 𝒩_{t+1}  if f(X_t + σ_t 𝒩_{t+1}) ≤ f(X_t)
      X_{t+1} = X_t                otherwise
      with {𝒩_t, t ≥ 1} i.i.d. ∼ 𝒩(0, I_d). Equivalent writing: X_{t+1} = X_t + σ_t 𝒩_{t+1} 1{f(X_t + σ_t 𝒩_{t+1}) ≤ f(X_t)}

  18. Bound on Convergence Rate - (1+1)-ES. Theorem: For any objective function f: ℝⁿ → ℝ and for any y⋆ ∈ ℝⁿ,
      E[ln ∥X_{t+1} − y⋆∥] ≥ E[ln ∥X_t − y⋆∥] − τ   (lower bound)
      where τ = max_{σ ∈ ℝ_{>0}} φ(σ) with φ(σ) := E[ln⁻ ∥e₁ + σ𝒩∥] and e₁ = (1, 0, …, 0).
      Theorem: The convergence rate lower bound is reached on spherical functions f(x) = g(∥x − x⋆∥) (with g: ℝ_{≥0} → ℝ strictly increasing) and step-size proportional to the distance to the optimum, σ_t = σ_opt ∥x − x⋆∥, with σ_opt such that φ(σ_opt) = τ.

  19. Log-Linear Convergence of the Scale-Invariant Step-size (1+1)-ES. Theorem: The (1+1)-ES with step-size proportional to the distance to the optimum, σ_t = σ∥X_t∥, converges log-linearly on the sphere function f(x) = g(∥x∥) almost surely:
      lim_{t→∞} (1/t) ln(∥X_t∥ / ∥X_0∥) = −φ(σ) =: CR_{(1+1)}(σ)
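φ(σ), and hence τ and σ_opt, can be estimated by plain Monte Carlo. A sketch, assuming ln⁻ denotes the negative part of the logarithm (so that φ(σ) ≥ 0 and the convergence rate is −φ(σ), consistent with the two preceding slides); the scan grid and sample sizes are arbitrary:

```python
import numpy as np

def phi(sigma, d=10, samples=10**6, seed=1):
    """Monte Carlo estimate of phi(sigma) = E[ln^- ||e_1 + sigma*N(0,I_d)||],
    reading ln^-(u) as the negative part max(-ln u, 0)."""
    rng = np.random.default_rng(seed)
    v = sigma * rng.standard_normal((samples, d))
    v[:, 0] += 1.0                                  # e_1 + sigma * N(0, I_d)
    log_norm = 0.5 * np.log(np.sum(v**2, axis=1))   # ln ||e_1 + sigma*N||
    # only improving steps (norm <= 1, i.e. ln <= 0) contribute, as in the (1+1)-ES
    return -np.mean(np.minimum(log_norm, 0.0))

# tau = max_sigma phi(sigma) and the maximizer sigma_opt, by a crude scan:
sigmas = np.linspace(0.01, 1.0, 50)
values = [phi(s, samples=10**5) for s in sigmas]
tau = max(values)
sigma_opt = sigmas[int(np.argmax(values))]
```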

  20. Asymptotic Results (n → ∞)
