

  1. Tutorial: Theory of RaSH Made Easy. Benjamin Doerr, Max-Planck-Institut für Informatik, Saarbrücken

  2. Outline: Some Theory of RaSH
  • Part I: Drift Analysis
    – Motivation: Explains daily life
    – A simple and powerful drift theorem
    – 4 Applications
      • Coupon collector
      • RLS and (1+1) EA optimize OneMax
      • RLS and (1+1) EA optimize linear functions
      • Finding minimum spanning trees
    – Summary & Outlook
  • Part II: Random walk arguments [if time permits]

  3. Drift Analysis: Motivation
  • Life in the Saarland is easy...
    – Get salary on day 0: M_0 = 1000 € (6 559.57 ₣)
    – Day 1: Spend half of it in the pub: M_1 = ½ M_0 = 500
    – Day 2: Spend half of your money: M_2 = ½ M_1 = 250
    – …
    – Day t: Spend half of your money: M_t = ½ M_{t-1}
    – Question: When are you broke (M_T < 1)?
    – Answer: T = ⌊log_2(M_0) + 1⌋ = 10

  4. Drift Analysis: Motivation + Randomness
  • Life in the Saarland is easy... and chaotic
    – Get salary on day 0: M_0 = 1000 € (6 559.57 ₣)
    – Day 1: Expect to spend half of it: E(M_1) = ½ M_0 = 500
    – Day 2: Expect to spend half of your money: E(M_2 | M_1) = ½ M_1
    – …
    – Day t: Expect to spend half of your money: E(M_t | M_{t-1}) = ½ M_{t-1}, hence E(M_t) = (1/2)^t M_0
    – Question: When do you expect to be broke?
    – Ideal answer: E(T) = ⌊log_2(M_0) + 1⌋ = 10. (Truth: 10.95 is possible.)
    – Warning: You are hoping that E(min{t | M_t < 1}) = min{t | E(M_t) < 1} = 10, but expectation and minimum cannot be exchanged like this in general.
    – Solution: a drift theorem (next slide); a quick simulation follows below.
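
The drift intuition can be checked numerically. Below is a minimal Python sketch; the concrete spending model (each day you keep a Uniform(0,1) fraction of your money, so exactly half is spent in expectation) is my assumption, not something the slides specify.

import math
import random

def broke_time(m0: float = 1000.0) -> int:
    """One run of the spending process: each day keep a Uniform(0,1) fraction
    of the money, so E(M_t | M_{t-1}) = (1/2) M_{t-1}, i.e. drift delta = 1/2.
    Returns the first day t with M_t < 1."""
    m, t = m0, 0
    while m >= 1.0:
        m *= random.random()  # keep a uniform random fraction (assumed model)
        t += 1
    return t

runs = 10_000
avg = sum(broke_time() for _ in range(runs)) / runs
bound = (1 / 0.5) * (math.log(1000.0) + 1)  # drift theorem: E(T) <= (1/delta)(ln M_0 + 1)
print(f"empirical E(T) ~ {avg:.2f}, drift-theorem bound {bound:.2f}")

The printed empirical mean should stay below the theorem's bound of 2(ln 1000 + 1) ≈ 15.8.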

  5. Drift Analysis: The Theorem
  • A 'new' drift theorem (BD, Leslie Goldberg, Daniel Johannsen):
    Let X_0, X_1, … be random variables taking values in {0} ∪ [1, ∞), let X_0 = x_0 with probability one, and let δ > 0 be such that for all t ∈ ℕ, E(X_t | X_{t−1} = x) ≤ (1 − δ)x. Let T := min{t | X_t = 0}. Then
    (a) E(T) ≤ (1/δ)(ln x_0 + 1), and
    (b) for all c > 0 and n ∈ ℕ, Pr(T > (1/δ)(ln x_0 + c ln n)) ≤ n^{−c}.
  • Some history:
    – Doob (1953), Tweedie (1976), Hajek (1982): Fundamental work, mathematical.
    – Early EA works ('Dortmund', 1995-): Use direct methods, coupon collector, Chernoff bounds, ... [could have been done with drift]
    – Expected weight decrease method: 'Drift-thinking', but technical effort necessary to cope with not using drift analysis [should have been done with drift]
    – He & Yao (2001-04): First explicit use of drift analysis in EA theory.
    – Now: Many drift theorems and applications [BD: the above is the coolest ☺]

  6. Drift Analysis: 4 Applications
  • The 'new' drift theorem (BD, Leslie Goldberg, Daniel Johannsen), as stated on the previous slide.
  • 4 Applications:
    – Coupon collector
    – OneMax
    – Linear functions
    – Minimum spanning trees
  • Making the Expected Weight Decrease Method obsolete

  7. Application 1: Coupon Collector
  • Coupon Collector Problem:
    – There are n different types of coupons: T_1, …, T_n
    – Round 0: You start with no coupon
    – Each round t, you obtain a random coupon C_t: Pr(C_t = T_k) = 1/n for all t and k
    – After how many rounds do you have all [types of] coupons?
  • Analysis:
    – X_t := number of missing coupon types after round t
    – X_0 = n. Question: smallest T such that X_T = 0.
    – If X_{t−1} = x, then the chance to get a new coupon type in round t is x/n. Hence E(X_t) = x − x/n = (1 − 1/n)x. [δ = 1/n]
    – The drift theorem gives:
      • E(T) ≤ (1/δ)(ln x_0 + 1) = n(ln(n) + 1)  [best possible]
      • For all c > 0, Pr(T > (c+1) n ln(n)) < n^{−c}
    (A small simulation follows below.)
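
As a sanity check, here is a small simulation (my own sketch, not part of the tutorial) comparing the empirical mean of T with the drift bound n(ln n + 1); the exact expectation is the classical n·H_n, about 225 for n = 50.

import math
import random

def coupon_rounds(n: int) -> int:
    """One run: draw uniformly random coupon types until all n have been seen."""
    seen = set()
    rounds = 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # Pr(C_t = T_k) = 1/n for every type k
        rounds += 1
    return rounds

n, runs = 50, 2000
avg = sum(coupon_rounds(n) for _ in range(runs)) / runs
bound = n * (math.log(n) + 1)  # drift theorem: E(T) <= n(ln n + 1)
print(f"n={n}: empirical E(T) ~ {avg:.1f}, drift bound {bound:.1f}")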

  8. Application 2: RLS optimizes OneMax
  • One of the simplest randomized search heuristics (RaSH): Randomized Local Search (RLS), here used to maximize f: {0,1}^n → R
    RLS:
    1. Pick x ∈ {0,1}^n uniformly at random   % random start-point
    2. Pick k ∈ {1, …, n} uniformly at random
    3. y := x; y_k := 1 − x_k                 % mutation: flip a random bit
    4. if f(y) ≥ f(x), then x := y            % selection: keep the fitter
    5. if not happy, go to 2.                 % repeat or terminate
    (A runnable sketch follows below.)
  • Question: How long does it take to find the maximum of a simple function like OneMax = f: {0,1}^n → R; x ↦ x_1 + x_2 + … + x_n (the number of 'ones' in x)?
  • Remark: Of course, x = (1, 1, …, 1) is the maximum, and no one needs an algorithm to find this out. Aim: Start understanding RaSH via simple examples.
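
For concreteness, a runnable Python version of the pseudocode above (a sketch: the function names are mine, and 'not happy' is instantiated as 'optimum not yet found').

import random

def onemax(x) -> int:
    """OneMax: the number of ones in the bit string."""
    return sum(x)

def rls(f, n: int, f_opt: int, max_iters: int = 10**6) -> int:
    """Randomized Local Search maximizing f on {0,1}^n.
    Returns the iteration in which a point with f-value f_opt is first reached."""
    x = [random.randint(0, 1) for _ in range(n)]
    for t in range(1, max_iters + 1):
        k = random.randrange(n)   # pick a uniformly random position
        y = x.copy()
        y[k] = 1 - y[k]           # mutation: flip that one bit
        if f(y) >= f(x):          # selection: keep the fitter (ties go to y)
            x = y
        if f(x) == f_opt:         # 'happy': optimum reached
            return t
    return max_iters

n = 100
print(rls(onemax, n, f_opt=n))  # expected time at most n(ln n + 1), about 561 here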

  9. Application 2: RLS optimizes OneMax
  • RLS (restated from the previous slide)
  • Question: How long does it take to find the maximum of a simple function like OneMax = f: {0,1}^n → R; x ↦ x_1 + x_2 + … + x_n?
  • Analysis (same as for the coupon collector):
    – X_t: number of zeroes after iteration t (= "f_opt − f(x)"). Trivially, X_0 ≤ n
    – If X_{t−1} = k, then with probability k/n we flip a 'zero' into a 'one' (X_t = k − 1). Otherwise, y is worse than x and thus X_t = k
    – Hence, E(X_t) = k − k/n = (1 − 1/n)k
    – The drift theorem gives: maximum found after n(ln n + 1) iterations (in expectation)

  10. Application 2a: (1+1)-EA optimizes OneMax
  • One of the simplest evolutionary algorithms (EAs): the (1+1)-EA, again used to maximize f: {0,1}^n → R
    (1+1)-EA:
    1. Pick x ∈ {0,1}^n uniformly at random   % random start-point
    2. y := x
    3. For each i ∈ {1, …, n} do              % mutation: flip each bit w.p. 1/n
         with probability 1/n set y_i := 1 − x_i
    4. if f(y) ≥ f(x), then x := y            % selection: keep the fitter
    5. if not happy, go to 2.                 % repeat or terminate
    (A runnable sketch follows below.)
  • '(1+1)': population size 1, generate 1 offspring, perform 'plus'-selection: choose the new population from parents and offspring
  • Cannot get stuck in local optima ("always converges"). Question: Time to maximize OneMax = f: {0,1}^n → R; x ↦ x_1 + … + x_n?
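
Again a minimal Python sketch of the pseudocode above, under the same assumptions as the RLS sketch (my function names; 'not happy' means 'optimum not yet found').

import random

def one_plus_one_ea(f, n: int, f_opt: int, max_iters: int = 10**7) -> int:
    """(1+1)-EA maximizing f on {0,1}^n: standard bit mutation with rate 1/n,
    plus-selection. Returns the iteration in which f_opt is first reached."""
    x = [random.randint(0, 1) for _ in range(n)]
    for t in range(1, max_iters + 1):
        y = [1 - b if random.random() < 1.0 / n else b for b in x]  # flip each bit w.p. 1/n
        if f(y) >= f(x):   # plus-selection: offspring replaces parent if not worse
            x = y
        if f(x) == f_opt:  # 'happy': optimum reached
            return t
    return max_iters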

  11. Application 2a: (1+1)-EA optimizes OneMax
  • (1+1)-EA (restated from the previous slide)
  • Question: Time to maximize OneMax = f: {0,1}^n → R; x ↦ x_1 + … + x_n?
  • Analysis:
    – X_t: number of zeroes after iteration t.
    – If X_{t−1} = k, then the probability that exactly one of the k missing bits is flipped (and no other bit) is k (1/n)(1 − 1/n)^{n−1} ≥ (1/e)(k/n).
    – Hence, E(X_t) ≤ (k − 1)(k/en) + k(1 − k/en) = k(1 − 1/en)
    – Drift theorem: expected optimization time at most en(ln n + 1). (An empirical check follows below.)
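
A quick empirical check of this bound, reusing onemax and one_plus_one_ea from the sketches above (the experiment parameters are my choice):

import math

n, runs = 50, 100
avg = sum(one_plus_one_ea(onemax, n, f_opt=n) for _ in range(runs)) / runs
bound = math.e * n * (math.log(n) + 1)  # drift theorem: E(T) <= e n (ln n + 1)
print(f"n={n}: empirical E(T) ~ {avg:.0f}, drift bound {bound:.0f}")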

  12. Application 3: RLS optimizes Linear Functions
  • RLS (restated from slide 8)
  • Question: How long does it take to find the maximum of an arbitrary linear function f: {0,1}^n → R; x ↦ a_1 x_1 + a_2 x_2 + … + a_n x_n (w.l.o.g. 0 < a_1 ≤ a_2 ≤ … ≤ a_n)?
  • Analysis (same as for OneMax):
    – X_t: number of zeroes after iteration t. Trivially, X_0 ≤ n
    – If X_{t−1} = k, then with probability k/n we flip a 'zero' into a 'one' (X_t = k − 1). Otherwise, y is worse than x and thus X_t = k
    – Message: You can use an X_t different from "f_opt − f(x_t)"!
    – Why not X_t = "f_opt − f(x_t)"? The drift theorem gives E(T) ≤ (1/δ)(ln X_0 + 1), and this X_0 can be large! (A sketch follows below.)
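
The message that the potential X_t need not be f_opt − f(x_t) can be illustrated in code: the rls() sketch from above optimizes a linear function with widely varying weights (the weights a_i = i + 1 are my choice) in essentially the same time as OneMax, as the zeros-count analysis predicts.

import math

# With single-bit flips and positive weights, flipping a 'zero' always improves f
# and flipping a 'one' never does, so the number of zeroes drifts exactly as for
# OneMax (delta = 1/n), even though f_opt - f(x_0) can be of order n^2 here.
n = 100
a = [i + 1 for i in range(n)]  # weights 0 < a_1 <= ... <= a_n

def linear(x) -> int:
    return sum(w * b for w, b in zip(a, x))

t = rls(linear, n, f_opt=sum(a))  # rls() from the OneMax sketch above
print(f"optimum found after {t} iterations; bound n(ln n + 1) ~ {n * (math.log(n) + 1):.0f}")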

  13. Application 3a: (1+1)-EA optimizes Linear Functions
  • (1+1)-EA (restated from slide 10), maximizing f: {0,1}^n → R; x ↦ a_1 x_1 + a_2 x_2 + … + a_n x_n (w.l.o.g. 0 < a_1 ≤ a_2 ≤ … ≤ a_n)
  • A classical difficult problem:
    – Droste, Jansen, Wegener (2002): Expected optimization time E(T) = O(n log n)
    – He, Yao (2001-04): E(T) = O(n log n) via drift analysis
    – Jägersküpper (2008): E(T) ≲ 2.02 e n ln(n) via average drift analysis
    – D., Johannsen, Winzen (2010): e n ln(n) ≲ E(T) ≲ 1.39 e n ln(n)
    – D., Goldberg (2010+): O(n log n) w.h.p. for any c/n mutation probability
