Runtime Analysis of Convex Evolutionary Search
Alberto Moraglio & Dirk Sudholt
University of Birmingham & University of Sheffield
Research Goal
• Aim: identify matches between “topographic features” of fitness landscapes and “behavioural features” of evolutionary algorithms that alone explain/lead to good performance
• Features: general, representation-independent
• Performance: optimisation in polynomial time
• Potential benefits:
  – understanding the fundamental causes of good performance
  – general run-time analysis results for a class of algorithms on a class of landscapes
Abstract Convex Evolutionary Search
Example of Geometric Crossover
• Geometric crossover: offspring are in the segment between parents.
[Figure: parents A and B shown as binary strings with an offspring X on the Hamming segment between them, so that H(A,X) + H(X,B) = H(A,B)]
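As an illustration, a minimal Python sketch of the segment property: uniform crossover on binary strings is geometric under the Hamming distance, so every offspring satisfies the identity above. The bit strings here are made up for the sketch, not taken from the original figure.

```python
import random

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def uniform_crossover(a, b):
    """Pick each offspring bit from one of the two parents at random."""
    return tuple(random.choice((x, y)) for x, y in zip(a, b))

# Illustrative parents (made up for this sketch).
A = (1, 0, 0, 1, 1)
B = (0, 1, 0, 1, 0)
X = uniform_crossover(A, B)

# Geometric crossover: the offspring lies on the segment [A, B], i.e.
# H(A, X) + H(X, B) = H(A, B).
assert hamming(A, X) + hamming(X, B) == hamming(A, B)
print(A, B, X, hamming(A, B))
```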
Abstract Convex Evolutionary Search
• It holds across representations for any EA with crossover & selection.
Abstract Concave Landscape
• No Free Lunch (NFL): averaged over all fitness landscapes, convex search performs like random search. On what landscapes does it work better than random search?
• Rephrased: what topographic feature of the landscape is a good match for the convex behavioural feature of the search?
• Intuition says: (approximately) concave landscapes
Concave Fitness Landscapes
Concave landscapes can be defined in a representation-independent way
Generalised Concave Landscapes
– The traditional notion does generalise to combinatorial spaces (but caution is needed!)
– Average concave landscapes: for all x, y: z ~ Unif([x,y]), E[f(z)] >= (f(x)+f(y))/2; e.g., OneMax is average affine
– Quasi-concave landscapes: for all x, y and z in [x,y]: f(z) >= min(f(x), f(y)); e.g., LeadingOnes is quasi-concave
– Adding an ε-bounded perturbation function, we obtain approximately concave landscapes: E[f(z)] >= (f(x)+f(y))/2 − ε and f(z) >= min(f(x), f(y)) − ε
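These definitions can be checked exhaustively on small instances. A sketch (my own verification code, not from the slides), using the fact that the Hamming segment [x,y] consists of all strings that agree with x and y wherever x and y agree:

```python
from itertools import product

def onemax(x):
    return sum(x)

def leadingones(x):
    return next((i for i, b in enumerate(x) if b == 0), len(x))

def segment(x, y):
    """Hamming segment [x, y]: free choice of bit where x and y differ."""
    free = [i for i in range(len(x)) if x[i] != y[i]]
    for bits in product((0, 1), repeat=len(free)):
        z = list(x)
        for i, b in zip(free, bits):
            z[i] = b
        yield tuple(z)

n = 5
points = list(product((0, 1), repeat=n))
for x in points:
    for y in points:
        seg = list(segment(x, y))
        # OneMax is average affine: E[f(z)], z ~ Unif([x,y]), = (f(x)+f(y))/2.
        avg = sum(onemax(z) for z in seg) / len(seg)
        assert abs(avg - (onemax(x) + onemax(y)) / 2) < 1e-9
        # LeadingOnes is quasi-concave: f(z) >= min(f(x), f(y)) on [x,y].
        assert all(leadingones(z) >= min(leadingones(x), leadingones(y))
                   for z in seg)
print("checked all", len(points) ** 2, "pairs for n =", n)
```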
Theorem [FOGA 2011]
• On (average/quasi) concave landscapes, convex evolutionary search produces steady improvements: the fitness of the next population is never less than the (average/worst) fitness of the current population (even without selection).
• This result degrades gracefully as the landscape becomes less concave (for increasing ε).
• This is a one-step result: it implies neither convergence nor good performance.
Research question
• Is a general run-time analysis of evolutionary algorithms on concave landscapes across representations possible?
• Does convex search on concave landscapes have exponentially better run-time than random search?
• Refinement needed:
  – Algorithm
  – Landscape
  – Performance
Algorithm, Landscape & Performance
Abstract Convex Search Algorithm
• Initialise population uniformly at random
• Until the population has converged to the same individual:
  – Rank individuals on fitness
  – If there are at least two fitness values in the population, remove all individuals with the worst fitness
  – Apply convex hull uniform recombination k times to the remaining individuals to create the next population
• Return an individual from the last population
• Parameter: population size k
This algorithm is formally well-defined for any metric and representation.
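A minimal runnable sketch of this loop, instantiated on binary strings (function names and the max_gens safeguard are my additions; the loop itself follows the slide):

```python
import random

def convex_hull_uniform_recombination(parents):
    """At each position: copy the common bit if all parents agree,
    otherwise choose 0 or 1 with probability 0.5."""
    return tuple(col[0] if len(set(col)) == 1 else random.randint(0, 1)
                 for col in zip(*parents))

def convex_search(fitness, n, k, max_gens=100_000):
    """Abstract convex search on binary strings (a sketch)."""
    pop = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(k)]
    for _ in range(max_gens):
        if len(set(pop)) == 1:                  # population has converged
            return pop[0]
        fits = [fitness(x) for x in pop]
        worst = min(fits)
        # Remove all worst individuals if at least two fitness values occur.
        if max(fits) > worst:
            pop = [x for x, f in zip(pop, fits) if f != worst]
        # Apply convex hull uniform recombination k times.
        pop = [convex_hull_uniform_recombination(pop) for _ in range(k)]
    return max(pop, key=fitness)                # safeguard exit

leadingones = lambda x: next((i for i, b in enumerate(x) if b == 0), len(x))
print(leadingones(convex_search(leadingones, n=20, k=100)))
```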
Binary Convex Hull Recombination
• The specific convexity on binary strings is obtained by plugging the Hamming distance into the general notions of abstract convexity:
  – Convex sets ↔ schemata
  – Convex hull ↔ smallest schema matching a set of binary strings
• Uniform convex hull recombination, at each position:
  – If all parents have a 1 (or all have a 0), the offspring has a 1 (or a 0) respectively
  – If both a 1 and a 0 occur among the parents, the offspring has a 1 or a 0 with probability 0.5 each
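The convex-hull-as-schema correspondence is easy to see in code; a small sketch (the example strings are mine):

```python
def smallest_schema(strings):
    """Convex hull in the Hamming space = smallest schema matching the set:
    a fixed symbol where all strings agree, '*' where they disagree."""
    return ''.join(col[0] if len(set(col)) == 1 else '*'
                   for col in zip(*strings))

print(smallest_schema(["10011", "11011", "10001"]))  # -> 1*0*1
```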
Abstract Quasi-Concave Landscape (Properties)
• A landscape f is quasi-concave iff for all x, y and z in [x,y]: f(z) >= min(f(x), f(y))
• If f is quasi-concave: for all {x_i} and z in co({x_i}): f(z) >= min{f(x_i)}
• Level set L_a: {x in S : f(x) >= a}
• A landscape f is quasi-concave iff all its level sets are convex sets
• A landscape f is quasi-concave iff it is a “Tower of Hanoi” of convex sets
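The level-set characterisation can be verified exhaustively for small n. A sketch (my own check, using the fact that a Hamming-convex set is exactly a schema):

```python
from itertools import product

def leadingones(x):
    return next((i for i, b in enumerate(x) if b == 0), len(x))

n = 6
points = list(product((0, 1), repeat=n))
for a in range(n + 1):
    level = [x for x in points if leadingones(x) >= a]
    # A set is Hamming-convex iff it equals its convex hull, i.e. the
    # smallest schema matching it matches nothing outside the set.
    schema = [col[0] if len(set(col)) == 1 else '*' for col in zip(*level)]
    matched = {x for x in points
               if all(s == '*' or s == b for s, b in zip(schema, x))}
    assert matched == set(level), f"level set L_{a} is not convex"
print("all level sets of LeadingOnes are convex: the schemata 1^a *^(n-a)")
```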
Polynomial Quasi-Concave Landscape
• All fitness levels are convex sets
• The number q of fitness levels is polynomial in n (problem size, n = log|S|)
• The ratio between the sizes of successive fitness levels, |FL(i+1)|/|FL(i)|, is at least 1/poly(n)
• Parameters: q and r = min |FL(i+1)|/|FL(i)|
• Example: LeadingOnes is a poly QC landscape; Needle is a QC landscape but not a poly QC landscape
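For LeadingOnes these parameters are easy to compute, since the level set L_a is the schema 1^a *^(n−a) of size 2^(n−a); a quick sketch:

```python
n = 8
# Level set L_a of LeadingOnes is the schema 1^a *^(n-a), so |L_a| = 2^(n-a).
sizes = [2 ** (n - a) for a in range(n + 1)]
q = len(sizes) - 1                                   # number of improving levels
r = min(sizes[a + 1] / sizes[a] for a in range(n))   # worst successive-level ratio
print(f"q = {q}, r = {r}")                           # q = n, r = 0.5: poly QC
```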
Performance
• The algorithm does not converge to the optimum in all runs
• We are interested in:
  – an upper bound on the runtime when it converges (RT)
  – a lower bound on the probability of convergence (PC)
• Multi-restart version:
  – repeat convex search until the optimum is first met
  – expected run-time: RT/PC
• Performance as a function of: n, k, q, r, and of the underlying space (S,d)
Pure Adaptive Search
Pure Adaptive Search
• Pick an initial point X_0 uniformly at random.
• Generate X_(i+1) uniformly at random on the level set S_i = {x in S : f(x) >= f(X_i)} (the improving set).
• If the optimum is found, stop. Otherwise repeat from the previous step.
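A runnable sketch of PAS on binary strings (my own instantiation). The uniform draw from the improving set is simulated here by rejection sampling, which is precisely the inefficiency that makes PAS an ideal algorithm rather than a practical one:

```python
import random

def pure_adaptive_search(fitness, sample_uniform, f_opt):
    """Pure Adaptive Search with the improving-set draw simulated
    by (expensive) rejection sampling."""
    x = sample_uniform()
    steps = 1
    while fitness(x) < f_opt:
        y = sample_uniform()
        while fitness(y) < fitness(x):   # reject until y is in the improving set
            y = sample_uniform()
        x = y
        steps += 1
    return x, steps

n = 12
leadingones = lambda x: next((i for i, b in enumerate(x) if b == 0), n)
draw = lambda: tuple(random.randint(0, 1) for _ in range(n))
x, steps = pure_adaptive_search(leadingones, draw, f_opt=n)
print(steps, "PAS steps")   # few steps, but each late draw above is costly
```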
PAS remarks
• Studied since the 1980s in the field of global optimisation (mostly on continuous domains).
• It is an ideal algorithm, in general not implementable efficiently.
• Like PRS (Pure Random Search), the performance of PAS does not depend on the structure of S but only on the distribution of f.
• On almost all functions it is exponentially better than Pure Random Search.
• The result above also holds for relaxations of PAS that are closer to implementable algorithms, e.g., Hesitant Adaptive Search.
PRS vs. PAS (on poly QC landscapes)
• S = L_0 ⊇ L_1 ⊇ … ⊇ L_q (nested level sets)
• The shape of the level sets does not matter
• HittingProb(PRS) = Pr(L_0) · Pr(L_1|L_0) · … · Pr(L_q|L_(q−1)) ≈ r^q
• r = 1/poly(n), q = poly(n) → r^q = 1/exp(n) → RT(PRS) = 1/r^q = exp(n)
• RT(PAS) = 1/Pr(L_1|L_0) + … + 1/Pr(L_q|L_(q−1)) <= q · 1/r = poly(n) · poly(n) = poly(n)
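The arithmetic from the slide, worked out for LeadingOnes-like parameters (q = n improving levels, r = 1/2); a sketch of the comparison, not a simulation:

```python
n = 30
q, r = n, 0.5
rt_prs = (1 / r) ** q    # PRS: hitting probability r^q, so ~ (1/r)^q samples = 2^n
rt_pas = q * (1 / r)     # PAS: at most 1/r expected steps per level, q levels
print(f"RT(PRS) ~ {rt_prs:.3g}   RT(PAS) ~ {rt_pas:.0f}")
# RT(PRS) ~ 1.07e+09   RT(PAS) ~ 60
```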
Runtime of Convex Search (Sketch)
RT of Convex Search on Poly QC Landscapes
• Initial population: k points uniformly at random on S (= L_0)
• Effect of selection: ~k·r points uniformly at random on L_1
• Effect of recombination:
  – when co(sel(Pop)) = L_1 (i.e., the convex hull of the ~k·r points sampled at random in L_1 covers L_1)
  – k offspring uniformly at random on L_1
• And so forth
• The worst individual in the population conquers a new fitness level at each iteration, because selection increases the worst fitness by one level and recombination on concave landscapes preserves the minimum fitness of the parents. So RT = q · k.
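A small self-contained simulation illustrating the RT = q·k argument on LeadingOnes (a sketch under my own parameter choices; with k comfortably above the population-size bound given later, runs typically converge to the optimum in roughly q generations):

```python
import random

def leadingones(x):
    return next((i for i, b in enumerate(x) if b == 0), len(x))

def hull_recombine(parents):
    return tuple(col[0] if len(set(col)) == 1 else random.randint(0, 1)
                 for col in zip(*parents))

def run(n, k):
    """Count generations until the population collapses to one point."""
    pop = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(k)]
    gens = 0
    while len(set(pop)) > 1:
        fits = [leadingones(x) for x in pop]
        worst = min(fits)
        survivors = [x for x, f in zip(pop, fits) if f != worst] or pop
        pop = [hull_recombine(survivors) for _ in range(k)]
        gens += 1
    return gens, leadingones(pop[0])

random.seed(1)
print(run(n=20, k=120))  # typically on the order of q = n generations
```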
Success Probability
• Each iteration can be seen as an iteration of PAS: k points are sampled uniformly at random in the improving set (w.r.t. the worst individual in the population).
• We assumed that in a typical run:
  – 1. (at least) the expected number of points (k·r) are selected
  – 2. the convex hull of the selected points sampled at random in L_i covers L_i
• For continuous spaces event 2 has probability 0. For combinatorial spaces this event has positive (and decent) probability.
• E.g., for the Hamming space, the worst-case probability of covering any convex set with m points sampled uniformly is the covering probability for the entire Hamming space. Covering happens exactly when, in each position, the m binary strings contain at least one zero and at least one one.
• Success probability: the probability that events 1 and 2 occur at each generation (q times).
• For population size k large enough, we get a good probability of success (e.g., > 0.5), so in expectation fewer than 2 restarts are needed to reach the optimum.
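The covering probability for the Hamming space follows directly from the per-position characterisation above; a short sketch of the computation:

```python
def covering_prob(n, m):
    """Probability that m uniform random strings cover {0,1}^n: every
    position must show both a 0 and a 1, each failing with prob 2*(1/2)^m."""
    return (1 - 2 * 0.5 ** m) ** n

for m in (10, 20, 30):
    print(m, covering_prob(n=100, m=m))
# already m = 20 points cover {0,1}^100 with probability ~0.9998
```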
Result Specialisation
• The only space-dependent parameter in this reasoning is the covering probability (the probability of event 2)
• We derived an expression of the success probability as a function of the covering probability, valid for any space
• We can determine the population size k for the pair quasi-concave landscape & convex search specialised to a new space as soon as we know the covering probability for that space
Result Specialisation
• Boolean space & Hamming distance:
  – with k ≥ 4 log(2(q + 2)n)/r, Num Gen = 2q
  – on polynomial QC landscapes q and 1/r are poly(n) → k and Num Gen are poly(n)
  – for LeadingOnes: q = n, r = 1/2 → RT = O(n log n) (better than any unary unbiased black-box algorithm)
• Integer vectors with Hamming distance and Manhattan distance (as a function of the cardinality of the alphabet):
  – easy to determine the covering probability for product spaces
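Plugging LeadingOnes into the slide's population-size bound gives concrete poly(n) figures; a sketch (the natural logarithm is assumed, since the slides do not fix the base):

```python
from math import ceil, log

def population_bound(q, r, n):
    """k >= 4*log(2*(q+2)*n)/r from the slide (natural log assumed)."""
    return ceil(4 * log(2 * (q + 2) * n) / r)

n = 100                      # LeadingOnes: q = n, r = 1/2
k = population_bound(q=n, r=0.5, n=n)
num_gen = 2 * n              # Num Gen = 2q
print(k, num_gen, k * num_gen)   # population size, generations, evaluations
```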
Conclusions