Natural Computing
Lecture 9: Evolutionary Strategies
Michael Herrmann, INFR09038
mherrman@inf.ed.ac.uk, phone: 0131 6 517177, Informatics Forum 1.42
22/10/2010
Evolutionary algorithms: genotype (encoding); mutation/crossover; phenotype (applied to)
• Genetic algorithm: strings of binary or integer numbers; e.g. 1-point crossover and mutation, with probabilities p_c, p_m; optimization or search of optimal solutions
• Genetic programming: trees (can also be represented as strings); like GA plus additional operators; computer programs for a computational problem
• Evolutionary programming: real-valued parameter vector; mutation with self-adaptive rates; parameters of a computer program with fixed structure
• Evolution strategy: real-valued encoding; mutation with self-adaptive rates; optimization or search of optimal solutions
Characteristics Suggesting the Use of GP
1. Discovering the size and shape of the solution
2. Reusing substructures
3. Discovering a set of useful substructures
4. Discovering the nature of the hierarchical references among substructures
5. Passing parameters to a substructure
6. Discovering the type of substructures (e.g., subroutines, iterations, loops, recursions, or storage)
7. Discovering the number of arguments possessed by a substructure
8. Maintaining syntactic validity and locality by means of a developmental process
9. Discovering a general solution in the form of a parametrized topology containing free variables
Fundamental differences between GP and other approaches to AI and ML
1. Representation: Genetic programming overtly conducts its search for a solution to the given problem in program space.
2. Role of point-to-point transformations in the search: Genetic programming does not conduct its search by transforming a single point in the search space into another single point, but instead transforms a set of points into another set of points.
3. Role of hill climbing in the search: Genetic programming does not rely exclusively on greedy hill climbing to conduct its search, but instead allocates a certain number of trials, in a principled way, to choices that appear to be inferior at a given stage.
4. Role of determinism in the search: Genetic programming conducts its search probabilistically.
5. Role of an explicit knowledge base: none (perhaps for initialisation).
6. Role of formal logic in the search: none (perhaps for editing).
7. Underpinnings of the technique: biologically inspired.
Promising GP Application Areas
• Problem areas involving many variables that are interrelated in highly non-linear ways
• The inter-relationship of variables is not well understood
• A good approximate solution is satisfactory
  − design, control, classification and pattern recognition, data mining, system identification and forecasting
• Discovery of the size and shape of the solution is a major part of the problem
• Areas where humans find it difficult to write programs
  − parallel computers, cellular automata, multi-agent strategies / distributed AI, FPGAs
• "Black art" problems
  − synthesis of topology and sizing of analog circuits, synthesis of topology and tuning of controllers, quantum computing circuits, synthesis of designs for antennas
• Areas where you simply have no idea how to program a solution, but where the objective (fitness measure) is clear
• Problem areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data
Open Questions/Research Areas
• Scaling up to more complex problems and larger programs
• Using large function and terminal sets
• How well do the evolved programs generalise?
• How can we evolve nicer programs?
  − size, efficiency, correctness
• What sort of problems is GP good at / not-so-good at?
• Convergence, optimality etc.?
• Relation to human-based evolutionary processes (e.g. Wikipedia)
Cross-Domain Features
• Native representations are sufficient when working with genetic programming
• Genetic programming breeds "simulatability" (Koza)
• Genetic programming starts small and controls bloat
• Genetic programming frequently exploits a simulator's built-in assumption of reasonableness
• Genetic programming engineers around existing patents and creates novel designs more frequently than it creates infringing solutions
Overview
1. Introduction: History
2. The genetic code
3. The canonical genetic algorithm
4. Examples & Variants of GA
5. The schema theorem
6. The building block hypothesis
7. Hybrid algorithms
8. Multiobjective Optimization
9. Genetic Programming
10. Evolutionary strategies
11. Differential evolution
Evolution strategies
• Natural, problem-dependent representation for search and optimisation (without "genetic" encoding)
• Individuals are vectors of real numbers which describe current solutions of the problem
• Recombination by exchange or averaging of components (but is sometimes not used)
• Mutation in continuous steps, with adaptation of the mutation rate to account for different scales and correlations of the components
• Selection by fitness from various parent sets
• Elitism, islands, adaptation of parameters
• 1964: Ingo Rechenberg; Hans-Paul Schwefel
Multidimensional Mutations in ES
[Figure: correlated mutations; uncorrelated, scaled mutations; uncorrelated mutations]
Generation of offspring: y = x + N(0, C'), where x stands for the vector (x_1, …, x_n) describing a parent and C' is the covariance matrix C after mutation of the σ values, with
• C = diag(σ, …, σ) for uncorrelated mutations,
• C = diag(σ_1, …, σ_n) for scaled axes, or
• C = (C_ij) for correlated mutations.
A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing. Evolution Strategies
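As a minimal sketch (assuming NumPy; the σ values and the entries of C are illustrative, and note that covariance entries are variances σ² rather than standard deviations), the three mutation types can be sampled as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
x = np.zeros(n)                      # parent vector

# Uncorrelated, single step size: C = sigma^2 * I
sigma = 0.5
y1 = x + rng.normal(0.0, sigma, size=n)

# Uncorrelated, scaled axes: C = diag(sigma_1^2, ..., sigma_n^2)
sigmas = np.array([0.1, 0.5, 2.0])   # one step size per coordinate
y2 = x + rng.normal(0.0, sigmas)

# Correlated: full covariance matrix C = (C_ij)
C = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 0.5]])
y3 = x + rng.multivariate_normal(np.zeros(n), C)
```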
Multidimensional Mutations in ES
• Offspring vectors: x_i := m + z_i, with z_i ~ N(0, C)
• Select the best offspring, i.e. (1, λ)-ES
• Correlations among successful offspring: Z := (1/λ) Σ_i z_i z_iᵀ
• Update correlations: C := (1 − ε) C + ε Z
• New state vector: m := m + (1/λ) Σ_i z_i, which smoothes fitness fluctuations; alternatively m := best offspring
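A hedged sketch of these update rules (assuming NumPy and the sphere function as an illustrative objective; the slide leaves open exactly which offspring enter the sums, so here they run over the selected better half):

```python
import numpy as np

def f(x):                            # illustrative objective: sphere function
    return np.sum(x**2)

rng = np.random.default_rng(1)
n, lam, eps = 2, 10, 0.2
m = rng.normal(size=n)               # state vector
C = np.eye(n)                        # mutation covariance

for _ in range(100):
    z = rng.multivariate_normal(np.zeros(n), C, size=lam)  # z_i ~ N(0, C)
    x = m + z                                              # offspring x_i = m + z_i
    best = np.argsort([f(xi) for xi in x])[: lam // 2]     # successful offspring
    zs = z[best]
    Z = zs.T @ zs / len(zs)          # Z = average of z_i z_i^T over selected
    C = (1 - eps) * C + eps * Z      # C := (1 - eps) C + eps Z
    m = m + zs.mean(axis=0)          # smoothed update of the state vector
```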
Evolution strategies
• (μ, λ): the μ parents of the next generation are selected from the set of λ children only
• (μ + λ): selection from the joint set of μ parents and λ children
• (μ', λ'(μ, λ)^γ): isolate the children for γ generations, where each time λ children are created (total population is λλ'). Then the best subpopulation is selected and becomes the parents (e.g. λ = μ') for a new cycle of γ generations
• Analogous: (μ' + λ'(μ, λ)^γ), (μ' + λ'(μ + λ)^γ), (μ', λ'(μ + λ)^γ)
• Heuristic 1/5 rule: if fewer than 1/5 of the children are better than their parents, decrease the size of mutations (see the sketch below)
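The 1/5 rule is easiest to see in a (1+1)-ES; the following sketch (illustrative constants and sphere objective, not from the lecture) adapts σ every 20 steps:

```python
import numpy as np

def f(x):                           # illustrative objective: sphere function
    return np.sum(x**2)

rng = np.random.default_rng(2)
x = rng.normal(size=5)
sigma, a = 1.0, 0.85                # step size and adaptation factor
successes, window = 0, 20

for t in range(1, 2001):
    child = x + rng.normal(0.0, sigma, size=x.shape)
    if f(child) < f(x):             # '+'-selection: child replaces parent if better
        x, successes = child, successes + 1
    if t % window == 0:             # every `window` steps, apply the 1/5 rule
        if successes < window / 5:
            sigma *= a              # too few successes: decrease mutation size
        else:
            sigma /= a              # enough successes: increase mutation size
        successes = 0
```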
Nested Evolution Strategy
• Hills are not independently distributed (hills of hills)
• Find a local maximum as a start state
• Generate 3 offspring populations (founder populations) that then evolve in isolation
• Local hill-climbing (if convergent: increase the diversity of the offspring populations)
• Select only the best population
• A walking process from peak to peak within an "ordered hill scenery", named Meta-Evolution (a sketch follows below)
• Takes the role of crossover in GA
http://www.bionik.tu-berlin.de/intseit2/xs2mulmo.html
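A hedged sketch of this meta-evolution cycle; the objective, population count, and step sizes are illustrative assumptions:

```python
import numpy as np

def f(x):                            # illustrative multimodal objective (hills of hills)
    return np.sum(x**2) - 2.0 * np.sum(np.cos(3.0 * x))

def hill_climb(x, rng, steps=50, sigma=0.3):
    for _ in range(steps):           # local (1+1)-style search
        y = x + rng.normal(0.0, sigma, size=x.shape)
        if f(y) < f(x):
            x = y
    return x

rng = np.random.default_rng(3)
best = hill_climb(rng.normal(size=2), rng)       # start: a local optimum
for cycle in range(10):
    founders = [best + rng.normal(0.0, 2.0, size=best.shape)
                for _ in range(3)]               # 3 diverse founder populations
    evolved = [hill_climb(x, rng) for x in founders]  # evolve in isolation
    best = min(evolved + [best], key=f)          # select only the best
```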
ES: Conclusion
• A class of metaheuristic search algorithms
• Adaptive parameters are important
• Relations to Gaussian adaptation
• Advanced ESs compare favourably to other metaheuristic algorithms (see www.lri.fr/~Hansen)
• Diversity of the population of solutions needs to be specifically considered
• See also www.scholarpedia.org/article/Evolution_strategies
Rainer Storn & Kenneth Price (1997) Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11: 341–359.
Differential Evolution
DE: Details
Properties:
• Simple, very fast
• Reasonably good results
• Diversity increases in flat regions (divergence property)
Parameters (default; typical range):
• NP = 5D (4D … 10D)
• CR = 0.1 (0 … 1.0)
• F = 0.5 (0.4 … 1.0)
A proof exists that effectiveness requires F ≥ F_crit = √((1 − CR/2) / NP).
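A minimal sketch of the DE/rand/1 mutation with greedy one-to-one selection, using the parameter defaults above (the objective and bounds are illustrative; crossover is omitted here and sketched under "DE with Crossover" below):

```python
import numpy as np

def f(x):                            # illustrative objective: sphere function
    return np.sum(x**2)

rng = np.random.default_rng(4)
D = 5
NP, F = 5 * D, 0.5                   # NP = 5D, F = 0.5 as suggested above
pop = rng.uniform(-5, 5, size=(NP, D))

for _ in range(200):
    for i in range(NP):
        # pick three distinct population members r1, r2, r3, all different from i
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                                size=3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])   # difference-vector mutation
        if f(v) <= f(pop[i]):                   # greedy one-to-one selection
            pop[i] = v
```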
Search in Differential Evolution Rainer Storn (2008) Differential Evolution Research – Trends and Open Questions. Chapter 1 of Uday K. Chakraborty: Advances in Differential Evolution
DE with Crossover
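Since the slide's figure is not reproduced here, the following sketch shows the standard binomial ("bin") crossover usually meant in DE: each coordinate of the trial vector is taken from the mutant with probability CR, with at least one mutant coordinate guaranteed.

```python
import numpy as np

def binomial_crossover(target, mutant, CR, rng):
    D = target.size
    mask = rng.random(D) < CR             # take mutant coordinate with prob. CR
    mask[rng.integers(D)] = True          # guarantee at least one mutant coordinate
    return np.where(mask, mutant, target)

rng = np.random.default_rng(5)
u = binomial_crossover(np.zeros(4), np.ones(4), CR=0.1, rng=rng)
```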
Invariant representations
• Crossover depends on the coordinate directions and is thus not rotationally invariant
• With randomly rotated coordinate systems, the search becomes isotropic (see the sketch below)
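One standard construction for this (an assumption, not necessarily the slide's exact method) is to draw a random orthogonal matrix Q, rotate, cross over, and rotate back:

```python
import numpy as np

rng = np.random.default_rng(6)
D = 4
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))   # random orthogonal matrix

target, mutant, CR = np.zeros(D), np.ones(D), 0.5
t_rot, m_rot = Q @ target, Q @ mutant          # rotate into a random frame
mask = rng.random(D) < CR                      # binomial crossover in that frame
u = Q.T @ np.where(mask, m_rot, t_rot)         # rotate the trial vector back
```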
DE with Jitter
Choose for each vector i and for each coordinate j a different random increment of the scale factor, e.g. as in the sketch below.
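The slide's own expression is not reproduced; the per-coordinate formula below, F_j = F + d·(rand − 0.5) with small d, is one common jitter choice and should be read as an assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
NP, D, F, d = 10, 4, 0.5, 0.001
pop = rng.uniform(-5, 5, size=(NP, D))

i = 0
r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], size=3, replace=False)
F_j = F + d * (rng.random(D) - 0.5)            # different random increment per coordinate
v = pop[r1] + F_j * (pop[r2] - pop[r3])        # jittered difference mutation
```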
DE: Variants
• Mutability and threshold parameters can also be evolved for each individual (as the step sizes in ES), i.e. the dimension becomes D+2 (a sketch follows below)
• Scheme for denoting DE variants: e.g. DE/best/2 (base vector / number of difference vectors)
• A number of self-adapting variants also exist, cf. [Storn, 08]
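A hedged sketch of the D+2 encoding: each individual carries its own F and CR as two extra genes. The exact perturbation scheme (log-normal for F, additive for CR, as for ES step sizes) and the clipping bounds are assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
D, NP = 4, 20
# each individual: D object variables plus its own F and CR (genes D+1, D+2)
pop = np.hstack([rng.uniform(-5, 5, size=(NP, D)),
                 np.full((NP, 1), 0.5),          # per-individual F
                 np.full((NP, 1), 0.1)])         # per-individual CR

def mutate_params(ind, tau=0.1):
    # perturb F log-normally (like ES step sizes) and CR additively, then clip
    ind = ind.copy()
    ind[D] = np.clip(ind[D] * np.exp(tau * rng.normal()), 0.1, 1.0)
    ind[D + 1] = np.clip(ind[D + 1] + tau * rng.normal(), 0.0, 1.0)
    return ind

child = mutate_params(pop[0])   # F and CR evolve together with the solution
```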
Meta-Heuristic Search
• µετα "beyond", ευρισκειν "to find"; applied mainly to combinatorial optimization
• The user has to modify the algorithm to a greater or lesser extent in order to adapt it to the specific problem
• These algorithms seem to defy the no-free-lunch (NFL) theorem due to the combination of
  − a biased choice of problems
  − user-generated modifications
• They can often be outperformed by a problem-dependent heuristic