Differential Evolution Entirely Parallel Method

K.N. Kozlov^1,*, A.M. Samsonov^2 and M.G. Samsonova^1

1 St.Petersburg Polytechnic University, St.Petersburg 195251, Russia
2 A.F. Ioffe Physico-Technical Institute, St.Petersburg 194021, Russia
* email: kozlov_kn@spbstu.ru
Introduction

Motivation: The design of efficient algorithms and systems to solve the inverse problem of mathematical modeling continues to be a challenge. For applications in biology and medicine no universal solution exists. The main difficulties are:
■ processing of large data volumes,
■ heterogeneity of biomedical data,
■ determination of experimentally immeasurable quantities and parameters,
■ design of new substances,
■ multiple fitness criteria,
■ high computational complexity of biomedical applications,
■ interpretation of modeling results.

Aims: An efficient optimization method should not only be fast and scalable, but also reliable and robust.
1. Applicability to different types of models and a formal description of the a priori knowledge,
2. Flexibility: a wide range of strategies to overcome local extrema,
3. Usability: independence of the language of model description.
Formal Problem Statement

Initial value problem: Let the vector v(t, q) = (v_0(t, q), ..., v_{K-1}(t, q))^T describe the state of the system at time t. The dimension of v equals the number of state variables K. Let q = (q_0, ..., q_{I-1})^T be the vector of parameters with dimension I. The dynamics of the system is described by a system of ordinary differential equations of the first order with respect to the independent variable t, together with initial conditions:

∂v/∂t = f(v, q);   v(0, q) = v_0.   (1)

Problems of mathematical physics described by differential equations of higher order can be rewritten in the normal form (1) (Lurie, 1993). The model parameters are estimated by adapting, or fitting, the model output to the experimental data.
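As a minimal illustration of problem (1), the state vector and parameters can be set up in Python as sketched below. The two-variable right-hand side f and the parameter values are hypothetical; the sketch only shows how the model output v(t_i, q) is produced for a given q.

import numpy as np
from scipy.integrate import solve_ivp

def f(t, v, q):
    # Right-hand side f(v, q) of system (1): a toy two-variable model (assumed).
    v0, v1 = v
    k0, k1, d = q                      # hypothetical parameters
    return [k0 - d * v0, k1 * v0 - d * v1]

q = np.array([1.0, 0.5, 0.2])          # parameter vector, dimension I = 3
v_init = np.array([0.0, 0.0])          # initial conditions, K = 2 state variables
t_obs = np.linspace(0.0, 10.0, 11)     # observation times t_i

sol = solve_ivp(f, (t_obs[0], t_obs[-1]), v_init, t_eval=t_obs, args=(q,))
print(sol.y.shape)                     # (K, J): model output v(t_i, q)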
Objective Function

Adaptation is performed by minimization of the objective function, e.g. the sum of squared differences between the data and the model output,

F(q) = Σ_{i=1}^{J} (v(t_i, q) − y(t_i))^T (v(t_i, q) − y(t_i)),

where the J independent experimental observations are denoted y(t_i) = (y_0(t_i), ..., y_{K-1}(t_i))^T, i = 1, ..., J.

Multi-objective case: weighted global criterion method (weighted sum method). Let R_i(q) represent the optimization criteria and S_j(q) = 0 the parameter constraints:

F(q) = Σ_i α_i R_i(q) + Σ_j β_j S_j(q) → min,   α_i > 0,   β_j > 0.
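A minimal sketch of the weighted global criterion follows; the helper names model_output, criteria and constraints are assumptions introduced only for illustration.

import numpy as np

def rss(q, model_output, data):
    # R_1(q): sum over observations of (v(t_i,q) - y(t_i))^T (v(t_i,q) - y(t_i)).
    d = model_output(q) - data                 # both of shape (K, J)
    return float(np.sum(d * d))

def objective(q, criteria, constraints, alpha, beta):
    # Weighted sum F(q) = sum_i alpha_i R_i(q) + sum_j beta_j S_j(q).
    return (sum(a * R(q) for a, R in zip(alpha, criteria))
            + sum(b * S(q) for b, S in zip(beta, constraints)))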
Box Constraints

Constraints in the form of inequalities can be imposed on a subset of the parameters:

q_i^low ≤ q_i ≤ q_i^up,   i ∈ I_l.   (2)

Let us introduce new parameters u_i:

q_i = α_i + β_i tanh(η u_i),   (3)

where the scaling coefficient η is chosen experimentally, and

α_i = (q_i^up + q_i^low)/2;   β_i = (q_i^up − q_i^low)/2.

Consequently, the parameters q_i, i ∈ I_l in (1) are substituted with their transformations, and the adaptation procedure is applied to determine the unconstrained parameters u_i.
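The transformation (3) and its inverse can be written compactly; the sketch below assumes η, q_low and q_up are given as NumPy arrays over the constrained subset.

import numpy as np

def to_constrained(u, q_low, q_up, eta=1.0):
    # Map unconstrained u_i to q_i in [q_low_i, q_up_i]: q_i = alpha_i + beta_i * tanh(eta * u_i).
    alpha = (q_up + q_low) / 2.0
    beta = (q_up - q_low) / 2.0
    return alpha + beta * np.tanh(eta * u)

def to_unconstrained(q, q_low, q_up, eta=1.0):
    # Inverse map, useful to initialize u_i from a feasible starting point q_i.
    alpha = (q_up + q_low) / 2.0
    beta = (q_up - q_low) / 2.0
    return np.arctanh((q - alpha) / beta) / eta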
Overview

The constrained optimization problem can be transformed to an unconstrained problem using penalty functions.

Global methods:
■ Simulated Annealing (Metropolis method),
■ Genetic algorithms, evolutionary algorithms,
■ Imitation methods: Particle Swarm Optimization, Ant Colonies.

Local search:
■ Gradient descent,
■ Conjugate gradient,
■ Nelder-Mead method.

For an overview see (Ashyraliyev et al., 2009; Mendes and Kell, 1998; Moles et al., 2003).
Existing Solutions

Methods for solution of the inverse problem of mathematical modeling are actively developed.

Available tools and approaches:
■ Matlab, Mathematica, Maple, etc.,
■ Octave, R, COPASI, KNIME, etc.,
■ Programming «from scratch»,
■ Adaptation «by hand».

Disadvantages:
■ Integrated systems: poor interoperability, oriented to simulations,
■ Limitations to certain types of models,
■ Steep learning curve,
■ Complexity of the a priori knowledge formalization.
Differential Evolution

A stochastic iterative optimization technique suggested by Storn and Price (Storn and Price, 1995). The set of random parameter vectors q_i, i = 1, ..., NP, forms the current generation; the size of the set, NP, is fixed.

Construction of trial vectors:

v = q_{r1} + S (q_{r2} − q_{r3}),

where q_{r1}, q_{r2}, q_{r3} are parameter vectors from the current generation and S is the scaling constant.

«Trigonometric mutation rule» (Fan and Lampinen, 2003):

v = (q_{r1} + q_{r2} + q_{r3})/3 + (φ_2 − φ_1)(q_{r1} − q_{r2}) + (φ_3 − φ_2)(q_{r2} − q_{r3}) + (φ_1 − φ_3)(q_{r3} − q_{r1}),

where φ_i = |F(q_{ri})| / φ*, i = 1, 2, 3, and φ* = |F(q_{r1})| + |F(q_{r2})| + |F(q_{r3})|.

[Figure: geometric interpretation of the mutation operation in the 2D case.]
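Both mutation rules can be sketched as follows; pop is assumed to be an (NP, I) NumPy array holding the current generation, F_vals the corresponding objective values, and rng a numpy.random.Generator.

import numpy as np

def de_mutation(pop, S, rng):
    # Classic rule: v = q_r1 + S * (q_r2 - q_r3) with three distinct random members.
    r1, r2, r3 = rng.choice(len(pop), size=3, replace=False)
    return pop[r1] + S * (pop[r2] - pop[r3])

def trigonometric_mutation(pop, F_vals, rng):
    # Trigonometric rule of Fan and Lampinen (2003).
    r1, r2, r3 = rng.choice(len(pop), size=3, replace=False)
    phi_star = abs(F_vals[r1]) + abs(F_vals[r2]) + abs(F_vals[r3])
    p1, p2, p3 = (abs(F_vals[r]) / phi_star for r in (r1, r2, r3))
    return ((pop[r1] + pop[r2] + pop[r3]) / 3.0
            + (p2 - p1) * (pop[r1] - pop[r2])
            + (p3 - p2) * (pop[r2] - pop[r3])
            + (p1 - p3) * (pop[r3] - pop[r1]))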
Crossover

The offspring w is constructed from the mutant vector v and the parent vector z as follows:

w_j = v_j,   j = ⟨n⟩_I, ⟨n+1⟩_I, ..., ⟨n+L−1⟩_I,
w_j = z_j,   j < ⟨n⟩_I or j > ⟨n+L−1⟩_I,

where n is a randomly chosen index, ⟨n⟩_I is the remainder of division of n by I, and the segment length L is determined by Pr(L = a) = p^a, where p is the crossover probability.

[Diagram: components q_0, ..., q_8 of the vectors v, w and z; the segment of w with indices n, ..., n+L−1 is copied from v, the remaining components from z.]
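A sketch of this crossover step; the argument names (v for the mutant vector, z for the parent, rng for a numpy.random.Generator) are assumptions, and the geometric growth of L below is one possible reading of Pr(L = a) = p^a.

import numpy as np

def crossover(v, z, p, rng):
    I = len(z)
    n = rng.integers(I)                 # random start index
    L = 1
    while rng.random() < p and L < I:   # segment grows with probability p per step
        L += 1
    w = z.copy()
    for j in range(n, n + L):
        w[j % I] = v[j % I]             # copy a circular segment of length L from v
    return w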
Differential Evolution Entirely Parallel Method

Being an evolutionary algorithm, DE can be effectively parallelized:
■ each member of the population is pushed to an asynchronous queue,
■ it is evaluated individually as soon as a thread is available in the pool,
■ threads are synchronized on each generation.

The age of an individual is the number of iterations the individual has survived without changes. Several oldest members are substituted by the same number of individuals having the best values of the objective function.

The parameters of the algorithm:
- Ψ is the number of seeding individuals (i.e. individuals that substitute old ones),
- Θ is the number of iterations between substitutions, called the seeding interval.
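The queue-and-thread-pool idea can be sketched with Python's standard thread pool; this is only an illustration of the evaluation scheme, not the authors' implementation. For CPU-bound objectives in Python a ProcessPoolExecutor would be the natural replacement.

from concurrent.futures import ThreadPoolExecutor

def evaluate_generation(objective, trials, n_threads=4):
    # Return the objective values of all trial individuals of one generation.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() hands each individual to the next free worker and blocks until
        # the whole generation is finished, i.e. the synchronization point.
        return list(pool.map(objective, trials))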
Selection of Offspring

New selection rule for several objective functions:
1. The offspring replaces its parent if the value of the objective function for the offspring set of parameters is less than that for the parental one.
2. The offspring replaces its parent if the value of some additional objective function is better and a randomly selected value is less than the predefined control parameter for this function.

proc select(individual) =
{
  if (F < the value of the parent) then
    Accept offspring
  else
    for all additional functions P do
      if (P < the value of the parent) then
        Generate the random number U.
        if (U < the control parameter for this function) then
          Accept offspring
        end if
      end if
    end for
  end if
}
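The rule can be sketched in Python as follows; the argument layout (main objective values plus a list of triples for the additional functions) is an assumption made for illustration.

import random

def select(F_child, F_parent, extras):
    # Return True if the offspring should replace its parent.
    # extras: list of (child_value, parent_value, control_parameter) triples,
    # one per additional objective function.
    if F_child < F_parent:
        return True
    for child_val, parent_val, control in extras:
        if child_val < parent_val and random.random() < control:
            return True
    return False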
Population Diversity

The original algorithm was highly dependent on its internal parameters, see, e.g. (Gaemperle R., 2002).

The variance of each parameter:

var_j = (1/NP) Σ_{i=0}^{NP−1} ( q_{i,j} − (1/NP) Σ_{k=0}^{NP−1} q_{k,j} )²,   (4)

where j = 0, ..., I − 1.

An efficient adaptive scheme for the selection of the internal parameters S and p, based on preserving the population diversity, was proposed in (Zaharie, 2002).
Adaptive Scheme

Then

S_j = sqrt( ( NP (c_j − 1) + p_j (2 − p_j) ) / ( 2 NP p_j ) ),   if NP (c_j − 1) + p_j (2 − p_j) ≥ 0,
S_j = S_inf,   otherwise,   (5)

and

p_j = sqrt( ( NP S_j² − 1 )² − NP (1 − c_j) ) − ( NP S_j² − 1 ),   if c_j ≥ 1,
p_j = p_inf,   if c_j < 1.   (6)

New control parameter γ:

c_j^new = γ · var_j / var_j^new.   (7)
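A sketch of formulas (4)-(6) follows; in practice one of S or p is kept fixed while the other is adapted, so the function below only places both expressions side by side. The fallback values S_inf and p_inf are assumptions, and pop is the (NP, I) population array.

import numpy as np

def adapt_parameters(pop, c, S_inf=0.1, p_inf=0.1):
    # Per-component scaling S_j and crossover probability p_j that aim to keep
    # the desired variance ratio c_j (one entry per parameter component j).
    NP = len(pop)
    var = np.var(pop, axis=0)                         # equation (4), per component j
    # Equation (5): S_j from c_j and the current p_j (here taken as p_inf).
    num = NP * (c - 1.0) + p_inf * (2.0 - p_inf)
    S = np.where(num >= 0.0, np.sqrt(np.abs(num) / (2.0 * NP * p_inf)), S_inf)
    # Equation (6): p_j from c_j and the current S_j.
    a = NP * S**2 - 1.0
    disc = a**2 - NP * (1.0 - c)
    p = np.where(c >= 1.0, np.sqrt(np.maximum(disc, 0.0)) - a, p_inf)
    return var, S, p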
Stopping criterion

Calculations are stopped when the objective function F decreases by less than a predefined value ρ during M steps.

[Figure: the objective function F decreasing from F_max towards F_opt with the iteration number N; the window of M steps and the threshold ρ are indicated.]
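A sketch of the test, assuming history stores the best objective value of every generation.

def should_stop(history, rho, M):
    # Stop when the improvement over the last M generations is below rho.
    if len(history) <= M:
        return False
    return history[-M - 1] - history[-1] < rho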
The integer valued parameters

■ The values are sorted in ascending order.
■ The index of the parameter in the floating point array becomes the value of the parameter in the integer array.

a[0] = 0.3; a[1] = 0.5; a[2] = 0.1; a[3] = 0.8;

After sorting in ascending order the array is transformed to:

b[0] = a[2] = 0.1;  b[1] = a[0] = 0.3;  b[2] = a[1] = 0.5;  b[3] = a[3] = 0.8;

The indices of the sorted array define the current set of the integer valued parameters:

i[0] = 2; i[1] = 0; i[2] = 1; i[3] = 3;
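The same transformation is a one-liner with NumPy's argsort, shown here on the example above.

import numpy as np

a = np.array([0.3, 0.5, 0.1, 0.8])   # floating point genes
order = np.argsort(a)                # indices of the sorted array
print(order.tolist())                # -> [2, 0, 1, 3], the integer-valued parameters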
Approximation of the Optimum

The plots show the three criteria in the close vicinity of the optimal values of the four parameters (x = 5.0, y = 3.0, z = 1.12, q = 0.43). F_1 is the sum of squared differences between the model output and the data; F_2 and F_3 penalize the deviation of the model from the desired behavior.

While the criteria do not take a minimum value at one and the same point, the algorithm nevertheless produces a reliable approximation of the optimum.

[Figure: Criterion 1, Criterion 2 and Criterion 3 plotted against each of the four parameter values near the optimum.]