
Some Comments on GD and IGD and Relations to the Hausdorff Distance



  1. Some Comments on GD and IGD and Relations to the Hausdorff Distance
     O. Schütze, X. Esquivel, A. Lara, C. Coello
     CINVESTAV-IPN, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Mexico City, Mexico

  2. Outline
     Introduction and Background
     • Trade-off in the design of indicators for the evaluation of MOEAs
     • Metric / Hausdorff distance
     Investigation of the Indicators
     • GD
     • IGD
     A ‘New’ Indicator
     • Metric properties
     • Extension to continuous models

  3. Multi-Objective Optimization
     Multi-objective optimization problem (MOP):
     $$\min_{x \in Q} F(x), \qquad F = (f_1, \dots, f_k)^T, \qquad f_i : Q \subset \mathbb{R}^n \to \mathbb{R}, \quad i = 1, \dots, k.$$
     • $P_Q$ = set of optimal solutions (Pareto set)
     • $F(P_Q)$ = the image of $P_Q$ (Pareto front)
     First we consider discrete (or discretized) models, i.e., $|Q| < \infty$.
     [Figure: Pareto set in decision space ($x$) and Pareto front in objective space ($f_1$, $f_2$).]

  4. Outliers in Stochastic Search Algorithms
     Example: Consider the MOP
     $$F : [0,1]^n \to \mathbb{R}^k, \qquad F(x) = \begin{pmatrix} x_1 \\ g(x) \end{pmatrix},$$
     where $g : [0,1]^n \to \mathbb{R}^{k-1}$ (→ Okabe, ZDT).
     Assume a point $x = (\varepsilon, z)$, $z \in [0,1]^{n-1}$, is a member of the archive/population. Further, assume that new candidate solutions are chosen uniformly at random from the domain. Then the probability of finding a point that dominates $x$ is less than $\varepsilon$ (→ objective 1), while the distance of $x$ to $P_Q$ can be ‘large’.
     [Figure: the point $(\varepsilon, x_2)$.]
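
The probability argument on this slide can be checked with a small Monte Carlo sketch. It assumes the standard two-objective ZDT1 function as a stand-in for the slide's $F(x) = (x_1, g(x))$; the names `zdt1`, `dominates` and `estimate`, the choice $z = (0.5, \dots, 0.5)$ and the sample size are all illustrative assumptions, not part of the presentation.

```python
import math
import random

def zdt1(x):
    """Two-objective ZDT1 (assumed stand-in): first objective is x_1 itself."""
    n = len(x)
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    return (x[0], g * (1.0 - math.sqrt(x[0] / g)))

def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))

def estimate(eps=0.01, n=5, trials=20000, seed=0):
    """Fractions of uniform samples with f1 <= eps, and of samples dominating x = (eps, z)."""
    rng = random.Random(seed)
    fx = zdt1([eps] + [0.5] * (n - 1))  # hypothetical outlier x = (eps, 0.5, ..., 0.5)
    small_f1 = dominating = 0
    for _ in range(trials):
        fy = zdt1([rng.random() for _ in range(n)])
        small_f1 += fy[0] <= eps          # necessary condition for dominating x
        dominating += dominates(fy, fx)
    return small_f1 / trials, dominating / trials

frac_small, frac_dom = estimate()
print(frac_small, frac_dom)
```

A sample can dominate $x$ only if its first objective is at most $\varepsilon$, so the second fraction can never exceed the first, and the first is close to $\varepsilon$ for uniform sampling.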

  5. Example
     • $P$: hypothetical Pareto front
     • $X_1$: perfect approximation of $P$, except for one outlier
     • $X_2$: none of the elements is ‘near’ $P$
     Question: Which approximation is ‘better’?
     Extreme situations:
     -- pessimistic view (Hausdorff distance): $d_H(X_1,P) = 9$, $d_H(X_2,P) = 2.83$
     -- averaged result (generational distance): $GD(X_1,P) = 0.81$, $GD(X_2,P) = 2.83$
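
The following sketch reproduces numbers of this kind on made-up data. The sets `P`, `X1` and `X2` below are hypothetical (the sets in the slide's figure are not recoverable from the text); they are chosen only so that one set is perfect except for a single outlier at distance 9 and the other is uniformly shifted by $2\sqrt{2} \approx 2.83$. The helper names are assumptions as well, and the averaged value uses GD with $p = 1$.

```python
import math

def dist(u, A):
    """Distance from point u to the nearest point of the finite set A."""
    return min(math.dist(u, v) for v in A)

def one_sided(B, A):
    """Largest distance from a point of B to the set A."""
    return max(dist(u, A) for u in B)

def hausdorff(A, B):
    return max(one_sided(A, B), one_sided(B, A))

def gd_mean(X, Y):
    """Average distance from the points of X to Y (GD with p = 1)."""
    return sum(dist(x, Y) for x in X) / len(X)

# Hypothetical data, not the sets from the slide's figure.
P = [(float(i), float(9 - i)) for i in range(10)]   # 10 points on a line
X1 = P + [(18.0, 0.0)]                              # P plus one outlier at distance 9
X2 = [(x + 2.0, y + 2.0) for (x, y) in P]           # every point shifted away from P

print(hausdorff(X1, P), gd_mean(X1, P))   # worst case vs. average for X1
print(hausdorff(X2, P), gd_mean(X2, P))   # worst case vs. average for X2
```

On this data the Hausdorff distance prefers `X2`, while the averaged distance prefers `X1`, which is the trade-off the slide points at.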

  6. Outlier Trade Off
     Trade-off for the indicator $D$ when measuring results of MOEAs (the design of MOEAs is influenced by $D$):
     Use of a metric
     + greedy search = shortest path to the set of interest (→ triangle inequality)
     -- penalization of single outliers of the candidate set
     Averaging the results
     + single outliers do not have a major influence on the result
     -- the greedy search is not necessarily the shortest path to the set of interest

  7. Metric
     Definition: Suppose $X$ is a set and $d : X \times X \to \mathbb{R}$ is a function. Then $d$ is called a metric on $X$ if, and only if, for each $a, b, c \in X$:
     (a) $d(a,b) \ge 0$ and $d(a,b) = 0 \Leftrightarrow a = b$ (positive property)
     (b) $d(a,b) = d(b,a)$ (symmetric property)
     (c) $d(a,c) \le d(a,b) + d(b,c)$ (triangle inequality)
     Variants:
     -- $d$ is called a semi-metric if properties (a) and (b) are satisfied
     -- a pseudo-metric is a semi-metric that satisfies the relaxed triangle inequality $d(a,c) \le \sigma \, (d(a,b) + d(b,c))$ for some $\sigma \ge 1$

  8. Hausdorff Distance
     Definition: Let $u, v \in \mathbb{R}^n$, $A, B \subset \mathbb{R}^n$, and let $\|\cdot\|$ be a vector norm. The Hausdorff distance $d_H$ is defined as follows:
     (a) $dist(u, A) := \inf_{v \in A} \|u - v\|$
     (b) $dist(B, A) := \sup_{u \in B} dist(u, A)$
     (c) $d_H(A, B) := \max\big(dist(A, B),\, dist(B, A)\big)$
     Remarks:
     (i) $dist(A,B)$ is not symmetric: if $B$ is a proper subset of $A$, then $dist(B,A) = 0$ and $dist(A,B) > 0$.
     (ii) $d_H$ is a metric on the set of discrete sets. It can also be used for continuous spaces; in that case $d_H(A,B) = 0 \Leftrightarrow clos(A) = clos(B)$.
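
For finite sets the three definitions translate directly into code. This is a minimal sketch assuming the Euclidean norm and finite point sets stored as lists of tuples; the function names are illustrative, not taken from the presentation.

```python
import math

def point_to_set(u, A):
    """dist(u, A): smallest Euclidean distance from the point u to a point of A."""
    return min(math.dist(u, v) for v in A)

def set_to_set(B, A):
    """dist(B, A): largest point-to-set distance over all u in B (one-sided)."""
    return max(point_to_set(u, A) for u in B)

def hausdorff(A, B):
    """d_H(A, B): the larger of the two one-sided distances."""
    return max(set_to_set(A, B), set_to_set(B, A))

A = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
B = [(0.0, 0.0), (1.0, 0.0)]      # proper subset of A

print(set_to_set(B, A))   # 0.0
print(set_to_set(A, B))   # 3.0
print(hausdorff(A, B))    # 3.0
```

The example illustrates remark (i): with `B` a proper subset of `A`, the one-sided distance vanishes in one direction only.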

  9. Discussion of GD (1)
     GD as proposed by Van Veldhuizen, applied to general finite sets $X, Y \subset \mathbb{R}^k$ using $dist$:
     $$GD(X,Y) = \frac{1}{|X|} \left( \sum_{i=1}^{|X|} dist(x_i, Y)^p \right)^{1/p}$$
     Metric properties:
     -- positive property: NO. $GD(X,Y) = 0 \Leftrightarrow X \subset Y$ ($X$ can be a proper subset of $Y$ (*))
     -- symmetric property: NO. In case (*), $GD(X,Y) = 0$ but $GD(Y,X) > 0$
     -- triangle inequality: NO (→ next slide)
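
A minimal sketch of this GD formula for finite sets, assuming the Euclidean norm; the function names and the two small sets are illustrative assumptions. It shows the failure of the positive and symmetric properties when one set is a proper subset of the other.

```python
import math

def dist(u, A):
    """Distance from point u to the nearest point of the finite set A."""
    return min(math.dist(u, v) for v in A)

def gd(X, Y, p=2):
    """GD(X, Y) = (sum_i dist(x_i, Y)^p)^(1/p) / |X|."""
    total = sum(dist(x, Y) ** p for x in X)
    return total ** (1.0 / p) / len(X)

X = [(0.0, 0.0)]
Y = [(0.0, 0.0), (3.0, 4.0)]   # X is a proper subset of Y

print(gd(X, Y))   # 0.0, although X != Y
print(gd(Y, X))   # 2.5, so GD is not symmetric
```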

  10. Discussion of GD (2)
      1.) Normalization strategy of GD: Let $A_1 = \{a\}$ with $dist(F(a), F(P_Q)) = 1$, i.e., $GD(F(A_1), F(P_Q)) = 1$. Now let $A_n$ be the multiset consisting of $n$ copies of $a$, $A_n = \{a, \dots, a\}$. Then, for $p > 1$,
      $$GD(F(A_n), F(P_Q)) = \frac{\|(1, \dots, 1)^T\|_p}{n} = \frac{\sqrt[p]{n}}{n} \to 0 \quad (n \to \infty).$$
      2.) Investigation of the (relaxed) triangle inequality: Let $X, Z \subset \mathbb{R}^k$ such that $GD(X,Z) > 0$. Let $rhs(Y) := GD(X,Y) + GD(Y,Z)$ and define $Y_n := X \cup \{y_1, y_2, \dots, y_n\}$ such that $\sum_i dist(y_i, Z) < \infty$. Then $GD(X, Y_n) = 0$ and $GD(Y_n, Z) \to 0$ for $n \to \infty$.
      ⇒ GD does not satisfy any relaxed triangle inequality, since $rhs(Y_n) \to 0$.
      Note: for $p > 1$, any set $\{y_1, \dots, y_n\} \subset F(Q)$ (if compact) can be taken!
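
The construction in part 2 can be tried numerically. The sketch below uses hypothetical one-point sets `X` and `Z` and points $y_i$ whose distances to `Z` are $2^{-i}$ (so their sum is finite); these concrete choices, the helper names and $p = 1$ are assumptions made for illustration.

```python
import math

def dist(u, A):
    """Distance from point u to the nearest point of the finite set A."""
    return min(math.dist(u, v) for v in A)

def gd(X, Y, p=1):
    """GD(X, Y) = (sum_i dist(x_i, Y)^p)^(1/p) / |X|."""
    total = sum(dist(x, Y) ** p for x in X)
    return total ** (1.0 / p) / len(X)

X = [(0.0, 0.0)]
Z = [(1.0, 0.0)]               # GD(X, Z) = 1 > 0

def rhs(n, p=1):
    """GD(X, Y_n) + GD(Y_n, Z) for Y_n = X plus n points with dist(y_i, Z) = 2^-i."""
    Y_n = X + [(1.0, 2.0 ** -i) for i in range(1, n + 1)]
    return gd(X, Y_n, p) + gd(Y_n, Z, p)

print(gd(X, Z))                 # left-hand side stays at 1.0
for n in (10, 100, 1000):
    print(n, rhs(n))            # right-hand side shrinks towards 0
```

Since the left-hand side is fixed while the right-hand side shrinks with $n$, no constant $\sigma$ can rescue the inequality on this family of sets.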

  11. New Variant of GD
      Natural modification: take the power mean of the distances:
      $$GD_p(X,Y) = \left( \frac{1}{|X|} \sum_{i=1}^{|X|} dist(x_i, Y)^p \right)^{1/p} = \frac{1}{\sqrt[p]{|X|}} \left( \sum_{i=1}^{|X|} dist(x_i, Y)^p \right)^{1/p}$$
      -- same (poor) metric properties, but
      -- better averaging: $GD_p(F(A_n), F(P_Q)) = 1$ for all $n \in \mathbb{N}$
      -- (needed for the upcoming indicator)
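
The effect of the two normalizations can be compared on a multiset of $n$ copies of a single point at distance 1 from the reference set. This is a sketch with illustrative names and data (a one-point reference set, Euclidean norm, $p = 2$), not code from the presentation.

```python
import math

def dist(u, A):
    """Distance from point u to the nearest point of the finite set A."""
    return min(math.dist(u, v) for v in A)

def gd(X, Y, p=2):
    """Original normalization: p-norm of the distances divided by |X|."""
    return sum(dist(x, Y) ** p for x in X) ** (1.0 / p) / len(X)

def gd_p(X, Y, p=2):
    """Power-mean normalization: the mean is taken inside the p-th root."""
    return (sum(dist(x, Y) ** p for x in X) / len(X)) ** (1.0 / p)

reference = [(0.0, 0.0)]       # stands in for F(P_Q)
a = (1.0, 0.0)                 # a single image point at distance 1

for n in (1, 4, 100):
    A_n = [a] * n              # multiset with n copies of a
    print(n, gd(A_n, reference), gd_p(A_n, reference))
```

With $p = 2$ the first value decays like $1/\sqrt{n}$, whereas the power mean stays at 1 regardless of how many copies are present.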

  12. Discussion of IGD
      IGD as proposed by Coello & Cruz, applied to general finite sets $X, Y \subset \mathbb{R}^k$ using $dist$:
      $$IGD(X,Y) = \frac{1}{|Y|} \left( \sum_{i=1}^{|Y|} dist(y_i, X)^p \right)^{1/p}$$
      -- same metric properties as GD, since $IGD(A,B) = GD(B,A)$
      -- same modification: take the power mean of the distances:
      $$IGD_p(X,Y) = \left( \frac{1}{|Y|} \sum_{i=1}^{|Y|} dist(y_i, X)^p \right)^{1/p} = \frac{1}{\sqrt[p]{|Y|}} \left( \sum_{i=1}^{|Y|} dist(y_i, X)^p \right)^{1/p}$$

  13. A “New” Indicator
      Observation: $GD(X,Y)$ is an ‘averaged version’ of $dist(X,Y)$, and the same holds for IGD
      → combine GD and IGD as for $d_H$:
      $$\Delta_p(X,Y) = \max\big( GD_p(X,Y),\, IGD_p(X,Y) \big)$$
      Proposition 1: $\Delta_p$ is a semi-metric for $1 \le p < \infty$ and a metric for $p = \infty$.
      Remark: for $p = \infty$ the indicator $\Delta_p$ coincides with $d_H$.
      Proposition 2: Let $|X|, |Y|, |Z| \le N$. Then
      $$\Delta_p(X,Z) \le \sqrt[p]{N} \, \big( \Delta_p(X,Y) + \Delta_p(Y,Z) \big).$$
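
A compact sketch of the indicator for finite sets, assuming the Euclidean norm and treating $p = \infty$ as the maximum of the distances; the function names and the example sets are illustrative assumptions.

```python
import math

def dist(u, A):
    """Distance from point u to the nearest point of the finite set A."""
    return min(math.dist(u, v) for v in A)

def gd_p(X, Y, p):
    """Power mean of the distances from X to Y; the maximum for p = inf."""
    d = [dist(x, Y) for x in X]
    if math.isinf(p):
        return max(d)
    return (sum(di ** p for di in d) / len(d)) ** (1.0 / p)

def igd_p(X, Y, p):
    """IGD_p(X, Y) = GD_p(Y, X)."""
    return gd_p(Y, X, p)

def delta_p(X, Y, p):
    """Delta_p(X, Y) = max(GD_p(X, Y), IGD_p(X, Y))."""
    return max(gd_p(X, Y, p), igd_p(X, Y, p))

X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

for p in (1, 2, math.inf):
    print(p, delta_p(X, Y, p))
```

Here the only nonzero distance is 3 (from the point `(4, 0)` to `X`), so the values are the power means of `[0, 0, 3]`: 1 for $p = 1$, $\sqrt{3}$ for $p = 2$, and 3 for $p = \infty$, which is the Hausdorff distance of the two sets.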

  14. Interpretation of p for the Trade Off
      Table: Percentage of triangle inequality violations ($\sigma = 1$) for different values of $p$. Here, 100,000 different sets $A, B, C$ with $|A| = |B| = |C| = N$ and $k = 2$ were taken, each entry chosen randomly within $[0,1]$.

      N       p=1     p=2     p=5     p=10    p=∞
      2       0.541   0.15    0.026   0.008   0
      4       0.249   0.06    0.019   0.009   0
      6       0.105   0.033   0.008   0.003   0
      10      0.02    0.004   0.002   0.001   0
      100     0       0       0       0       0

      → The larger the value of $p$, the ‘nearer’ $\Delta_p$ is to a metric (but: how to choose $p$? what is the influence of $N$?)
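
An experiment of this kind can be sketched as follows. The sampling scheme (uniform points in $[0,1]^2$), the much smaller default number of trials, the seed, the rounding tolerance and all function names are assumptions; the sketch is not expected to reproduce the table's figures exactly.

```python
import math
import random

def dist(u, A):
    return min(math.dist(u, v) for v in A)

def gd_p(X, Y, p):
    d = [dist(x, Y) for x in X]
    if math.isinf(p):
        return max(d)
    return (sum(di ** p for di in d) / len(d)) ** (1.0 / p)

def delta_p(X, Y, p):
    return max(gd_p(X, Y, p), gd_p(Y, X, p))

def random_set(rng, n, k=2):
    """n points with k coordinates, each drawn uniformly from [0, 1]."""
    return [tuple(rng.random() for _ in range(k)) for _ in range(n)]

def violation_percentage(p, n, sigma=1.0, trials=2000, seed=0):
    """Percentage of random triples (A, B, C) violating the sigma-relaxed triangle inequality."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        A, B, C = (random_set(rng, n) for _ in range(3))
        lhs = delta_p(A, C, p)
        rhs = sigma * (delta_p(A, B, p) + delta_p(B, C, p))
        if lhs > rhs + 1e-12:   # small tolerance against rounding noise
            violations += 1
    return 100.0 * violations / trials

for p in (1, 2, 5, 10, math.inf):
    print(p, violation_percentage(p, n=2))
```

Passing `sigma = N ** (1 / p)` instead of 1 tests the relaxed inequality of Proposition 2, for which no violations should occur.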

  15. Example
      • $P$: hypothetical Pareto front
      • $X_1$: perfect approximation of $P$, except for one outlier
      • $X_2$: none of the elements is ‘near’ $P$
      Question: Which approximation is ‘better’?

                          p=1      p=2      p=5      p=10     p=∞
      $\Delta_p(P,X_1)$   0.8182   2.714    4.047    5.571    9
      $\Delta_p(P,X_2)$   2.828    2.828    2.828    2.828    2.828

  16. Extension to Continuous Models
      Now consider continuous models. In general: $k$ objectives → $P_Q$ is $(k-1)$-dimensional.
      • $GD_p$: $A$ finite, $P_Q$ compact → GD turns into a continuous SOP.
      • $IGD_p$: $P_Q$ continuous → the power mean of $IGD_p$ turns into an integral.
      Example: $k = 2$, $F(P_Q)$ connected and parameterized by $\gamma : [m_1, M_1] \to \mathbb{R}^2$. Then
      $$IGD_p(F(A), F(P_Q)) = \left( \frac{1}{M_1 - m_1} \int_{m_1}^{M_1} dist(\gamma(t), F(A))^p \, dt \right)^{1/p}$$
      [Figure: Pareto front in objective space ($f_1$, $f_2$), with $f_1 \in [m_1, M_1]$ and $f_2 \in [m_2, M_2]$.]
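
The integral form of $IGD_p$ can be approximated with a simple quadrature rule. The sketch below uses the midpoint rule and a hypothetical linear front $\gamma(t) = (t, 1 - t)$ on $[0, 1]$ with a one-point approximation set; the front, the set, the number of quadrature steps and the function names are all assumptions chosen so that the result can be checked by hand.

```python
import math

def dist(u, A):
    """Distance from point u to the nearest point of the finite set A."""
    return min(math.dist(u, v) for v in A)

def igd_p_continuous(gamma, m1, M1, A, p, steps=1000):
    """Midpoint-rule approximation of the integral form of IGD_p."""
    h = (M1 - m1) / steps
    integral = sum(dist(gamma(m1 + (j + 0.5) * h), A) ** p for j in range(steps)) * h
    return (integral / (M1 - m1)) ** (1.0 / p)

def gamma(t):
    """Hypothetical connected front: the line segment from (0, 1) to (1, 0)."""
    return (t, 1.0 - t)

A = [(0.5, 0.5)]   # hypothetical one-point approximation set F(A)

print(igd_p_continuous(gamma, 0.0, 1.0, A, p=1))   # close to sqrt(2)/4
print(igd_p_continuous(gamma, 0.0, 1.0, A, p=2))   # close to sqrt(1/6)
```

For this data $dist(\gamma(t), F(A)) = \sqrt{2}\,|t - 0.5|$, so the exact values are $\sqrt{2}/4$ for $p = 1$ and $\sqrt{1/6}$ for $p = 2$.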

  17. Discretization of F(P_Q)
      Task: $P_Q$ is given analytically; compute an approximation $Y$ of $F(P_Q)$ with $d_H(Y, F(P_Q)) < \delta$ (a priori defined approximation quality).
      For $k = 2$: use continuation-like methods: select the step size $t$ such that $\|F(x + tv) - F(x)\|_\infty \approx \Theta \delta$, where $\Theta < 1$ is a safety factor (selection of $t$ based on Lipschitz estimations).
      [Figure: discretizations of the OKA2 Pareto front (PF) in objective space ($f_1$, $f_2$) for $\delta = 0.01$ and $\delta = 0.4$.]
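
The step-size rule can be illustrated on a front that is known in closed form. The sketch below makes several simplifying assumptions: it walks along the ZDT1 Pareto front $f_2 = 1 - \sqrt{f_1}$, $f_1 \in [0,1]$, parameterized directly by $f_1$ (rather than moving along a direction $v$ in decision space), and it finds each step by bisection instead of the Lipschitz estimates mentioned on the slide. All names and the default $\Theta = 0.5$ are illustrative.

```python
import math

def front(s):
    """ZDT1 Pareto front parameterized by f1 = s in [0, 1]."""
    return (s, 1.0 - math.sqrt(s))

def sup_norm_step(s, t):
    """Sup-norm distance between the front points at s and s + t."""
    a, b = front(s), front(s + t)
    return max(abs(b[0] - a[0]), abs(b[1] - a[1]))

def discretize(delta, theta=0.5, tol=1e-12):
    """Sample the front so that consecutive points are at most theta * delta apart."""
    target = theta * delta
    s, points = 0.0, [front(0.0)]
    while s < 1.0:
        lo, hi = 0.0, 1.0 - s
        if sup_norm_step(s, hi) <= target:
            s = 1.0                      # the end of the front is within reach
        else:
            # Bisection: on this front the increment grows monotonically with t.
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if sup_norm_step(s, mid) > target:
                    hi = mid
                else:
                    lo = mid
            s += lo
        points.append(front(s))
    return points

print(len(discretize(0.4)), len(discretize(0.01)))
```

A smaller $\delta$ yields a denser sample; the sketch only enforces the spacing between consecutive samples and does not by itself prove the bound on $d_H$.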

  18. Numerical Example
      NSGA-II applied to ZDT1, with $Y = F(P_Q)$ and $Y_i = F(pop_i)$, $i = 1, \dots, 7$:
      • $\Delta_2(Y_1, Y) = 3.03$
      • $\Delta_2(Y_2, Y) = 2.71$
      • $\Delta_2(Y_3, Y) = 1.43$
      • $\Delta_2(Y_4, Y) = 0.77$
      • $\Delta_2(Y_5, Y) = 0.31$
      • $\Delta_2(Y_6, Y) = 0.12$
      • $\Delta_2(Y_7, Y) = 0.007$
      [Figure: images of the populations pop1 to pop7 and the Pareto front.]

  19. Discussion
      Conclusions
      • New indicator $\Delta_p$ proposed for the evaluation of MOEAs.
      • $\Delta_p$ is a semi-metric, and a pseudo-metric for bounded archive sizes.
      • $p$ can (in principle) be used to handle the ‘outlier trade off’.
      Open Questions
      • How to choose $p$?
      • How to measure the distance to a metric?
      • How to adapt the selection mechanisms in order to improve $\Delta_p$? ($\Delta_p$ is NOT compliant with the dominance relation!)

  20. Thank you for your attention!
