
Introduction to Global Optimization - Fabio Schoen, 2008

Introduction to Global Optimization, Fabio Schoen, 2008. http://gol.dsi.unifi.it/users/schoen

Global Optimization Problems: min f(x), x ∈ S ⊆ R^n. What is meant by global optimization?


  1. Let x̄ be the best known solution and let D(x̄) = { x ∈ Ω : c^T x ≤ c^T x̄ }. If D(x̄) ⊆ C then x̄ is optimal. Check: a polytope P (with known vertices) is built which contains D(x̄). If all vertices of P are in C ⇒ optimal solution. Otherwise let v be the best feasible vertex; the intersection of the segment [0, v] with ∂C (if feasible) is an improving point x. Otherwise a cut tangent to Ω at x is introduced in P.

  2. Figure: the set D(x̄) = { x ∈ Ω : c^T x ≤ c^T x̄ }, together with the convex set C, the feasible region Ω, the level line c^T x = 0 and the incumbent x̄.

  3. Initialization. Given a feasible solution x̄, take a polytope P such that P ⊇ D(x̄), i.e. y feasible with c^T y ≤ c^T x̄ ⇒ y ∈ P. If P ⊂ C, i.e. if y ∈ P ⇒ h(y) ≤ 0, then x̄ is optimal. Checking is easy if we know the vertices of P.

  4. Figure: a polytope P ⊇ D(x̄) with vertices V_1, ..., V_k; V⋆ := arg max_j h(V_j).

  5. Step 1. Let V⋆ be the vertex with the largest h() value. Surely h(V⋆) > 0 (otherwise we stop with an optimal solution). Moreover h(0) < 0 (0 is in the interior of C). Thus the segment from V⋆ to 0 must intersect the boundary of C. Let x_k be the intersection point. It might be feasible (⇒ improving) or not.

  6. Figure: the point x_k = ∂C ∩ [V⋆, 0].

  7. Figure: if x_k ∈ Ω, set x̄ := x_k.

  8. Figure: otherwise, if x_k ∉ Ω, the polytope is divided.

  9. Figure: otherwise, if x_k ∉ Ω, the polytope is divided (continued).

  10. Duality for d.c. problems: min_{x ∈ S} g(x) − h(x), where g and h are convex. Let h⋆(u) := sup{ u^T x − h(x) : x ∈ R^n } and g⋆(u) := sup{ u^T x − g(x) : x ∈ R^n } be the conjugate functions of h and g. The problem inf{ h⋆(u) − g⋆(u) : u such that h⋆(u) < +∞ } is the Fenchel-Rockafellar dual. If min g(x) − h(x) admits an optimum, then the Fenchel dual is a strong dual.

  11. If x⋆ ∈ arg min g(x) − h(x), then u⋆ ∈ ∂h(x⋆) (∂ denotes the subdifferential) is dual optimal; and if u⋆ ∈ arg min h⋆(u) − g⋆(u), then x⋆ ∈ ∂g⋆(u⋆) is an optimal primal solution.

  12. A primal/dual algorithm: P_k : min_x g(x) − ( h(x_k) + (x − x_k)^T y_k ) and D_k : min_y h⋆(y) − ( g⋆(y_{k−1}) + x_k^T (y − y_{k−1}) ).
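The primal step P_k linearizes h around x_k and minimizes the resulting convex function. Below is a minimal 1-D sketch of that step only (the dual step D_k is omitted); the d.c. decomposition, starting point and iteration count are illustrative choices, not taken from the slides.

```python
import numpy as np

# Two-well function f(x) = x^4/4 - 3x^2/2 written as a d.c. difference:
#   g(x) = x^4/4 + x^2/2 (convex),  h(x) = 2x^2 (convex),  f = g - h.
# Primal step P_k: minimize g(x) - (h(x_k) + (x - x_k) y_k) with y_k = h'(x_k) = 4 x_k,
# i.e. solve g'(x) = x^3 + x = y_k.

def primal_dc_iterations(x0, n_iter=30):
    x = x0
    for _ in range(n_iter):
        y = 4.0 * x                              # y_k in the subdifferential of h at x_k
        roots = np.roots([1.0, 0.0, 1.0, -y])    # x^3 + x - y_k = 0
        x = float(min(roots, key=lambda r: abs(r.imag)).real)  # g' is increasing: one real root
    return x

print(primal_dc_iterations(0.3))   # tends to sqrt(3) ~ 1.732, a global minimizer of f
```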

  13. Exact Global Optimization

  14. GlobOpt - relaxations. Consider the global optimization problem (P): min f(x), x ∈ X, and assume the minimum exists and is finite, and that we can use a relaxation (R): min g(y), y ∈ Y. Usually both X and Y are subsets of the same space R^n. Recall: (R) is a relaxation of (P) iff X ⊆ Y and g(x) ≤ f(x) for all x ∈ X.

  15. Branch and Bound. 1. Solve the relaxation (R) and let L be its (global) optimum value (assume the optimum of (R) is attained). 2. (Heuristically) solve the original problem (P) (or, more generally, find a “good” feasible solution to (P) in X); let U be the best feasible function value known. 3. If U − L ≤ ε then stop: U is a certified ε-optimum for (P). 4. Otherwise split X and Y into two parts and apply the same method to each of them.
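The following is a minimal sketch of that loop, assuming user-supplied routines solve_relaxation, heuristic and split (these names and the depth-first stack are illustrative, not part of the slides).

```python
# Minimal branch-and-bound sketch following steps 1-4 of the slide.
# solve_relaxation, heuristic and split are placeholders the user must supply.

def branch_and_bound(region, solve_relaxation, heuristic, split, eps=1e-6):
    """Return an eps-certified optimum value and point over `region`."""
    U, best_x = float("inf"), None          # incumbent upper bound
    stack = [region]                        # regions still to be explored
    while stack:
        R = stack.pop()
        L, x_rel = solve_relaxation(R)      # step 1: lower bound from relaxation (R)
        if L >= U - eps:                    # node cannot improve the incumbent
            continue
        fx, x_feas = heuristic(R, x_rel)    # step 2: feasible point for (P)
        if fx < U:
            U, best_x = fx, x_feas
        if U - L > eps:                     # step 3 failed -> step 4: branch
            stack.extend(split(R))
    return U, best_x
```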

  16. Tools: “good relaxations” (easy yet accurate) and good upper bounding, i.e. good heuristics for (P). Good relaxations can be obtained, e.g., through convex relaxations and domain reduction.

  17. Convex relaxations. Assume X is convex and Y = X. If g is the convex envelope of f on X, then solving the convex relaxation (R) gives, in one step, the certified global optimum of (P). g(x) is a convex under-estimator of f on X if: g(x) is convex and g(x) ≤ f(x) ∀ x ∈ X. g is the convex envelope of f on X if: g is a convex under-estimator of f and g(x) ≥ h(x) ∀ x ∈ X, for every convex under-estimator h of f.

  18. A 1-D example

  19. Convex under-estimator

  20. Branching

  21. Bounding: figure showing the upper bound, fathomed nodes and lower bounds.

  22. Relaxation of the feasible domain. Let min_{x ∈ S} f(x) be a GlobOpt problem where f is convex while S is non-convex. A relaxation (outer approximation) is obtained by replacing S with a larger set Q. If Q is convex ⇒ convex optimization problem. If the optimal solution of min_{x ∈ Q} f(x) belongs to S ⇒ it is an optimal solution of the original problem.

  23. Example: min −x − 2y, x ∈ [0, 5], y ∈ [0, 3], xy ≤ 3.

  24. Relaxation: min −x − 2y, x ∈ [0, 5], y ∈ [0, 3], xy ≤ 3. We know that (x + y)² = x² + y² + 2xy, thus xy = ((x + y)² − x² − y²)/2 and, as x and y are non-negative, x² ≤ 5x and y² ≤ 3y; thus a (convex) relaxation of xy ≤ 3 is (x + y)² − 5x − 3y ≤ 6.

  25. Relaxation (figure). Optimal solution of the relaxed convex problem: (2, 3) (value: −8).

  26. Stronger relaxation: min −x − 2y, x ∈ [0, 5], y ∈ [0, 3], xy ≤ 3. Since x ≤ 5 and y ≤ 3, (5 − x)(3 − y) ≥ 0 ⇒ 15 − 3x − 5y + xy ≥ 0 ⇒ xy ≥ 3x + 5y − 15. Thus a (convex) relaxation of xy ≤ 3 is 3x + 5y − 15 ≤ 3, i.e. 3x + 5y ≤ 18.

  27. Relaxation (figure). The optimal solution of the convex (linear) relaxation is (1, 3), which is feasible ⇒ optimal for the original problem.
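A quick numerical check of this last step (a sketch assuming SciPy is available; the solver call is not part of the slides):

```python
# Solve the linear relaxation  min -x - 2y  s.t.  3x + 5y <= 18, x in [0,5], y in [0,3].
from scipy.optimize import linprog

res = linprog(c=[-1, -2], A_ub=[[3, 5]], b_ub=[18],
              bounds=[(0, 5), (0, 3)], method="highs")
x, y = res.x
print(x, y, res.fun)          # approximately (1, 3) with objective -7
print(x * y <= 3 + 1e-9)      # the relaxed optimum satisfies xy <= 3, hence it is optimal for (P)
```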

  28. Convex (concave) envelopes. How to build convex envelopes of a function, or how to relax a non-convex constraint? Convex envelopes ⇒ lower bounds; convex envelopes of −f(x) ⇒ upper bounds. Constraint g(x) ≤ 0: if h(x) is a convex underestimator of g, then h(x) ≤ 0 is a convex relaxation. Constraint g(x) ≥ 0: if h(x) is concave and h(x) ≥ g(x), then h(x) ≥ 0 is a “convex” constraint.

  29. Convex envelopes. Definition: a function is polyhedral if it is the pointwise maximum of a finite number of linear functions. (NB: in general, the convex envelope is the pointwise supremum of affine minorants.) The generating set X of a function f over a convex set P is X = { x ∈ R^n : (x, f(x)) is a vertex of epi(conv_P(f)) }. I.e., given f we first build its convex envelope on P and then take the epigraph of the envelope, { (x, y) : x ∈ P, y ≥ conv_P f(x) }. This is a convex set whose extreme points can be denoted by V; X is the set of x-coordinates of the points in V.

  30. Generating sets (figure).


  32. Characterization. Let f(x) be continuously differentiable on a polytope P. The convex envelope of f on P is polyhedral if and only if X(f) = Vert(P) (the generating set is the vertex set of P). Corollary: let f_1, ..., f_m ∈ C¹(P) each possess a polyhedral convex envelope on P. Then Σ_i Conv(f_i(x)) = Conv(Σ_i f_i(x)) iff the generating set of Σ_i Conv(f_i(x)) is Vert(P).

  33. Characterization. If f(x) is such that Conv f(x) is polyhedral, then an affine function h(x) such that: 1. h(x) ≤ f(x) for all x ∈ Vert(P); 2. there exist n + 1 affinely independent vertices of P, V_1, ..., V_{n+1}, such that f(V_i) = h(V_i), i = 1, ..., n + 1; belongs to the polyhedral description of Conv f(x), and h(x) = Conv f(x) for every x ∈ Conv(V_1, ..., V_{n+1}).

  34. Characterization. The condition may be reversed: given m affine functions h_1, ..., h_m such that, for each of them, 1. h_j(x) ≤ f(x) for all x ∈ Vert(P); 2. there exist n + 1 affinely independent vertices of P, V_1, ..., V_{n+1}, such that f(V_i) = h_j(V_i), i = 1, ..., n + 1; then the function ψ(x) = max_j h_j(x) is the (polyhedral) convex envelope of f iff the generating set of ψ is Vert(P) and for every vertex V_i we have ψ(V_i) = f(V_i).

  35. Sufficient condition. If f(x) is lower semi-continuous on P and for all x ∉ Vert(P) there exists a line ℓ_x such that x is in the interior of P ∩ ℓ_x and f is concave in a neighborhood of x on ℓ_x, then Conv f(x) is polyhedral. Application: let f(x) = Σ_{i,j} α_ij x_i x_j. The sufficient condition holds for f on [0, 1]^n ⇒ bilinear forms are polyhedral on a hypercube.

  36. Application: a bilinear term (Al-Khayyal, Falk (1983)). Let x ∈ [ℓ_x, u_x], y ∈ [ℓ_y, u_y]. Then the convex envelope of xy on [ℓ_x, u_x] × [ℓ_y, u_y] is φ(x, y) = max{ ℓ_y x + ℓ_x y − ℓ_x ℓ_y ; u_y x + u_x y − u_x u_y }. In fact φ(x, y) is an under-estimate of xy: (x − ℓ_x)(y − ℓ_y) ≥ 0 ⇒ xy ≥ ℓ_y x + ℓ_x y − ℓ_x ℓ_y, and analogously for xy ≥ u_y x + u_x y − u_x u_y.

  37. Bilinear terms: xy ≥ φ(x, y) = max{ ℓ_y x + ℓ_x y − ℓ_x ℓ_y ; u_y x + u_x y − u_x u_y }. No other (polyhedral) function underestimating xy is tighter. In fact ℓ_y x + ℓ_x y − ℓ_x ℓ_y belongs to the convex envelope: it underestimates xy and coincides with xy at 3 vertices ((ℓ_x, ℓ_y), (ℓ_x, u_y), (u_x, ℓ_y)). Analogously for the other affine function. All vertices are interpolated by these 2 underestimating hyperplanes ⇒ they form the convex envelope of xy.
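A small numerical sketch of this envelope (the box, sample size and tolerances are illustrative):

```python
import numpy as np

def bilinear_convex_envelope(x, y, lx, ux, ly, uy):
    """Al-Khayyal-Falk convex envelope of x*y on the box [lx, ux] x [ly, uy]."""
    return np.maximum(ly * x + lx * y - lx * ly,
                      uy * x + ux * y - ux * uy)

# The envelope never exceeds x*y and matches it at the four box vertices.
lx, ux, ly, uy = 0.0, 5.0, 0.0, 3.0
xs, ys = np.random.uniform(lx, ux, 1000), np.random.uniform(ly, uy, 1000)
assert np.all(bilinear_convex_envelope(xs, ys, lx, ux, ly, uy) <= xs * ys + 1e-9)
for vx, vy in [(lx, ly), (lx, uy), (ux, ly), (ux, uy)]:
    assert abs(bilinear_convex_envelope(vx, vy, lx, ux, ly, uy) - vx * vy) < 1e-9
```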

  38. All easy then? Of course not! Many things can go wrong. It is true that, on the hypercube, a bilinear form Σ_{i<j} α_ij x_i x_j is polyhedral (easy to see), but we cannot guarantee in general that the generating set of the envelope is the set of vertices of the hypercube (in particular, if the α's have opposite signs). If the set is not a hypercube, even a bilinear term might be non-polyhedral: e.g. xy on the triangle { 0 ≤ x ≤ y ≤ 1 }. Finding the (polyhedral) convex envelope of a bilinear form on a generic polytope P is NP-hard!

  39. Fractional terms. A convex underestimate of a fractional term x/y over a box can be obtained through: w ≥ ℓ_x/y + x/u_y − ℓ_x/u_y if ℓ_x ≥ 0; w ≥ x/u_y − ℓ_x y/(ℓ_y u_y) + ℓ_x/ℓ_y if ℓ_x < 0; w ≥ u_x/y + x/ℓ_y − u_x/ℓ_y if ℓ_x ≥ 0; w ≥ x/ℓ_y − u_x y/(ℓ_y u_y) + u_x/u_y if ℓ_x < 0. (A better underestimate exists.)

  40. Univariate concave terms. If f(x), x ∈ [ℓ_x, u_x], is concave, then the convex envelope is simply its linear interpolation at the extremes of the interval: f(ℓ_x) + ((f(u_x) − f(ℓ_x))/(u_x − ℓ_x)) (x − ℓ_x).
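A minimal sketch of this chord construction (the test function sqrt and the interval are illustrative):

```python
import math

def concave_envelope(f, lx, ux):
    """Convex envelope of a concave f on [lx, ux]: the chord through the endpoints."""
    fl, fu = f(lx), f(ux)
    return lambda x: fl + (fu - fl) / (ux - lx) * (x - lx)

env = concave_envelope(math.sqrt, 0.0, 4.0)   # sqrt is concave on [0, 4]
assert all(env(x) <= math.sqrt(x) + 1e-12 for x in [0.0, 0.5, 1.0, 2.0, 4.0])
```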

  41. Underestimating a general nonconvex function. Let f(x) ∈ C² be a general non-convex function. Then a convex underestimate on a box can be defined as φ(x) = f(x) − Σ_{i=1}^{n} α_i (x_i − ℓ_i)(u_i − x_i), where the α_i > 0 are parameters. The Hessian of φ is ∇²φ(x) = ∇²f(x) + 2 diag(α); φ is convex iff ∇²φ(x) is positive semi-definite.

  42. How to choose the α_i's? One possibility is the uniform choice α_i = α. In this case convexity of φ is obtained iff α ≥ max{ 0, −(1/2) min_{x ∈ [ℓ, u]} λ_min(x) }, where λ_min(x) is the minimum eigenvalue of ∇²f(x).

  43. Key properties: φ(x) ≤ f(x); φ interpolates f at all vertices of [ℓ, u]; φ is convex. Maximum separation: max (f(x) − φ(x)) = (1/4) Σ_i α_i (u_i − ℓ_i)². Thus the error in underestimation decreases when the box is split.
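A sketch of this quadratic (α-BB style) underestimator with the uniform α of the previous slide. The example function, box and grid are illustrative, and the minimum eigenvalue is only estimated on sample points here; a rigorous bound would use, e.g., the interval Gerschgorin estimate of the next slide.

```python
import numpy as np

# Example nonconvex function and its Hessian on the box [l, u] (illustrative choice).
f = lambda x: np.sin(x[0]) * np.cos(x[1])
def hessian(x):
    s, c = np.sin(x[0]) * np.cos(x[1]), np.cos(x[0]) * np.sin(x[1])
    return np.array([[-s, -c],
                     [-c, -s]])

l, u = np.array([0.0, 0.0]), np.array([3.0, 3.0])

# Uniform alpha: alpha >= max(0, -1/2 * min eigenvalue of the Hessian over the box);
# here the minimum is only estimated on a grid of sample points (illustration only).
grid = np.array(np.meshgrid(np.linspace(l[0], u[0], 25),
                            np.linspace(l[1], u[1], 25))).reshape(2, -1).T
lam_min = min(np.linalg.eigvalsh(hessian(x)).min() for x in grid)
alpha = max(0.0, -0.5 * lam_min)

def phi(x):
    """Quadratic convex underestimator of f on [l, u]."""
    return f(x) - alpha * np.sum((x - l) * (u - x))

# phi <= f everywhere, with equality at the box vertices; the maximum gap
# for uniform alpha is (alpha / 4) * sum (u_i - l_i)^2.
assert all(phi(x) <= f(x) + 1e-12 for x in grid)
print("max gap bound:", alpha / 4 * np.sum((u - l) ** 2))
```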

  44. Estimation of α. Compute an interval Hessian [H]: [H(x)]_ij = [h^L_ij, h^U_ij] on [ℓ, u], and find α such that [H] + 2 diag(α) ⪰ 0. Gerschgorin theorem for real matrices: λ_min ≥ min_i ( h_ii − Σ_{j≠i} |h_ij| ). Extension to interval matrices: λ_min ≥ min_i ( h^L_ii − Σ_{j≠i} max{ |h^L_ij|, |h^U_ij| } (u_j − ℓ_j)/(u_i − ℓ_i) ).
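A sketch of the real-matrix Gerschgorin bound above (the test matrix is illustrative; the interval extension follows the same pattern with the scaled radii shown on the slide):

```python
import numpy as np

def gerschgorin_lower_bound(H):
    """Lower bound on the minimum eigenvalue of a real symmetric matrix H."""
    H = np.asarray(H, dtype=float)
    diag = np.diag(H)
    off = np.sum(np.abs(H), axis=1) - np.abs(diag)   # sum of off-diagonal magnitudes per row
    return np.min(diag - off)

H = np.array([[ 4.0, -1.0, 2.0],
              [-1.0,  3.0, 0.5],
              [ 2.0,  0.5, 5.0]])
print(gerschgorin_lower_bound(H))          # guaranteed <= true lambda_min
print(np.linalg.eigvalsh(H).min())
```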

  45. Improvements. New relaxation functions (other than quadratic): for example Φ(x; γ) = −Σ_{i=1}^{n} (1 − e^{γ_i (x_i − ℓ_i)})(1 − e^{γ_i (u_i − x_i)}) gives a tighter underestimate than the quadratic function. Partitioning: partition the domain into a small number of regions (hyper-rectangles), evaluate a convex underestimator in each region, and join the underestimators to form a single convex function on the whole domain.

  46. Domain (range) reduction. Techniques for cutting the feasible region without cutting the global optimum solution. Simplest approaches: feasibility-based and optimality-based range reduction (RR). Let the problem be min_{x ∈ S} f(x). Feasibility-based RR asks for solving ℓ_i = min{ x_i : x ∈ S } and u_i = max{ x_i : x ∈ S } for all i = 1, ..., n, and then adding the constraints x ∈ [ℓ, u] to the problem (or to the sub-problems generated during Branch & Bound).

  47. Feasibility-based RR. If S is a polyhedron, RR requires the solution of LPs: [ℓ_j̄, u_j̄] = min / max { x_j̄ : Ax ≤ b, x ∈ [L, U] }. “Poor man's” LP-based RR: from every constraint Σ_j a_ij x_j ≤ b_i in which a_ij̄ > 0, x_j̄ ≤ (1/a_ij̄)( b_i − Σ_{j≠j̄} a_ij x_j ) ⇒ x_j̄ ≤ (1/a_ij̄)( b_i − Σ_{j≠j̄} min{ a_ij L_j, a_ij U_j } ).
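A sketch of this single-constraint (“poor man's”) bound; the example constraint and box are illustrative:

```python
def poor_mans_bound(a, b, L, U, jbar):
    """Upper bound on x_jbar implied by one constraint a^T x <= b with a[jbar] > 0,
    given box bounds L <= x <= U (interval / "poor man's" range reduction)."""
    assert a[jbar] > 0
    rest = sum(min(a[j] * L[j], a[j] * U[j]) for j in range(len(a)) if j != jbar)
    return (b - rest) / a[jbar]

# Example: 2*x0 + 3*x1 - x2 <= 6 with x in [0,10]^3 implies x0 <= (6 - (0 - 10)) / 2 = 8.
a, b = [2.0, 3.0, -1.0], 6.0
L, U = [0.0, 0.0, 0.0], [10.0, 10.0, 10.0]
print(poor_mans_bound(a, b, L, U, jbar=0))   # 8.0, tighter than the original bound 10
```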

  48. Optimality-based RR. Given an incumbent solution x̄ ∈ S, ranges are updated by solving the sequence ℓ_i = min{ x_i : f̃(x) ≤ f(x̄), x ∈ S } and u_i = max{ x_i : f̃(x) ≤ f(x̄), x ∈ S }, where f̃(x) is a convex underestimate of f on the current domain. RR can be applied iteratively (i.e., at the end of a complete RR sequence we might start a new one using the new bounds).

  49. Generalization. Let (P): min_{x ∈ X} f(x) s.t. g(x) ≤ 0 be a (non-convex) problem, and let (R): min_{x ∈ X̄} f(x) s.t. g(x) ≤ 0 be a convex relaxation of (P), i.e. the feasible region of (R) contains that of (P), { x ∈ X : g(x) ≤ 0 } ⊆ { x ∈ X̄ : g(x) ≤ 0 }, and on the feasible region of (P) the objective of (R) underestimates that of (P).

  50. R.H.S. perturbation. Let φ(y) = min{ f(x) : x ∈ X̄, g(x) ≤ y } (R_y) be a perturbation of (R). (R) convex ⇒ (R_y) convex for any y. Let x̄ be an optimal solution of (R) and assume that the i-th constraint is active: g_i(x̄) = 0. Then, if x̄_y is an optimal solution of (R_y) with y_i ≤ 0, the constraint g_i(x) ≤ y_i is active at x̄_y.

  51. Duality. Assume (R) has a finite optimum at x̄ with value φ(0) and Lagrange multipliers µ. Then the hyperplane H(y) = φ(0) − µ^T y is a supporting hyperplane of the graph of φ(y) at y = 0, i.e. φ(y) ≥ φ(0) − µ^T y ∀ y ∈ R^m.

  52. Main result. If (R) is convex with optimum value φ(0), constraint i is active at the optimum and its Lagrange multiplier is µ_i > 0, then, if U is an upper bound for the original problem (P), the constraint g_i(x) ≥ −(U − L)/µ_i (where L = φ(0)) is valid for the original problem (P), i.e. it does not exclude any feasible solution with value better than U.

  53. Proof. Problem (R_y) can be seen as a convex relaxation of the perturbed non-convex problem Φ(y) = min{ f(x) : x ∈ X, g(x) ≤ y }, and thus φ(y) ≤ Φ(y): underestimating (R_y) produces an underestimate of Φ(y). Let y := e_i y_i. From duality: L − µ^T e_i y_i ≤ φ(e_i y_i) ≤ Φ(e_i y_i). If y_i < 0 then U is an upper bound also for Φ(e_i y_i), thus L − µ_i y_i ≤ U. But if y_i < 0 then constraint i is active. For any feasible x there exists a y_i < 0 such that g_i(x) ≤ y_i is active ⇒ we may substitute y_i with g_i(x) and deduce L − µ_i g_i(x) ≤ U.

  54. Applications. Range reduction: let x ∈ [ℓ, u] in the convex relaxed problem. If variable x_i is at its upper bound in the optimal solution, then we can deduce x_i ≥ max{ ℓ_i, u_i − (U − L)/λ_i }, where λ_i is the optimal multiplier associated with the i-th upper bound. Analogously for active lower bounds: x_i ≤ min{ u_i, ℓ_i + (U − L)/λ_i }.
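For example (the numbers below are purely illustrative):

```python
def tighten_from_active_upper_bound(l_i, u_i, U, L, lam_i):
    """New lower bound for x_i when its upper bound is active in the relaxation,
    with optimal multiplier lam_i > 0, incumbent value U and relaxation value L."""
    return max(l_i, u_i - (U - L) / lam_i)

# Box [0, 4], gap U - L = 2, multiplier 4: the lower bound rises from 0 to 3.5.
print(tighten_from_active_upper_bound(0.0, 4.0, U=-5.0, L=-7.0, lam_i=4.0))  # 3.5
```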

  55. Let the constraint a_i^T x ≤ b_i be active in an optimal solution of the convex relaxation (R). Then we can deduce the valid inequality a_i^T x ≥ b_i − (U − L)/µ_i.

  56. Methods based on “merit functions”. Bayesian algorithm: the objective function is considered as a realization of a stochastic process, f(x) = F(x; ω). A loss function is defined, e.g. L(x_1, ..., x_n; ω) = min_{i=1,...,n} F(x_i; ω) − min_x F(x; ω), and the next point to sample is placed so as to minimize the expected loss (or risk): x_{n+1} = arg min E( L(x_1, ..., x_n, x_{n+1}; ω) | x_1, ..., x_n ) = arg min E( min( F(x_{n+1}; ω), min_i F(x_i; ω) ) − min_x F(x; ω) | x_1, ..., x_n ).

  57. Radial basis method. Given k observations (x_1, f_1), ..., (x_k, f_k), an interpolant is built: s(x) = Σ_{i=1}^{k} λ_i Φ(||x − x_i||) + p(x), where p is a polynomial of a (prefixed) small degree m and Φ is a radial function such as: Φ(r) = r (linear); Φ(r) = r³ (cubic); Φ(r) = r² log r (thin plate spline); Φ(r) = e^{−γr²} (Gaussian). The polynomial p is necessary to guarantee the existence of a unique interpolant (i.e. when the matrix { Φ_ij = Φ(||x_i − x_j||) } is singular).
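A minimal sketch of such an interpolant, using the cubic radial function and a degree-1 polynomial tail (one common choice; the sample data below are illustrative):

```python
import numpy as np

def rbf_interpolant(X, f, phi=lambda r: r**3):
    """Cubic RBF interpolant with a linear polynomial tail.
    X: (k, n) sample points, f: (k,) values. Returns a callable s(x)."""
    k, n = X.shape
    Phi = phi(np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2))
    P = np.hstack([np.ones((k, 1)), X])              # basis of degree-1 polynomials
    A = np.block([[Phi, P], [P.T, np.zeros((n + 1, n + 1))]])
    coef = np.linalg.solve(A, np.concatenate([f, np.zeros(n + 1)]))
    lam, c = coef[:k], coef[k:]
    def s(x):
        r = np.linalg.norm(X - x, axis=1)
        return phi(r) @ lam + c[0] + c[1:] @ x
    return s

# Quick check: the interpolant reproduces the data points.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(8, 2))
fvals = np.sin(X[:, 0]) + X[:, 1] ** 2
s = rbf_interpolant(X, fvals)
print(max(abs(s(x) - v) for x, v in zip(X, fvals)))   # ~1e-12
```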

  58. “Bumpiness”. Let f⋆_k be an estimate of the value of the global optimum after k observations, and let s^y_k be the (unique) interpolant of the data points (x_i, f_i), i = 1, ..., k, together with (y, f⋆_k). Idea: the most likely location of y is the one for which the resulting interpolant has minimum “bumpiness”. Bumpiness measure: σ(s^y_k) = (−1)^{m+1} Σ_i λ_i s^y_k(x_i).

  59. TO BE DONE

  60. Stochastic methods. Pure Random Search: random uniform sampling over the feasible region. Best start: like Pure Random Search, but a local search is started from the best observation. Multistart: local searches are started from randomly generated starting points.
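A minimal Multistart sketch (assumes SciPy; the test function, number of starts and local solver are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

def multistart(f, bounds, n_starts=20, seed=0):
    """Multistart: local searches started from uniformly sampled starting points."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        res = minimize(f, x0, bounds=list(zip(lo, hi)))   # local search (L-BFGS-B)
        if best is None or res.fun < best.fun:
            best = res
    return best

# Example: a multimodal test function on [-10, 10]^2.
f = lambda x: np.sin(3 * x[0]) + np.sin(3 * x[1]) + 0.05 * (x[0] ** 2 + x[1] ** 2)
print(multistart(f, bounds=[(-10, 10), (-10, 10)]).x)
```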


  63. Clustering methods. Given a uniform sample, evaluate the objective function. Sample transformation (or concentration): either a fraction of the “worst” points is discarded, or a few steps of a gradient method are performed. The remaining points are clustered, and from the best point in each cluster a single local search is started.

  64. Uniform sample (figure).

  65. Sample concentration (figure).

  66. Clustering (figure).

  67. Local optimization (figure).

  68. Clustering: MLSL. Sampling proceeds in batches of N points. Given sample points X_1, ..., X_k ∈ [0, 1]^n, label X_j as “clustered” iff ∃ Y ∈ { X_1, ..., X_k } such that ||X_j − Y|| ≤ ∆_k := (1/√π) ( Γ(1 + n/2) σ (log k)/k )^{1/n} and f(Y) ≤ f(X_j).
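A sketch of this labelling rule, assuming the ∆_k formula as reconstructed above (σ = 4 is a customary but illustrative choice); local searches would then be started only from the unclustered points:

```python
import numpy as np
from math import gamma, log, pi, sqrt

def mlsl_radius(k, n, sigma=4.0):
    """Critical distance Delta_k on the unit hypercube (formula as written above)."""
    return (1.0 / sqrt(pi)) * (gamma(1 + n / 2) * sigma * log(k) / k) ** (1.0 / n)

def clustered_labels(X, fvals, k, sigma=4.0):
    """Mark X_j as 'clustered' if a point with better function value lies within Delta_k."""
    fvals = np.asarray(fvals, dtype=float)
    delta = mlsl_radius(k, X.shape[1], sigma)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    better = fvals[None, :] <= fvals[:, None]                   # f(Y) <= f(X_j)
    return np.any((D <= delta) & better & ~np.eye(len(X), dtype=bool), axis=1)
```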

  69. Simple Linkage. A sequential sample is generated (batches consist of a single observation). A local search is started only from the last sampled point (i.e. there is no “recall”), unless there exists a sufficiently near sampled point with a better function value.

  70. Smoothing methods. Given f : R^n → R, the Gaussian transform is defined as ⟨f⟩_λ(x) = (1/(π^{n/2} λ^n)) ∫_{R^n} f(y) exp(−||y − x||²/λ²) dy. When λ is sufficiently large, ⟨f⟩_λ is convex. Idea: starting with a large enough λ, minimize the smoothed function and slowly decrease λ towards 0.
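The transform is an integral, so in practice it can be estimated by sampling; a minimal Monte-Carlo sketch (the test function and sample size are illustrative):

```python
import numpy as np

def gaussian_transform(f, lam, n_samples=4000, seed=0):
    """Monte-Carlo estimate of the Gaussian transform <f>_lambda(x): the kernel
    exp(-||y - x||^2 / lam^2), normalized, is the density of N(x, (lam^2/2) I),
    so <f>_lambda(x) = E[f(y)] with y ~ N(x, (lam^2/2) I)."""
    rng = np.random.default_rng(seed)
    def smoothed(x):
        x = np.asarray(x, dtype=float)
        y = x + (lam / np.sqrt(2)) * rng.standard_normal((n_samples, x.size))
        return np.mean([f(yi) for yi in y])
    return smoothed

# Example: a wiggly 1-D function becomes smoother as lambda grows.
f = lambda x: np.sin(5 * x[0]) + 0.1 * x[0] ** 2
for lam in (0.1, 1.0, 3.0):
    print(lam, gaussian_transform(f, lam)(np.array([2.0])))
```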

  71. Smoothing methods (figure: surface plot).

  72.-75. (Figures: further surface plots illustrating the effect of the smoothing.)

  76. Transformed function landscape. Elementary idea: local optimization smooths out many “high-frequency” oscillations.
