
Nonlinear Optimization: Optimality Conditions (INSEAD, Spring 2006)



  1. Nonlinear Optimization: Optimality Conditions. INSEAD, Spring 2006. Jean-Philippe Vert, Ecole des Mines de Paris (Jean-Philippe.Vert@mines.org). © 2003-2006 Jean-Philippe Vert.

  2. Outline: General definitions; Unconstrained problems; Convex optimization; Equality constraints; Equality and inequality constraints.

  3. General definitions

  4. Local and global optima. Global minimum: x* ∈ X such that f(x*) ≤ f(x) for all x ∈ X; strict if f(x*) < f(x) for all x ≠ x*. Local minimum: x* ∈ X such that f(x*) ≤ f(x) for all x ∈ X ∩ N(x*), where N(x*) is a neighborhood of x* (e.g., an open ball); strict if the inequality is strict for all x ≠ x* in that neighborhood.

  5. Derivatives. A function f : R^n → R is called (Fréchet) differentiable at x ∈ R^n if there exists a vector ∇f(x), called the gradient of f at x, such that: f(x + u) = f(x) + u⊤∇f(x) + o(‖u‖). In that case we have: ∇f(x) = ( ∂f/∂x_1(x), ..., ∂f/∂x_n(x) )⊤.

  6. Second derivative. If each component of ∇f is itself differentiable, then f is called twice differentiable and the Hessian of f at x is the symmetric n × n matrix ∇²f(x) with entries (∇²f(x))_ij = ∂²f/∂x_i∂x_j (x). In that case we have the following second-order expansion of f around x: f(x + u) = f(x) + u⊤∇f(x) + (1/2) u⊤∇²f(x) u + o(‖u‖²).
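These expansions can be sanity-checked numerically. Below is a minimal sketch (not from the slides; the test function f(x) = exp(x_1 + 2 x_2) and all names are my own choice): the first-order remainder divided by t and the second-order remainder divided by t² should both shrink to 0.

```python
import numpy as np

# Hypothetical test function with known gradient and Hessian.
f    = lambda x: np.exp(x[0] + 2 * x[1])
grad = lambda x: np.exp(x[0] + 2 * x[1]) * np.array([1.0, 2.0])
hess = lambda x: np.exp(x[0] + 2 * x[1]) * np.array([[1.0, 2.0], [2.0, 4.0]])

x = np.array([0.3, -0.1])
u = np.array([1.0, -0.5])

for t in [1e-1, 1e-2, 1e-3]:
    # first-order remainder f(x+tu) - f(x) - t u'grad(x) should be o(t)
    r1 = f(x + t * u) - f(x) - t * u @ grad(x)
    # subtracting the quadratic term as well should leave o(t^2)
    r2 = r1 - 0.5 * t**2 * u @ hess(x) @ u
    print(f"t={t:.0e}  r1/t={r1/t: .2e}  r2/t^2={r2/t**2: .2e}")
```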

  7. Descent direction. For any differentiable function f : R^n → R and x ∈ R^n, the set of descent directions at x is the set of vectors D_x = { d ∈ R^n : d⊤∇f(x) < 0 }. If d is a descent direction of f at x, then there exists a scalar ε_0 > 0 such that f(x + εd) < f(x) for all ε ∈ (0, ε_0).
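A small illustration (my own sketch, not part of the slides): whenever ∇f(x) ≠ 0, the direction d = −∇f(x) satisfies d⊤∇f(x) < 0, and a small enough step along it decreases f.

```python
import numpy as np

# Hypothetical test function and point; d = -grad f(x) is a descent direction.
f    = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 3) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 3)])

x = np.array([2.0, 0.0])
d = -grad(x)                     # steepest-descent direction
print(d @ grad(x) < 0)           # True: d is a descent direction
eps = 1e-3
print(f(x + eps * d) < f(x))     # True for small enough eps
```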

  8. Feasible direction. At a feasible point x, a feasible direction d ∈ R^n is a direction such that x + εd is feasible for all sufficiently small ε > 0. The set of feasible directions is formally defined as: F_x = { d ∈ R^n : d ≠ 0 and ∃ ε_0 > 0, ∀ ε ∈ (0, ε_0), x + εd ∈ X }. Examples: X = R^n ⟹ F_x = R^n; X = { x : Ax + b = 0 } ⟹ F_x = { d : Ad = 0 }.
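For the linear-equality case X = { x : Ax + b = 0 }, a direction is feasible exactly when it lies in the null space of A. A rough numerical check (the matrix A, vector b and points below are hypothetical, chosen only for illustration):

```python
import numpy as np

# X = {x : Ax + b = 0}: d is a feasible direction iff d != 0 and A d = 0.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([-1.0, 0.0])

x = np.array([0.0, 1.0, 1.0])          # feasible point: A @ x + b = 0
assert np.allclose(A @ x + b, 0)

d = np.array([1.0, -1.0, -1.0])        # lies in the null space of A
print(np.allclose(A @ d, 0))           # True: d is a feasible direction
for eps in [0.1, 0.01]:
    print(np.allclose(A @ (x + eps * d) + b, 0))   # x + eps*d stays feasible
```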

  9. Optimality conditions. Consider the problem: minimize f(x) subject to x ∈ X; a point x ∈ X is called feasible. How do we recognize a solution to a nonlinear optimization problem? An optimality condition is a condition that x must fulfill to be a solution (usually necessary but not sufficient).

  10. Why optimality conditions? When solved, the conditions provide a set of candidate minima (although solving them is not easy in practice). They are useful to design (e.g., stopping criteria) and analyse (e.g., convergence) optimization algorithms, and useful for further analysis (e.g., sensitivity analysis in microeconomics).

  11. A general optimality condition. A general necessary condition for a feasible point x to be a local minimum is that no small move from x within the feasible set decreases the objective function, i.e., that no feasible direction is a descent direction: D_x ∩ F_x = ∅. We will now see how this principle translates in different contexts: unconstrained problems: D_x = ∅; equality constraints: Lagrange theorem; equality and inequality constraints: KKT conditions.

  12. Unconstrained optimization

  13. First-order condition. Consider the unconstrained optimization problem: minimize f(x) subject to x ∈ R^n. Theorem 1: If x* is a local minimum of f, and if f is differentiable at x*, then ∇f(x*) = 0.
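As a quick illustration (a sketch; the solver choice and test function are my own, not from the slides), a point returned by a numerical unconstrained minimizer has a (numerically) zero gradient, which is exactly the condition of Theorem 1 and the usual stopping criterion of such solvers.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical smooth test function and its gradient.
f    = lambda x: (x[0] - 2) ** 2 + (x[0] * x[1] - 1) ** 2
grad = lambda x: np.array([2 * (x[0] - 2) + 2 * (x[0] * x[1] - 1) * x[1],
                           2 * (x[0] * x[1] - 1) * x[0]])

res = minimize(f, x0=np.array([1.0, 1.0]), jac=grad, method="BFGS")
print(res.x, np.linalg.norm(grad(res.x)))   # gradient norm is close to 0
```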

  14. Proof. For any direction d ∈ R^n we have d⊤∇f(x*) = lim_{ε→0+} [ f(x* + εd) − f(x*) ] / ε ≥ 0, since f(x* + εd) ≥ f(x*) for ε small enough. Similarly, for the direction −d we obtain −d⊤∇f(x*) ≥ 0, therefore d⊤∇f(x*) = 0 for all d ∈ R^n. This shows that ∇f(x*) = 0. □

  15. Limits of first-order conditions: first-order conditions only detect stationary points.

  16. Positive (semi-)definite matrices. Let A be a symmetric n × n matrix; the eigenvalues of A are real. A is called positive definite (denoted A ≻ 0) if all eigenvalues are positive, or equivalently: x⊤Ax > 0 for all x ∈ R^n, x ≠ 0. A is called positive semidefinite (denoted A ⪰ 0) if all eigenvalues are non-negative, or equivalently: x⊤Ax ≥ 0 for all x ∈ R^n.
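The eigenvalue characterization gives a direct numerical test. A minimal sketch (function names and example matrices are my own):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """A symmetric matrix is positive definite iff all its eigenvalues are > 0."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def is_positive_semidefinite(A, tol=1e-12):
    """A symmetric matrix is positive semidefinite iff all its eigenvalues are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3: A > 0
B = np.array([[1.0, 1.0], [1.0, 1.0]])     # eigenvalues 0 and 2: B >= 0 but not > 0
print(is_positive_definite(A), is_positive_semidefinite(B), is_positive_definite(B))
# True True False
```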

  17. Second-order conditions. Theorem 2: If x* is a local minimum of f, and if f is twice differentiable at x*, then ∇f(x*) = 0 and ∇²f(x*) ⪰ 0. Conversely, if x* satisfies ∇f(x*) = 0 and ∇²f(x*) ≻ 0, then x* is a strict local minimum of f.

  18. Remark. There may be points that satisfy the necessary first- and second-order conditions but are not local minima (e.g., x = 0 for f(x) = x³). There may be points that are local minima but do not satisfy the sufficient conditions ∇f(x*) = 0 and ∇²f(x*) ≻ 0 (e.g., x = 0 for f(x) = x⁴, where the Hessian is only positive semidefinite).

  19. Proof. Remember the Taylor expansion around x: f(x + u) = f(x) + u⊤∇f(x) + (1/2) u⊤∇²f(x) u + o(‖u‖²). At a local minimum x* the first-order condition ∇f(x*) = 0 holds, and therefore for any direction d ∈ R^n and ε > 0 small enough: 0 ≤ [ f(x* + εd) − f(x*) ] / ε² = (1/2) d⊤∇²f(x*) d + o(ε²)/ε². Taking the limit for ε → 0 gives d⊤∇²f(x*) d ≥ 0 for any d ∈ R^n, and therefore ∇²f(x*) ⪰ 0.

  20. Proof (cont.). Conversely, suppose that x* is such that ∇f(x*) = 0 and ∇²f(x*) ≻ 0. Let λ > 0 be the smallest eigenvalue of ∇²f(x*); then d⊤∇²f(x*) d ≥ λ‖d‖² for all d ∈ R^n. The Taylor expansion therefore gives, for all d: f(x* + d) − f(x*) = (1/2) d⊤∇²f(x*) d + o(‖d‖²) ≥ (λ/2)‖d‖² + o(‖d‖²) = ‖d‖² ( λ/2 + o(‖d‖²)/‖d‖² ), which is strictly positive for ‖d‖ small enough. Hence x* is a strict local minimum. □

  21. Summary. ∇f(x) = 0 defines a stationary point (including but not limited to local and global minima and maxima). If x* is a stationary point and ∇²f(x*) ≻ 0 (resp. ≺ 0), then x* is a local minimum (resp. maximum). If ∇²f(x*) has both strictly positive and strictly negative eigenvalues, then x* is neither a local minimum nor a local maximum.
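This summary can be turned into a small classifier for stationary points. A sketch (names are my own; note that a singular Hessian leaves the test inconclusive, as in the x⁴ example above, since higher-order terms then decide):

```python
import numpy as np

def classify_stationary_point(hessian, tol=1e-10):
    """Classify a stationary point (grad f = 0) from the eigenvalues of its Hessian."""
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > tol):
        return "strict local minimum"
    if np.all(eig < -tol):
        return "strict local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point (neither min nor max)"
    return "inconclusive (singular Hessian)"

print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -1.0]])))  # saddle point
```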

  22. Example. f(x_1, x_2) = (1/3) x_1³ + (1/2) x_1² + 2 x_1 x_2 + (1/2) x_2² − x_2 + 1. f is infinitely differentiable. Its gradient and Hessian are: ∇f(x_1, x_2) = ( x_1² + x_1 + 2 x_2 , 2 x_1 + x_2 − 1 )⊤, ∇²f(x_1, x_2) = [ 2 x_1 + 1 , 2 ; 2 , 1 ].

  23. Example (cont.). There are two stationary points: x_a = (1, −1)⊤ and x_b = (2, −3)⊤. The corresponding Hessians are ∇²f(x_a) = [ 3 , 2 ; 2 , 1 ] and ∇²f(x_b) = [ 5 , 2 ; 2 , 1 ]. det(∇²f(x_a)) = −1 < 0, so this Hessian has one negative and one positive eigenvalue: x_a is neither a local maximum nor a local minimum. ∇²f(x_b) ≻ 0 (its determinant 1 and trace 6 are positive), so x_b is a local minimum.
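The example can be checked numerically with the gradient and Hessian given above (a sketch; the eigenvalue computation is the only thing added beyond the slides):

```python
import numpy as np

# Gradient and Hessian of the example f(x_1, x_2) from slide 22.
grad = lambda x: np.array([x[0] ** 2 + x[0] + 2 * x[1],
                           2 * x[0] + x[1] - 1])
hess = lambda x: np.array([[2 * x[0] + 1, 2.0],
                           [2.0, 1.0]])

for x in [np.array([1.0, -1.0]), np.array([2.0, -3.0])]:
    print(x, grad(x), np.linalg.eigvalsh(hess(x)))
# x_a = (1, -1): gradient 0, eigenvalues of opposite signs -> saddle point
# x_b = (2, -3): gradient 0, both eigenvalues positive     -> local minimum
```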

  24. Convex optimization

  25. Convex set. A set C is convex if the segment between any two of its points stays in C: x_1, x_2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θ x_1 + (1 − θ) x_2 ∈ C.
