
Nonlinear Optimization: Optimality Conditions (INSEAD, Spring 2006)



  1. Nonlinear Optimization: Optimality Conditions. INSEAD, Spring 2006. Jean-Philippe Vert, Ecole des Mines de Paris (Jean-Philippe.Vert@mines.org). © 2003-2006 Jean-Philippe Vert.

  2. Outline: General definitions; Unconstrained problems; Convex optimization; Equality constraints; Equality and inequality constraints.

  3. General definitions

  4. Local and global optima. Global minimum: x* ∈ X such that f(x*) ≤ f(x) for all x ∈ X; strict if f(x*) < f(x) for all x ≠ x*. Local minimum: x* ∈ X such that f(x*) ≤ f(x) for all x ∈ X ∩ N(x*), where N(x*) is a neighborhood of x* (e.g., an open ball); strict if the inequality is strict for all x ≠ x* in that neighborhood.

  5. Derivatives. A function f : R^n → R is called (Fréchet) differentiable at x ∈ R^n if there exists a vector ∇f(x), called the gradient of f at x, such that: f(x + u) = f(x) + u⊤∇f(x) + o(‖u‖). In that case we have: ∇f(x) = ( ∂f/∂x_1(x), ..., ∂f/∂x_n(x) )⊤.

  6. Second derivative. If each component of ∇f is itself differentiable, then f is called twice differentiable and the Hessian of f at x is the symmetric n × n matrix ∇²f(x) with entries (∇²f(x))_ij = ∂²f/∂x_i∂x_j (x). In that case we have the following second-order expansion of f around x: f(x + u) = f(x) + u⊤∇f(x) + (1/2) u⊤∇²f(x) u + o(‖u‖²).
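These expansions can be sanity-checked numerically. Below is a minimal sketch (not from the slides; the test function f(x) = exp(x_1 + 2 x_2) and all names are my own choice): the first-order remainder divided by t and the second-order remainder divided by t² should both shrink to 0.

```python
import numpy as np

# Hypothetical test function with known gradient and Hessian.
f    = lambda x: np.exp(x[0] + 2 * x[1])
grad = lambda x: np.exp(x[0] + 2 * x[1]) * np.array([1.0, 2.0])
hess = lambda x: np.exp(x[0] + 2 * x[1]) * np.array([[1.0, 2.0], [2.0, 4.0]])

x = np.array([0.3, -0.1])
u = np.array([1.0, -0.5])

for t in [1e-1, 1e-2, 1e-3]:
    # first-order remainder f(x+tu) - f(x) - t u'grad(x) should be o(t)
    r1 = f(x + t * u) - f(x) - t * u @ grad(x)
    # subtracting the quadratic term as well should leave o(t^2)
    r2 = r1 - 0.5 * t**2 * u @ hess(x) @ u
    print(f"t={t:.0e}  r1/t={r1/t: .2e}  r2/t^2={r2/t**2: .2e}")
```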

  7. Descent direction. For any differentiable function f : R^n → R and x ∈ R^n, the set of descent directions at x is the set of vectors D_x = { d ∈ R^n : d⊤∇f(x) < 0 }. If d is a descent direction of f at x, then there exists a scalar ε_0 > 0 such that f(x + εd) < f(x) for all ε ∈ (0, ε_0).
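A small illustration (my own sketch, not part of the slides): whenever ∇f(x) ≠ 0, the direction d = −∇f(x) satisfies d⊤∇f(x) < 0, and a small enough step along it decreases f.

```python
import numpy as np

# Hypothetical test function and point; d = -grad f(x) is a descent direction.
f    = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 3) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 3)])

x = np.array([2.0, 0.0])
d = -grad(x)                     # steepest-descent direction
print(d @ grad(x) < 0)           # True: d is a descent direction
eps = 1e-3
print(f(x + eps * d) < f(x))     # True for small enough eps
```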

  8. Feasible direction. At a feasible point x, a feasible direction d ∈ R^n is a direction such that x + εd is feasible for all sufficiently small ε > 0. The set of feasible directions is formally defined as: F_x = { d ∈ R^n : d ≠ 0 and ∃ ε_0 > 0, ∀ ε ∈ (0, ε_0), x + εd ∈ X }. Examples: X = R^n ⟹ F_x = R^n; X = { x : Ax + b = 0 } ⟹ F_x = { d : Ad = 0 }.
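For the linear-equality case X = { x : Ax + b = 0 }, a direction is feasible exactly when it lies in the null space of A. A rough numerical check (the matrix A, vector b and points below are hypothetical, chosen only for illustration):

```python
import numpy as np

# X = {x : Ax + b = 0}: d is a feasible direction iff d != 0 and A d = 0.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([-1.0, 0.0])

x = np.array([0.0, 1.0, 1.0])          # feasible point: A @ x + b = 0
assert np.allclose(A @ x + b, 0)

d = np.array([1.0, -1.0, -1.0])        # lies in the null space of A
print(np.allclose(A @ d, 0))           # True: d is a feasible direction
for eps in [0.1, 0.01]:
    print(np.allclose(A @ (x + eps * d) + b, 0))   # x + eps*d stays feasible
```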

  9. Optimality conditions. Consider the problem: minimize f(x) subject to x ∈ X; a point x ∈ X is called feasible. How do we recognize a solution to a nonlinear optimization problem? An optimality condition is a condition that x must fulfill to be a solution (usually necessary but not sufficient).

  10. Why optimality conditions? When solved, the conditions provide a set of candidate minima (although solving them is not easy in practice). They are useful to design (e.g., stopping criteria) and analyse (e.g., convergence) optimization algorithms, and useful for further analysis (e.g., sensitivity analysis in microeconomics).

  11. A general optimality condition. A general necessary condition for a feasible point x to be a local minimum is that no small move from x within the feasible set decreases the objective function, i.e., that no feasible direction is a descent direction: D_x ∩ F_x = ∅. We will now see how this principle translates in different contexts: unconstrained problems: D_x = ∅; equality constraints: Lagrange theorem; equality and inequality constraints: KKT conditions.

  12. Unconstrained optimization

  13. First-order condition. Consider the unconstrained optimization problem: minimize f(x) subject to x ∈ R^n. Theorem 1: If x* is a local minimum of f, and if f is differentiable at x*, then ∇f(x*) = 0.
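As a quick illustration (a sketch; the solver choice and test function are my own, not from the slides), a point returned by a numerical unconstrained minimizer has a (numerically) zero gradient, which is exactly the condition of Theorem 1 and the usual stopping criterion of such solvers.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical smooth test function and its gradient.
f    = lambda x: (x[0] - 2) ** 2 + (x[0] * x[1] - 1) ** 2
grad = lambda x: np.array([2 * (x[0] - 2) + 2 * (x[0] * x[1] - 1) * x[1],
                           2 * (x[0] * x[1] - 1) * x[0]])

res = minimize(f, x0=np.array([1.0, 1.0]), jac=grad, method="BFGS")
print(res.x, np.linalg.norm(grad(res.x)))   # gradient norm is close to 0
```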

  14. Proof. For any direction d ∈ R^n we have d⊤∇f(x*) = lim_{ε→0+} [ f(x* + εd) − f(x*) ] / ε ≥ 0, since f(x* + εd) ≥ f(x*) for ε small enough. Similarly, for the direction −d we obtain −d⊤∇f(x*) ≥ 0, therefore d⊤∇f(x*) = 0 for all d ∈ R^n. This shows that ∇f(x*) = 0. □

  15. Limits of first-order conditions: first-order conditions only detect stationary points.

  16. Positive (semi-)definite matrices. Let A be a symmetric n × n matrix; the eigenvalues of A are real. A is called positive definite (denoted A ≻ 0) if all eigenvalues are positive, or equivalently: x⊤Ax > 0 for all x ∈ R^n, x ≠ 0. A is called positive semidefinite (denoted A ⪰ 0) if all eigenvalues are non-negative, or equivalently: x⊤Ax ≥ 0 for all x ∈ R^n.
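The eigenvalue characterization gives a direct numerical test. A minimal sketch (function names and example matrices are my own):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """A symmetric matrix is positive definite iff all its eigenvalues are > 0."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def is_positive_semidefinite(A, tol=1e-12):
    """A symmetric matrix is positive semidefinite iff all its eigenvalues are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3: A > 0
B = np.array([[1.0, 1.0], [1.0, 1.0]])     # eigenvalues 0 and 2: B >= 0 but not > 0
print(is_positive_definite(A), is_positive_semidefinite(B), is_positive_definite(B))
# True True False
```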

  17. Second-order conditions. Theorem 2: If x* is a local minimum of f, and if f is twice differentiable at x*, then ∇f(x*) = 0 and ∇²f(x*) ⪰ 0. Conversely, if x* satisfies ∇f(x*) = 0 and ∇²f(x*) ≻ 0, then x* is a strict local minimum of f.

  18. Remark. There may be points that satisfy the necessary first- and second-order conditions but are not local minima (e.g., x = 0 for f(x) = x³). There may be points that are local minima but do not satisfy the sufficient conditions ∇f(x*) = 0 and ∇²f(x*) ≻ 0 (e.g., x = 0 for f(x) = x⁴, where the Hessian is only positive semidefinite).

  19. Proof. Remember the Taylor expansion around x: f(x + u) = f(x) + u⊤∇f(x) + (1/2) u⊤∇²f(x) u + o(‖u‖²). At a local minimum x* the first-order condition ∇f(x*) = 0 holds, and therefore for any direction d ∈ R^n and ε > 0 small enough: 0 ≤ [ f(x* + εd) − f(x*) ] / ε² = (1/2) d⊤∇²f(x*) d + o(ε²)/ε². Taking the limit for ε → 0 gives d⊤∇²f(x*) d ≥ 0 for any d ∈ R^n, and therefore ∇²f(x*) ⪰ 0.

  20. Proof (cont.). Conversely, suppose that x* is such that ∇f(x*) = 0 and ∇²f(x*) ≻ 0. Let λ > 0 be the smallest eigenvalue of ∇²f(x*); then d⊤∇²f(x*) d ≥ λ‖d‖² for all d ∈ R^n. The Taylor expansion therefore gives, for all d: f(x* + d) − f(x*) = (1/2) d⊤∇²f(x*) d + o(‖d‖²) ≥ (λ/2)‖d‖² + o(‖d‖²) = ‖d‖² ( λ/2 + o(‖d‖²)/‖d‖² ), which is strictly positive for ‖d‖ small enough. Hence x* is a strict local minimum. □

  21. Summary. ∇f(x) = 0 defines a stationary point (including but not limited to local and global minima and maxima). If x* is a stationary point and ∇²f(x*) ≻ 0 (resp. ≺ 0), then x* is a local minimum (resp. maximum). If ∇²f(x*) has both strictly positive and strictly negative eigenvalues, then x* is neither a local minimum nor a local maximum.
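This summary can be turned into a small classifier for stationary points. A sketch (names are my own; note that a singular Hessian leaves the test inconclusive, as in the x⁴ example above, since higher-order terms then decide):

```python
import numpy as np

def classify_stationary_point(hessian, tol=1e-10):
    """Classify a stationary point (grad f = 0) from the eigenvalues of its Hessian."""
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > tol):
        return "strict local minimum"
    if np.all(eig < -tol):
        return "strict local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point (neither min nor max)"
    return "inconclusive (singular Hessian)"

print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -1.0]])))  # saddle point
```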

  22. Example. f(x_1, x_2) = (1/3) x_1³ + (1/2) x_1² + 2 x_1 x_2 + (1/2) x_2² − x_2 + 1. f is infinitely differentiable. Its gradient and Hessian are: ∇f(x_1, x_2) = ( x_1² + x_1 + 2 x_2 , 2 x_1 + x_2 − 1 )⊤, ∇²f(x_1, x_2) = [ 2 x_1 + 1 , 2 ; 2 , 1 ].

  23. Example (cont.). There are two stationary points: x_a = (1, −1)⊤ and x_b = (2, −3)⊤. The corresponding Hessians are ∇²f(x_a) = [ 3 , 2 ; 2 , 1 ] and ∇²f(x_b) = [ 5 , 2 ; 2 , 1 ]. det(∇²f(x_a)) = −1 < 0, so this Hessian has one negative and one positive eigenvalue: x_a is neither a local maximum nor a local minimum. ∇²f(x_b) ≻ 0 (its determinant 1 and trace 6 are positive), so x_b is a local minimum.
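The example can be checked numerically with the gradient and Hessian given above (a sketch; the eigenvalue computation is the only thing added beyond the slides):

```python
import numpy as np

# Gradient and Hessian of the example f(x_1, x_2) from slide 22.
grad = lambda x: np.array([x[0] ** 2 + x[0] + 2 * x[1],
                           2 * x[0] + x[1] - 1])
hess = lambda x: np.array([[2 * x[0] + 1, 2.0],
                           [2.0, 1.0]])

for x in [np.array([1.0, -1.0]), np.array([2.0, -3.0])]:
    print(x, grad(x), np.linalg.eigvalsh(hess(x)))
# x_a = (1, -1): gradient 0, eigenvalues of opposite signs -> saddle point
# x_b = (2, -3): gradient 0, both eigenvalues positive     -> local minimum
```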

  24. Convex optimization

  25. Convex set. A set C is convex if the segment between any two of its points stays in C: x_1, x_2 ∈ C, 0 ≤ θ ≤ 1 ⟹ θ x_1 + (1 − θ) x_2 ∈ C.
