
Nonlinear Optimization: Algorithms 3: Interior-point methods



  1. Nonlinear Optimization: Algorithms 3: Interior-point methods. INSEAD, Spring 2006. Jean-Philippe Vert, Ecole des Mines de Paris, Jean-Philippe.Vert@mines.org. Nonlinear optimization, © 2006 Jean-Philippe Vert (Jean-Philippe.Vert@mines.org), p.1/32

  2. Outline: Inequality constrained minimization; Logarithmic barrier function and central path; Barrier method; Feasibility and phase I methods.

  3. Inequality constrained minimization

  4. Setting. We consider the problem
$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i = 1,\dots,m, \qquad Ax = b,$$
where $f$ and the $g_i$ are assumed to be convex and twice continuously differentiable. $A$ is a $p \times n$ matrix of rank $p < n$ (i.e., there are fewer equality constraints than variables, and the equality constraints are independent). We assume that the optimal value $f^*$ is finite and attained at $x^*$.

  5. Strong duality hypothesis. We finally assume that the problem is strictly feasible, i.e., there exists $x$ with $g_i(x) < 0$, $i = 1,\dots,m$, and $Ax = b$. This means that Slater's constraint qualification holds. Hence strong duality holds and the dual optimum is attained: there exist $\lambda^* \in \mathbb{R}^p$ and $\mu^* \in \mathbb{R}^m$ which, together with $x^*$, satisfy the KKT conditions
$$\begin{aligned} Ax^* &= b, \\ g_i(x^*) &\le 0, \quad i = 1,\dots,m, \\ \mu^* &\ge 0, \\ \nabla f(x^*) + \sum_{i=1}^m \mu_i^* \nabla g_i(x^*) + A^\top \lambda^* &= 0, \\ \mu_i^* g_i(x^*) &= 0, \quad i = 1,\dots,m. \end{aligned}$$
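As a hedged illustration, the KKT conditions above can be checked numerically. The sketch below uses a hypothetical toy problem that is not from the slides: minimize $x_1^2 + x_2^2$ subject to $g(x) = 1 - x_1 - x_2 \le 0$. It has no equality constraints, so the $A^\top\lambda^*$ term is absent. The function names are illustrative only. Each residual measures how far one condition is from holding, so all zeros means $(x, \mu)$ is a KKT pair.

```python
# Hypothetical toy problem (not from the slides):
#   minimize f(x) = x1^2 + x2^2  subject to  g(x) = 1 - x1 - x2 <= 0.
# There are no equality constraints, so the A^T lambda term is absent.

def grad_f(x):
    return [2.0 * x[0], 2.0 * x[1]]

def g(x):
    return 1.0 - x[0] - x[1]

def grad_g(x):
    return [-1.0, -1.0]

def kkt_residuals(x, mu):
    """Violation of each KKT condition at (x, mu); all zeros means KKT holds."""
    stationarity = [df + mu * dg for df, dg in zip(grad_f(x), grad_g(x))]
    return {
        "primal": max(g(x), 0.0),            # g(x) <= 0
        "dual": max(-mu, 0.0),               # mu >= 0
        "stationarity": max(abs(s) for s in stationarity),
        "complementarity": abs(mu * g(x)),   # mu * g(x) = 0
    }

print(kkt_residuals([0.5, 0.5], 1.0))  # every residual is 0.0
print(kkt_residuals([1.0, 1.0], 0.0))  # feasible, but stationarity fails
```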

  6. Examples. Many problems satisfy these conditions, e.g., LP, QP, QCQP, and entropy maximization with linear inequality constraints:
$$\min_x \sum_{i=1}^n x_i \log x_i \quad \text{subject to} \quad Fx \le g, \qquad Ax = b.$$

  7. Examples (cont.) To obtain differentiability of the objective and constraints, we may need to reformulate the problem. For example, the problem
$$\min_x \max_{i=1,\dots,m} \left(a_i^\top x + b_i\right) \quad \text{subject to} \quad Ax = b,$$
whose objective is nondifferentiable, is equivalent to the LP
$$\min_{x,t}\; t \quad \text{subject to} \quad a_i^\top x + b_i \le t,\; i = 1,\dots,m, \qquad Ax = b.$$
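The reformulation above is mechanical. The LP works in the stacked variable $z = (x, t)$ and minimizes the last coordinate. Each affine piece becomes the row $[a_i^\top, -1]\,z \le -b_i$. The sketch below builds that inequality data in plain Python. The helper names `epigraph_lp` and `is_feasible` and the numbers are hypothetical. The equality constraint $Ax = b$ carries over unchanged (with a zero coefficient for $t$) and is left out. The check confirms that, for a fixed $x$, the smallest feasible $t$ is exactly the max.

```python
def epigraph_lp(a, b):
    """Return (c, G, h) for: minimize c^T z s.t. G z <= h, with z = (x, t).

    a: list of m coefficient vectors a_i (each of length n); b: list of m offsets b_i.
    Row i encodes a_i^T x - t <= -b_i, i.e. a_i^T x + b_i <= t.
    """
    n = len(a[0])
    c = [0.0] * n + [1.0]                  # the objective picks out t
    G = [list(ai) + [-1.0] for ai in a]
    h = [-bi for bi in b]
    return c, G, h

def is_feasible(G, h, z, tol=1e-12):
    return all(sum(gj * zj for gj, zj in zip(row, z)) <= h_i + tol
               for row, h_i in zip(G, h))

# Hypothetical data: affine pieces x1 + 1, -x1 + x2, 2*x2 - 3 on R^2.
a = [[1.0, 0.0], [-1.0, 1.0], [0.0, 2.0]]
b = [1.0, 0.0, -3.0]
c, G, h = epigraph_lp(a, b)
x = [0.5, 2.0]
t_max = max(sum(aij * xj for aij, xj in zip(ai, x)) + bi for ai, bi in zip(a, b))
print(t_max, is_feasible(G, h, x + [t_max]), is_feasible(G, h, x + [t_max - 0.1]))
```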

  8. Overview. Interior-point methods solve the problem (or, equivalently, its KKT conditions) by applying Newton's method to a sequence of equality-constrained problems. They form another level in the hierarchy of convex optimization algorithms:
- Linear equality constrained quadratic problems (LCQP) are the simplest: their optimality conditions are a set of linear equations that can be solved analytically.
- Newton's method reduces linear equality constrained convex optimization problems (LCCP) with twice differentiable objective to a sequence of LCQPs.
- Interior-point methods reduce a problem with linear equality constraints and convex inequality constraints to a sequence of LCCPs.
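The base level of this hierarchy can be made concrete. Assume the LCQP has the form minimize $\frac12 x^\top P x + q^\top x$ subject to $Ax = b$; this standard form is not spelled out on the slide. Its optimality conditions are then the linear KKT system $\begin{bmatrix} P & A^\top \\ A & 0 \end{bmatrix}\begin{bmatrix} x \\ \nu \end{bmatrix} = \begin{bmatrix} -q \\ b \end{bmatrix}$. The sketch below solves that system in plain Python. The helper names and the small dense Gaussian elimination are illustrative only; a practical code would call a linear algebra library instead.

```python
def solve_linear(M, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = s / aug[i][i]
    return sol

def solve_eq_qp(P, q, A, b):
    """minimize 1/2 x^T P x + q^T x  s.t.  A x = b, via the KKT system
    [P A^T; A 0] [x; nu] = [-q; b].  Returns (x, nu)."""
    n, p = len(P), len(A)
    K = [list(P[i]) + [A[j][i] for j in range(p)] for i in range(n)]
    K += [list(A[j]) + [0.0] * p for j in range(p)]
    sol = solve_linear(K, [-qi for qi in q] + list(b))
    return sol[:n], sol[n:]

# Hypothetical example: minimize (x1^2 + x2^2)/2 subject to x1 + x2 = 1.
x, nu = solve_eq_qp([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 1.0]], [1.0])
print(x, nu)  # [0.5, 0.5] [-0.5]
```

Newton's method for an LCCP solves one system of exactly this shape per iteration, with $P$ replaced by the Hessian of the objective.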

  9. Logarithmic barrier function and central path

  10. Problem reformulation. Our goal is to approximate the inequality constrained problem by an equality constrained problem to which Newton's method can be applied. To this end, we first make the inequality constraints implicit in the objective:
$$\min_x f(x) + \sum_{i=1}^m I_-(g_i(x)) \quad \text{subject to} \quad Ax = b,$$
where $I_- : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ is the indicator function of the nonpositive reals:
$$I_-(u) = \begin{cases} 0 & \text{if } u \le 0, \\ +\infty & \text{if } u > 0. \end{cases}$$

  11. Logarithmic barrier. The basic idea of the barrier method is to approximate the indicator function $I_-$ by the convex and differentiable function
$$\hat I_-(u) = -\frac{1}{t}\log(-u), \qquad u < 0,$$
where $t > 0$ is a parameter that sets the accuracy of the approximation.
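To see how $t$ controls the approximation, the sketch below evaluates $\hat I_-$ next to $I_-$ for a few values of $u$ and $t$. Treating $\hat I_-(u)$ as $+\infty$ for $u \ge 0$ (outside its domain) is a convention assumed here. The function names are hypothetical.

```python
import math

def indicator_nonpos(u):
    """I_-(u): 0 if u <= 0, +inf otherwise."""
    return 0.0 if u <= 0 else math.inf

def log_barrier_approx(u, t):
    """-(1/t) log(-u) for u < 0; +inf for u >= 0 (outside its domain)."""
    return -math.log(-u) / t if u < 0 else math.inf

# For fixed u < 0 the approximation tends to I_-(u) = 0 as t grows,
# while for every t it still blows up as u -> 0 from below.
for t in (1.0, 10.0, 100.0):
    print(t, [round(log_barrier_approx(u, t), 4) for u in (-2.0, -0.5, -1e-3)])
```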

  12. Problem reformulation. Substituting $\hat I_-$ for $I_-$ in the optimization problem gives the approximation
$$\min_x f(x) + \sum_{i=1}^m -\frac{1}{t}\log(-g_i(x)) \quad \text{subject to} \quad Ax = b.$$
The objective function of this problem is convex and twice differentiable, so Newton's method can be used to solve it. Of course, this problem is only an approximation of the original problem. We will see that the quality of the approximate solution improves as $t$ increases.

  13. Logarithmic barrier function. The function
$$\phi(x) = -\sum_{i=1}^m \log(-g_i(x))$$
is called the logarithmic barrier or log barrier for the original optimization problem. Its domain is the set of points that satisfy all inequality constraints strictly, and it grows without bound as $g_i(x) \to 0$ for any $i$. Its gradient and Hessian are given by
$$\nabla \phi(x) = \sum_{i=1}^m \frac{1}{-g_i(x)} \nabla g_i(x),$$
$$\nabla^2 \phi(x) = \sum_{i=1}^m \frac{1}{g_i(x)^2} \nabla g_i(x) \nabla g_i(x)^\top + \sum_{i=1}^m \frac{1}{-g_i(x)} \nabla^2 g_i(x).$$
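The gradient and Hessian formulas above translate directly into code. The sketch below implements them for any list of constraints given as `(g, grad_g, hess_g)` callables; this interface is a hypothetical choice, not from the slides. It then checks the result against central finite differences. The two constraints used, the unit disk and the half-plane $x_1 \ge 0$, are illustrative.

```python
import math

def log_barrier(x, constraints):
    """Return phi(x) = -sum_i log(-g_i(x)) with its gradient and Hessian.

    constraints: list of (g, grad_g, hess_g) callables, one triple per g_i.
    """
    n = len(x)
    phi, grad = 0.0, [0.0] * n
    hess = [[0.0] * n for _ in range(n)]
    for g, grad_g, hess_g in constraints:
        gi = g(x)
        if gi >= 0:
            raise ValueError("x must satisfy every inequality constraint strictly")
        v, H = grad_g(x), hess_g(x)
        phi -= math.log(-gi)
        for j in range(n):
            grad[j] += v[j] / (-gi)
            for k in range(n):
                hess[j][k] += v[j] * v[k] / gi ** 2 + H[j][k] / (-gi)
    return phi, grad, hess

# Hypothetical constraints: unit disk x1^2 + x2^2 - 1 <= 0 and half-plane -x1 <= 0.
disk = (lambda x: x[0] ** 2 + x[1] ** 2 - 1.0,
        lambda x: [2.0 * x[0], 2.0 * x[1]],
        lambda x: [[2.0, 0.0], [0.0, 2.0]])
half_plane = (lambda x: -x[0],
              lambda x: [-1.0, 0.0],
              lambda x: [[0.0, 0.0], [0.0, 0.0]])
cons = [disk, half_plane]

x0 = [0.5, 0.2]
phi, grad, hess = log_barrier(x0, cons)

# Compare with central finite differences of phi (gradient) and of grad (Hessian).
h = 1e-6
for j in range(2):
    xp = [xi + (h if k == j else 0.0) for k, xi in enumerate(x0)]
    xm = [xi - (h if k == j else 0.0) for k, xi in enumerate(x0)]
    pp, gp, _ = log_barrier(xp, cons)
    pm, gm, _ = log_barrier(xm, cons)
    fd_grad = (pp - pm) / (2 * h)
    fd_hess_col = [(a - b) / (2 * h) for a, b in zip(gp, gm)]
    print(j, grad[j], fd_grad, [row[j] for row in hess], fd_hess_col)
```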

  14. Central path. Multiplying the objective by $t > 0$, which does not change the minimizers, shows that our approximate problem is equivalent to
$$\min_x t f(x) + \phi(x) \quad \text{subject to} \quad Ax = b.$$
We assume for now that this problem can be solved via Newton's method, and in particular that it has a unique solution $x^*(t)$ for each $t > 0$. The central path is the set of these solutions, $\{x^*(t) \mid t > 0\}$.
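As a small illustration of the central path, consider a hypothetical one-dimensional problem that is not from the slides: minimize $x$ subject to $1 - x \le 0$ and $x - 3 \le 0$, so that $m = 2$ and $f^* = 1$. The sketch below computes $x^*(t)$ by minimizing $t x - \log(x - 1) - \log(3 - x)$ with Newton's method. Since the slides have not yet fixed a step-size rule, this sketch swaps in the damped step $1/(1 + \lambda)$, where $\lambda$ is the Newton decrement. That step keeps the iterate strictly feasible for this self-concordant objective.

```python
import math

# Hypothetical 1-D problem (not from the slides):
#   minimize x  s.t.  g1(x) = 1 - x <= 0,  g2(x) = x - 3 <= 0   (m = 2, f* = 1).
# Centering problem: minimize t*x + phi(x) with phi(x) = -log(x - 1) - log(3 - x).

def central_point(t, x=2.0, tol=1e-20, max_iter=100):
    """Approximate x*(t) by damped Newton from a strictly feasible start x."""
    for _ in range(max_iter):
        d1 = t - 1.0 / (x - 1.0) + 1.0 / (3.0 - x)           # derivative of t*x + phi(x)
        d2 = 1.0 / (x - 1.0) ** 2 + 1.0 / (3.0 - x) ** 2     # second derivative
        lam = abs(d1) / math.sqrt(d2)                        # Newton decrement
        if lam ** 2 <= tol:
            break
        # Step 1/(1 + lam) cannot leave (1, 3) for this self-concordant
        # objective; once lam < 1/4 we take full Newton steps.
        step = 1.0 if lam < 0.25 else 1.0 / (1.0 + lam)
        x -= step * d1 / d2
    return x

for t in (1.0, 10.0, 100.0, 1000.0):
    print(f"t = {t:g}: x*(t) = {central_point(t):.6f}")
```

As $t$ grows, the printed points move along the central path towards the solution $x^* = 1$.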

  15. Characterization of the central path. A point $x^*(t)$ is on the central path if and only if:
- it is strictly feasible, i.e., $Ax^*(t) = b$ and $g_i(x^*(t)) < 0$, $i = 1,\dots,m$;
- there exists $\hat\lambda \in \mathbb{R}^p$ such that
$$0 = t\nabla f(x^*(t)) + \nabla\phi(x^*(t)) + A^\top\hat\lambda = t\nabla f(x^*(t)) + \sum_{i=1}^m \frac{1}{-g_i(x^*(t))}\nabla g_i(x^*(t)) + A^\top\hat\lambda.$$
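The characterization above can be checked on a hypothetical LP that is not from the slides and has one equality constraint: minimize $x_1 + 2x_2$ subject to $-x_1 \le 0$, $-x_2 \le 0$, $x_1 + x_2 = 1$. Substituting $x_2 = 1 - x_1$ into the centering problem and setting the derivative to zero gives $t x_1^2 + (2 - t)x_1 - 1 = 0$. The sketch takes $x^*(t)$ from the positive root of that quadratic (our derivation). It picks $\hat\lambda$ so that the first coordinate of the condition vanishes and checks that the second one then vanishes too.

```python
import math

# Hypothetical LP (not from the slides):
#   minimize x1 + 2*x2  s.t.  g1 = -x1 <= 0,  g2 = -x2 <= 0,  x1 + x2 = 1.
# Substituting x2 = 1 - x1 into the centering problem and setting the derivative
# to zero gives t*x1^2 + (2 - t)*x1 - 1 = 0; x1*(t) is its positive root.

def central_point(t):
    x1 = ((t - 2.0) + math.sqrt(t * t + 4.0)) / (2.0 * t)
    return [x1, 1.0 - x1]

def centrality_residual(t, x, lam_hat):
    """t*grad f(x) + sum_i grad g_i(x) / (-g_i(x)) + A^T lam_hat with A = [1 1]."""
    grad_f = [1.0, 2.0]
    # grad g_i = -e_i and -g_i(x) = x_i, so the barrier term in coordinate i is -1/x_i.
    return [t * grad_f[i] - 1.0 / x[i] + lam_hat for i in range(2)]

for t in (1.0, 10.0, 100.0):
    x = central_point(t)
    lam_hat = 1.0 / x[0] - t          # makes the first coordinate vanish exactly
    res = centrality_residual(t, x, lam_hat)
    print(t, x, lam_hat, res)         # the second coordinate vanishes up to rounding
```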

  16. Example: LP central path. The log barrier for the LP
$$\min_x c^\top x \quad \text{subject to} \quad Ax \le b$$
is given by
$$\phi(x) = -\sum_{i=1}^m \log\left(b_i - a_i^\top x\right),$$
where $a_i^\top$ is the $i$th row of $A$. Its derivatives are
$$\nabla\phi(x) = \sum_{i=1}^m \frac{1}{b_i - a_i^\top x}\, a_i, \qquad \nabla^2\phi(x) = \sum_{i=1}^m \frac{1}{\left(b_i - a_i^\top x\right)^2}\, a_i a_i^\top.$$

  17. Example (cont.) The derivatives can be written more compactly as
$$\nabla\phi(x) = A^\top d, \qquad \nabla^2\phi(x) = A^\top \operatorname{diag}(d)^2 A,$$
where $d \in \mathbb{R}^m$ is defined by $d_i = 1/\left(b_i - a_i^\top x\right)$. The centrality condition for $x^*(t)$ is
$$tc + A^\top d = 0,$$
so at each point on the central path, $\nabla\phi(x)$ is parallel to $-c$.
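The compact formulas above are enough to compute a point on an LP central path. The sketch below minimizes $t\,c^\top x + \phi(x)$ for hypothetical data: the box $0 \le x \le 1$ in $\mathbb{R}^2$ written as $Ax \le b$, with $c = (1, 2)$ and $t = 10$. It uses Newton's method with the same damped step $1/(1 + \lambda)$ as a swapped-in rule. It then checks the centrality condition $tc + A^\top d = 0$. The 2×2 inverse is written out by hand only to stay dependency-free, so `lp_center` handles exactly two variables.

```python
import math

def lp_barrier_derivs(A, b, x):
    """d_i = 1/(b_i - a_i^T x); returns d, grad phi = A^T d, Hessian A^T diag(d)^2 A."""
    d = [1.0 / (bi - sum(aij * xj for aij, xj in zip(ai, x))) for ai, bi in zip(A, b)]
    n, m = len(x), len(A)
    grad = [sum(A[i][j] * d[i] for i in range(m)) for j in range(n)]
    hess = [[sum(A[i][j] * d[i] ** 2 * A[i][k] for i in range(m)) for k in range(n)]
            for j in range(n)]
    return d, grad, hess

def lp_center(c, A, b, t, x, tol=1e-20, max_iter=100):
    """Damped Newton for min t*c^T x + phi(x), restricted to two variables."""
    for _ in range(max_iter):
        _, gphi, H = lp_barrier_derivs(A, b, x)
        g = [t * ci + gi for ci, gi in zip(c, gphi)]
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Newton step dx = -H^{-1} g, written out with the 2x2 inverse.
        dx = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
              -(H[0][0] * g[1] - H[1][0] * g[0]) / det]
        lam2 = -(g[0] * dx[0] + g[1] * dx[1])     # squared Newton decrement
        if lam2 <= tol:
            break
        step = 1.0 if lam2 < 0.0625 else 1.0 / (1.0 + math.sqrt(lam2))
        x = [xi + step * di for xi, di in zip(x, dx)]
    return x

# Hypothetical LP: minimize x1 + 2*x2 over the box 0 <= x <= 1, written as Ax <= b.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.0, 1.0, 0.0, 0.0]
c = [1.0, 2.0]
t = 10.0
x = lp_center(c, A, b, t, [0.5, 0.5])
d, gphi, _ = lp_barrier_derivs(A, b, x)
centrality = [t * ci + gi for ci, gi in zip(c, gphi)]   # t*c + A^T d
print(x, centrality)
```

At the computed point, `gphi` equals $-tc$ up to rounding, which is the parallelism to $-c$ stated on the slide.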

  18. Dual points on the central path. Recall that $x = x^*(t)$ if and only if $x$ is strictly feasible and there exists $\hat\lambda$ such that
$$t\nabla f(x^*(t)) + \sum_{i=1}^m \frac{1}{-g_i(x^*(t))}\nabla g_i(x^*(t)) + A^\top\hat\lambda = 0, \qquad Ax^*(t) = b.$$
Let us now define
$$\mu_i^*(t) = -\frac{1}{t\, g_i(x^*(t))}, \quad i = 1,\dots,m, \qquad \lambda^*(t) = \frac{\hat\lambda}{t}.$$
We claim that the pair $(\lambda^*(t), \mu^*(t))$ is dual feasible.

  19. Dual points on the central path (cont.) Indeed:
- $\mu^*(t) > 0$ because $g_i(x^*(t)) < 0$;
- dividing the centrality condition by $t$ shows that $\nabla_x L(x^*(t), \lambda^*(t), \mu^*(t)) = 0$. Since $L$ is convex in $x$ (because $\mu^*(t) > 0$), $x^*(t)$ therefore minimizes the Lagrangian
$$L(x, \lambda^*(t), \mu^*(t)) = f(x) + \sum_{i=1}^m \mu_i^*(t)\, g_i(x) + \lambda^*(t)^\top (Ax - b).$$
Therefore the dual function $q(\mu^*(t), \lambda^*(t))$ is finite and, using $\mu_i^*(t)\, g_i(x^*(t)) = -1/t$ and $Ax^*(t) = b$,
$$q(\mu^*(t), \lambda^*(t)) = L(x^*(t), \lambda^*(t), \mu^*(t)) = f(x^*(t)) - \frac{m}{t}.$$
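The dual-point construction can be checked on the same kind of hypothetical LP as before: minimize $c^\top x$ with $c = (1, 2)$, subject to $-x_1 \le 0$, $-x_2 \le 0$, $x_1 + x_2 = 1$, so $m = 2$. For this LP the dual function is $q(\mu, \lambda) = -\lambda$ when $c - \mu + \lambda\mathbf{1} = 0$, and $-\infty$ otherwise (a standard LP fact, not stated on the slides). The sketch builds $\mu^*(t)$ and $\lambda^*(t)$ from the formulas above. It checks that the Lagrangian gradient vanishes and that $q = f(x^*(t)) - m/t$. The closed-form central point is our own derivation from the centering problem.

```python
import math

# Hypothetical LP (not from the slides):
#   minimize c^T x, c = (1, 2)  s.t.  -x1 <= 0, -x2 <= 0,  x1 + x2 = 1   (m = 2).
# Its dual function is q(mu, lam) = -lam when c - mu + lam*(1, 1) = 0, else -inf.

def central_point(t):
    # positive root of t*x1^2 + (2 - t)*x1 - 1 = 0 (centering problem with x2 = 1 - x1)
    x1 = ((t - 2.0) + math.sqrt(t * t + 4.0)) / (2.0 * t)
    return [x1, 1.0 - x1]

def dual_point(t, x):
    mu = [-1.0 / (t * (-xi)) for xi in x]     # mu_i*(t) = -1 / (t g_i(x*(t))), g_i = -x_i
    lam_hat = 1.0 / x[0] - t                  # multiplier from the centrality condition
    return mu, lam_hat / t                    # lambda*(t) = lam_hat / t

c, m = [1.0, 2.0], 2
for t in (1.0, 10.0, 100.0):
    x = central_point(t)
    mu, lam = dual_point(t, x)
    grad_L = [c[i] - mu[i] + lam for i in range(2)]   # gradient of the Lagrangian in x
    q = -lam                                          # finite because grad_L == 0
    f = c[0] * x[0] + c[1] * x[1]
    print(t, mu, max(abs(v) for v in grad_L), q, f - m / t)
```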

  20. Convergence of the central path. From the equation
$$q(\mu^*(t), \lambda^*(t)) = f(x^*(t)) - \frac{m}{t}$$
we deduce that the duality gap associated with $x^*(t)$ and the dual feasible pair $(\lambda^*(t), \mu^*(t))$ is simply $m/t$. As an important consequence, since $q(\mu^*(t), \lambda^*(t)) \le f^*$ by weak duality, we have
$$f(x^*(t)) - f^* \le \frac{m}{t}.$$
This confirms the intuition that $f(x^*(t)) \to f^*$ as $t \to \infty$.
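Read the other way round, the bound says that choosing $t = m/\varepsilon$ guarantees $f(x^*(t)) - f^* \le \varepsilon$. The sketch below checks this on a hypothetical LP that is not from the slides: minimize $x_1 + 2x_2$ subject to $x \ge 0$, $x_1 + x_2 = 1$. Here $m = 2$ and $f^* = 1$, attained at $x = (1, 0)$. The closed-form central point is our own derivation from the centering problem.

```python
import math

# Hypothetical LP (not from the slides): minimize x1 + 2*x2 s.t. x >= 0, x1 + x2 = 1.
# Here m = 2 and f* = 1 (attained at x = (1, 0)).

def central_point(t):
    # positive root of t*x1^2 + (2 - t)*x1 - 1 = 0 (centering problem with x2 = 1 - x1)
    x1 = ((t - 2.0) + math.sqrt(t * t + 4.0)) / (2.0 * t)
    return [x1, 1.0 - x1]

m, f_star = 2, 1.0
for eps in (1e-1, 1e-3, 1e-6):
    t = m / eps                        # then m/t = eps bounds the suboptimality
    x = central_point(t)
    gap = x[0] + 2.0 * x[1] - f_star
    print(f"eps={eps:g}  t={t:g}  f(x*(t)) - f* = {gap:.3e}  <= m/t = {m / t:.3e}")
```

On this instance the true suboptimality is roughly half of the bound $m/t$. The bound is only an upper estimate.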

  21. Interpretation via KKT conditions. We can rewrite the conditions for $x$ to be on the central path as the existence of $\lambda, \mu$ such that:
1. primal constraints: $g_i(x) \le 0$, $Ax = b$;
2. dual constraints: $\mu \ge 0$;
3. approximate complementary slackness: $-\mu_i g_i(x) = 1/t$, $i = 1,\dots,m$;
4. the gradient of the Lagrangian with respect to $x$ vanishes: $\nabla f(x) + \sum_{i=1}^m \mu_i \nabla g_i(x) + A^\top\lambda = 0$.
The only difference from the KKT conditions is that 0 is replaced by $1/t$ in condition 3. For large $t$, the point on the central path "almost" satisfies the KKT conditions.
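The four conditions above can be checked numerically on a hypothetical one-dimensional problem that is not from the slides: minimize $x$ subject to $g_1(x) = 1 - x \le 0$ and $g_2(x) = x - 3 \le 0$. Its KKT point is $x^* = 1$ with multipliers $\mu^* = (1, 0)$. The sketch uses a closed form for $x^*(t)$ derived by us from the centering condition. It sets $\mu_i = -1/(t\,g_i(x))$ and verifies conditions 3 and 4. As $t$ grows, $\mu$ approaches the exact KKT multipliers.

```python
import math

# Hypothetical 1-D problem (not from the slides):
#   minimize x  s.t.  g1(x) = 1 - x <= 0,  g2(x) = x - 3 <= 0.
# Its KKT point is x* = 1 with multipliers mu* = (1, 0).

def central_point(t):
    # y = x - 1 solves t*y^2 - (2t + 2)*y + 2 = 0; take the root in (0, 2),
    # written in a cancellation-free form.
    y = 2.0 / ((t + 1.0) + math.sqrt(t * t + 1.0))
    return 1.0 + y

for t in (1.0, 10.0, 100.0, 1000.0):
    x = central_point(t)
    g = [1.0 - x, x - 3.0]
    mu = [-1.0 / (t * gi) for gi in g]            # mu_i = -1 / (t g_i(x)) > 0
    slack = [-mi * gi for mi, gi in zip(mu, g)]   # condition 3: each equals 1/t
    stationarity = 1.0 - mu[0] + mu[1]            # condition 4: grad f + sum mu_i grad g_i
    print(t, x, mu, slack, stationarity)
```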

  22. The barrier method
