Lagrange Function and KKT Conditions


  1. Lagrange Function and KKT Conditions

  2. How do you compute the table of Orthogonal Projections? For a closed convex set C,

     P_C(z) = argmin_{x ∈ C} (1/2)‖x − z‖²  and  prox_{t·I_C}(z) = argmin_x (1/(2t))‖x − z‖² + I_C(x),

     so for t = 1, P_C(z) = prox_{I_C}(z). The table (Set C; P_C(z); Assumptions):
     ▶ C = ℜⁿ₊: P_C(z) = [z]₊.
     ▶ C = Box[l, u]: P_C(z)ᵢ = min{max{zᵢ, lᵢ}, uᵢ}; assumes lᵢ ≤ uᵢ.
     ▶ C = Ball[c, r] (‖.‖₂ ball with centre c ∈ ℜⁿ and radius r > 0): P_C(z) = c + r(z − c)/max{‖z − c‖₂, r}.
     ▶ C = {x | Ax = b}, with A ∈ ℜ^{m×n} of full row rank and b ∈ ℜᵐ: P_C(z) = z − Aᵀ(AAᵀ)⁻¹(Az − b).
     ▶ C = {x | aᵀx ≤ b}, with 0 ≠ a ∈ ℜⁿ and b ∈ ℜ: P_C(z) = z − ([aᵀz − b]₊/‖a‖₂²) a.
     ▶ C = ∆ₙ (unit simplex): P_C(z) = [z − µ∗e]₊, where µ∗ ∈ ℜ satisfies eᵀ[z − µ∗e]₊ = 1.
     ▶ C = H_{a,b} ∩ Box[l, u] (hyperplane aᵀx = b intersected with a box), with 0 ≠ a ∈ ℜⁿ and b ∈ ℜ: P_C(z) = P_Box[l,u](z − µ∗a), where µ∗ ∈ ℜ satisfies aᵀP_Box[l,u](z − µ∗a) = b.
     ▶ C = H⁻_{a,b} ∩ Box[l, u] (halfspace aᵀx ≤ b intersected with a box), with 0 ≠ a ∈ ℜⁿ and b ∈ ℜ: P_C(z) = P_Box[l,u](z) if aᵀP_Box[l,u](z) ≤ b; otherwise P_C(z) = P_Box[l,u](z − λ∗a), where λ∗ > 0 satisfies aᵀP_Box[l,u](z − λ∗a) = b.
     ▶ C = B_{‖.‖₁}[0, α] (ℓ₁ ball of radius α > 0): P_C(z) = z if ‖z‖₁ ≤ α; otherwise P_C(z) = [|z| − λ∗e]₊ ⊙ sign(z), where λ∗ > 0 satisfies ‖[|z| − λ∗e]₊‖₁ = α.
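As a sanity check on three of these rows, here is a minimal numpy sketch (ours: the function names and the bisection tolerance are assumptions, and bisection is just one standard way to compute µ∗ for the simplex row):

```python
import numpy as np

def proj_box(z, l, u):
    # Row "Box[l, u]": P_C(z)_i = min{max{z_i, l_i}, u_i}, assuming l_i <= u_i.
    return np.minimum(np.maximum(z, l), u)

def proj_ball(z, c, r):
    # Row "Ball[c, r]": P_C(z) = c + r (z - c) / max{||z - c||_2, r}.
    return c + r * (z - c) / max(np.linalg.norm(z - c), r)

def proj_simplex(z, tol=1e-12):
    # Row "Delta_n": P_C(z) = [z - mu* e]_+ with e^T [z - mu* e]_+ = 1.
    # s(mu) = sum([z - mu]_+) is continuous and nonincreasing in mu,
    # so mu* can be bracketed and found by bisection.
    lo, hi = z.min() - 1.0, z.max()   # s(lo) >= n >= 1 and s(hi) = 0
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.maximum(z - mu, 0.0).sum() > 1.0:
            lo = mu
        else:
            hi = mu
    return np.maximum(z - 0.5 * (lo + hi), 0.0)

p = proj_simplex(np.array([0.9, 0.6, -0.2]))
print(p, p.sum())   # nonnegative entries summing to 1, here (0.65, 0.35, 0)
```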

  3. Lagrange Function and Necessary KKT Conditions. Can the Lagrange multiplier construction be generalized to always find optimal solutions to a minimization problem? Instead of taking the iterative path again, assume everything can be computed analytically. The construction is attributed to the mathematician Lagrange (born in 1736 in Turin), who worked largely on mechanics, the calculus of variations, probability, group theory, and number theory, and who is credited with the choice of base 10 for the metric system (rather than 12).

  4. Lagrange Function and Necessary KKT Conditions. Note that a lot of the analysis that follows does not even assume convexity; necessary conditions often do NOT require convexity. Consider the equality constrained minimization problem (with D ⊆ ℜⁿ)

     min f(x) over x ∈ D, subject to gᵢ(x) = 0, i = 1, 2, ..., m     (67)

     The figure shows some level curves of the function f and of a single constraint function g₁ (dotted lines). The gradient ∇g₁ of the constraint is not parallel to the gradient ∇f of the function at f = 10.4: at the point x′, ∇f has a non-zero component perpendicular to ∇g₁. Since moving perpendicular to ∇g₁ keeps g₁(x) = 0 satisfied, it is possible to reduce the value of f (by moving along the negative of that perpendicular component) while still honoring the constraint; all this shows that there cannot be a local minimum at x′. Goal: at a minimum, we should not be able to reduce the value of f while still honoring g₁(x) = 0.

  5. Lagrange Function and Necessary KKT Conditions. Consider the equality constrained minimization problem (67) again. The figure shows some level curves of the function f and of a single constraint function g₁ (dotted lines). The gradient ∇g₁ of the constraint is not parallel to the gradient ∇f of the function at f = 10.4; it is therefore possible to move along the constraint surface so as to further reduce f.
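To make the figure's claim concrete, here is a tiny numpy check (f, g₁, and the point x are made up by us): at a feasible point where ∇f is not parallel to ∇g₁, a small step along the constraint's tangent direction reduces f while keeping g₁(x) = 0:

```python
import numpy as np

f  = lambda v: v[0]**2 + 2 * v[1]**2       # made-up objective
g1 = lambda v: v[0] + v[1] - 1             # made-up constraint, feasible set g1(v) = 0
grad_f  = lambda v: np.array([2 * v[0], 4 * v[1]])
grad_g1 = lambda v: np.array([1.0, 1.0])

x = np.array([1.0, 0.0])                   # feasible: g1(x) = 0
g = grad_g1(x)
t = np.array([g[1], -g[0]]) / np.linalg.norm(g)  # unit tangent, perpendicular to grad g1
slope = grad_f(x) @ t                      # non-zero component of grad f along the tangent
x_new = x - 1e-4 * np.sign(slope) * t      # small step against that component
print(f(x_new) < f(x))                     # True: f decreased
print(abs(g1(x_new)) < 1e-12)              # True: still on the constraint (g1 is linear)
```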

  6. Lagrange Function and Necessary KKT Conditions. However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 lies perpendicular to ∇g₁, hence perpendicular to ∇f at that point, so the directional derivative of f along the constraint is 0. If we try to decrease the value of f, we will end up increasing or decreasing g₁ (unacceptable); if we move perpendicular to ∇g₁, no change in f is expected. So the gradients of f and g₁ pointing in the same or opposite directions is a necessary condition for a local minimum/maximum.

  7. Lagrange Function and Necessary KKT Conditions. However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged. Hence, at the solution x∗, ∇f(x∗) must be proportional to ∇g₁(x∗).

  8. Lagrange Function and Necessary KKT Conditions. However, ∇g₁ and ∇f are parallel at f = 10.3, and any motion along g₁(x) = 0 will leave f unchanged. Hence, at the solution x∗, ∇f(x∗) must be proportional to −∇g₁(x∗), yielding ∇f(x∗) = −λ∇g₁(x∗) for some constant λ ∈ ℜ; λ is called a Lagrange multiplier. Often λ itself need never be computed, and it is therefore often qualified as the undetermined Lagrange multiplier.
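As a concrete illustration (our example, not from the slides): minimize f(x, y) = x² + y² subject to g₁(x, y) = x + y − 1 = 0. The condition ∇f(x∗) = −λ∇g₁(x∗) reads (2x∗, 2y∗) = −λ(1, 1); together with the constraint x∗ + y∗ = 1, this gives x∗ = y∗ = 1/2 and λ = −1, and the two gradients are indeed parallel at the solution.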

  9. Lagrange Function and Necessary KKT Conditions. The necessary condition for an optimum at x∗ for the optimization problem in (67) with m = 1 can be stated in terms of the Lagrange function L(x, λ) = f(x) + λg₁(x): the gradient of L with respect to both x and λ should vanish, as a necessary condition for an optimum at (x∗, λ∗).

  10. Lagrange Function and Necessary KKT Conditions. The necessary condition for an optimum at x∗ for the optimization problem in (67) with m = 1 can be stated as in (68); the gradient is now in ℜⁿ⁺¹, with its last component being a partial derivative with respect to λ.

     ∇L(x∗, λ∗) = 0, i.e., ∇f(x∗) + λ∗∇g₁(x∗) = 0 and g₁(x∗) = 0     (68)

     The solutions to (68) are the stationary points of the Lagrangian L; they are not necessarily local extrema of L.
     ▶ L is unbounded: given a point x that doesn't lie on the constraint, letting λ → ±∞ makes L arbitrarily large or small (a general property of linear functions; here, L is linear in λ).
     ▶ However, under certain stronger assumptions, if the strong Lagrangian principle holds, the minima of f minimize the Lagrangian globally (more on this a bit later).
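A minimal sympy sketch (our construction, reusing the toy f and g₁ from above) that solves the stationarity system (68) and illustrates the unboundedness of L in λ at an infeasible point:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f  = x**2 + y**2        # toy objective (our example from above)
g1 = x + y - 1          # toy equality constraint g1(x, y) = 0
L  = f + lam * g1       # the Lagrange function

# Stationary points of L: the full gradient w.r.t. (x, y, lam) vanishes -- system (68).
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(stationary)       # [{x: 1/2, y: 1/2, lam: -1}]

# Unboundedness in lam at an infeasible point: g1(2, 0) = 1 != 0, so L(2, 0, lam)
# is a non-constant linear function of lam and takes arbitrarily large/small values.
L_inf = L.subs({x: 2, y: 0})                                      # = lam + 4
print(sp.limit(L_inf, lam, sp.oo), sp.limit(L_inf, lam, -sp.oo))  # oo -oo
```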

  11. Lagrange Function and Necessary KKT Conditions. Let us extend the necessary condition for optimality of a minimization problem with a single constraint to minimization problems with multiple equality constraints (i.e., m > 1 in (67)). Let S be the subspace spanned by the ∇gᵢ(x) at any point x and let S⊥ be its orthogonal complement. Let (∇f)⊥ be the component of ∇f in the subspace S⊥. Moving perpendicular to S keeps all constraints satisfied, so at an optimal point x∗ we should not be able to move perpendicular to S while reducing the value of f. Hence the gradient of f cannot have any component perpendicular to S, i.e., ∇f must lie in S.

  12. Lagrange Function and Necessary KKT Conditions. At any solution x∗, it must be true that the gradient of f has (∇f)⊥ = 0 (i.e., no component that is perpendicular to all of the ∇gᵢ), because otherwise you could move x∗ a little in that direction (or in the opposite direction) to increase (decrease) f without changing any of the gᵢ, i.e., without violating any constraints. Hence, for multiple equality constraints, it must be true that at the solution x∗ the space S contains the vector ∇f, i.e., there are some constants λᵢ such that ∇f(x∗) = Σᵢ λᵢ∇gᵢ(x∗).
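A small numpy sketch of how one might verify this condition numerically at a candidate x∗ (the gradients below are made-up numbers of ours): stack the ∇gᵢ(x∗) and use least squares to check that ∇f(x∗) has no component outside their span:

```python
import numpy as np

# Hypothetical data: gradients of two constraints at a candidate x* in R^3,
# stacked as the rows of G, plus the objective gradient grad f(x*).
G  = np.array([[1.0, 0.0, 1.0],    # grad g_1(x*)
               [0.0, 1.0, 1.0]])   # grad g_2(x*)
gf = np.array([2.0, 3.0, 5.0])     # grad f(x*)

# Least squares for the multipliers: lambda minimizing ||G^T lambda - grad f||.
lam, *_ = np.linalg.lstsq(G.T, gf, rcond=None)
residual = gf - G.T @ lam          # this is (grad f)^perp, the component outside S
print(lam)                         # [2. 3.]
print(np.linalg.norm(residual))    # ~0  =>  grad f lies in S = span{grad g_i}
```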

  13. Lagrange Multipliers with Inequality Constraints. We also need to impose that the solution lies on the correct constraint surfaces (i.e., gᵢ = 0, ∀i). In the same manner as in the case of m = 1, this can be encapsulated by introducing the Lagrangian L(x, λ) = f(x) + Σᵢ₌₁ᵐ λᵢgᵢ(x), whose gradient with respect to both x and λ vanishes at the solution. This gives us the following necessary condition for optimality of (67): ∇f(x∗) + Σᵢ₌₁ᵐ λᵢ∗∇gᵢ(x∗) = 0 and gᵢ(x∗) = 0 for i = 1, ..., m.

  14. Lagrange Multipliers with Inequality Constraints. The single equality constraint g₁(x) = 0 is now replaced with a single inequality constraint g₁(x) ≤ 0. The entire region labeled g₁(x) ≤ 0 in the figure becomes feasible. At the solution x∗, if g₁(x∗) = 0, i.e., if the constraint is active, then ∇f(x∗) and ∇g₁(x∗) must lie along the same line; the active case is exactly the same as that of equality constrained optimization. The constraint is inactive when g₁(x∗) < 0.

  15. Lagrange Multipliers with Inequality Constraints. At the solution x∗, if g₁(x∗) = 0, i.e., if the constraint is active, we must have (as in the case of a single equality constraint) that ∇f is parallel to ∇g₁, by the same argument as before. Additionally, it is necessary for the two gradients to point in opposite directions. Otherwise we would have a problem: it is feasibility-preserving to reduce f while also reducing g₁, i.e., it is fine to move along −∇f(x∗) if that direction also has a component along −∇g₁(x∗). If ∇f and ∇g₁ pointed in the same direction, such a descent move would exist at x∗, contradicting optimality; hence ∇f(x∗) = −λ∇g₁(x∗) with λ ≥ 0.
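A small sympy check of the multiplier's sign (our toy problem again, with the constraint chosen so that it binds in the first case):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2                       # toy objective (ours)

# Active case: g1(x, y) = 1 - x - y <= 0. The unconstrained minimum (0, 0) is
# infeasible (g1 = 1 > 0), so the constraint is tight at the solution.
g1 = 1 - x - y
L = f + lam * g1
sol = sp.solve([sp.diff(L, x), sp.diff(L, y), g1], [x, y, lam], dict=True)[0]
print(sol)                            # {x: 1/2, y: 1/2, lam: 1}: lam > 0
# grad f(x*) = (1, 1) = -lam * grad g1(x*) = -1 * (-1, -1): opposite directions.

# Inactive case: g1(x, y) = x + y - 1 <= 0. Now (0, 0) is strictly feasible
# (g1 = -1 < 0), the constraint plays no role, and lam = 0.
```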
