Convex Optimization — Boyd & Vandenberghe

5. Duality

• Lagrange dual problem
• weak and strong duality
• geometric interpretation
• optimality conditions
• perturbation and sensitivity analysis
• examples
• generalized inequalities

5–1
Lagrangian

standard form problem (not necessarily convex)

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, ..., m
                h_i(x) = 0,  i = 1, ..., p

variable x ∈ R^n, domain D, optimal value p⋆

Lagrangian: L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p,

    L(x, λ, ν) = f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x)

• weighted sum of objective and constraint functions
• λ_i is Lagrange multiplier associated with f_i(x) ≤ 0
• ν_i is Lagrange multiplier associated with h_i(x) = 0

Duality 5–2
Lagrange dual function

Lagrange dual function: g : R^m × R^p → R,

    g(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
            = inf_{x ∈ D} ( f_0(x) + ∑_{i=1}^m λ_i f_i(x) + ∑_{i=1}^p ν_i h_i(x) )

g is concave, can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p⋆

proof: if x̃ is feasible and λ ⪰ 0, then

    f_0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x ∈ D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p⋆ ≥ g(λ, ν)

Duality 5–3
Least-norm solution of linear equations

    minimize    x^T x
    subject to  Ax = b

dual function

• Lagrangian is L(x, ν) = x^T x + ν^T (Ax − b)
• to minimize L over x, set gradient equal to zero:

    ∇_x L(x, ν) = 2x + A^T ν = 0   ⟹   x = −(1/2) A^T ν

• plug into L to obtain g:

    g(ν) = L(−(1/2) A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν

a concave function of ν

lower bound property: p⋆ ≥ −(1/4) ν^T A A^T ν − b^T ν for all ν

Duality 5–4
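As a quick sanity check of the lower bound property above (an illustrative sketch, not part of the original slides; it assumes NumPy is available), the snippet computes the least-norm solution in closed form, verifies g(ν) ≤ p⋆ for random ν, and checks that the bound is tight at ν⋆ = −2(AA^T)^{-1}b.

```python
import numpy as np

# Numerical check of the lower bound on slide 5-4 (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))   # wide matrix, so Ax = b is underdetermined
b = rng.standard_normal(3)

# primal optimum: x* = A^T (A A^T)^{-1} b, p* = x*^T x*
x_star = A.T @ np.linalg.solve(A @ A.T, b)
p_star = x_star @ x_star

# dual function g(nu) = -(1/4) nu^T A A^T nu - b^T nu, evaluated at random nu
for _ in range(5):
    nu = rng.standard_normal(3)
    g_nu = -0.25 * nu @ (A @ A.T) @ nu - b @ nu
    assert g_nu <= p_star + 1e-9      # lower bound property

# the bound is tight at nu* = -2 (A A^T)^{-1} b
nu_star = -2 * np.linalg.solve(A @ A.T, b)
g_star = -0.25 * nu_star @ (A @ A.T) @ nu_star - b @ nu_star
print(p_star, g_star)                 # equal up to rounding
```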
Standard form LP

    minimize    c^T x
    subject to  Ax = b,  x ⪰ 0

dual function

• Lagrangian is

    L(x, λ, ν) = c^T x + ν^T (Ax − b) − λ^T x
               = −b^T ν + (c + A^T ν − λ)^T x

• L is affine in x, hence

    g(λ, ν) = inf_x L(x, λ, ν) = { −b^T ν   if A^T ν − λ + c = 0
                                 { −∞       otherwise

g is linear on affine domain {(λ, ν) | A^T ν − λ + c = 0}, hence concave

lower bound property: p⋆ ≥ −b^T ν if A^T ν + c ⪰ 0

Duality 5–5
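An illustrative numerical sketch (not from the slides; it assumes SciPy's linprog): solve a small standard form LP and the dual implied by the lower bound above, and print both optimal values. The dual value is a lower bound on p⋆; here the two coincide.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = A @ rng.uniform(0.1, 1.0, n)      # b = A x0 with x0 > 0, so the primal is feasible
c = rng.uniform(0.1, 1.0, n)          # c > 0, so c^T x is bounded below on x >= 0

# primal: minimize c^T x  s.t.  Ax = b, x >= 0  (linprog's default bounds are x >= 0)
primal = linprog(c, A_eq=A, b_eq=b)

# dual: maximize -b^T nu  s.t.  A^T nu + c >= 0, written for linprog as
#       minimize  b^T nu  s.t.  -A^T nu <= c, nu free
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)] * m)

print(primal.fun, -dual.fun)          # p* and the dual optimal value, equal up to tolerance
```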
Equality constrained norm minimization

    minimize    ‖x‖
    subject to  Ax = b

dual function

    g(ν) = inf_x ( ‖x‖ − ν^T Ax + b^T ν ) = { b^T ν   if ‖A^T ν‖_* ≤ 1
                                            { −∞      otherwise

where ‖v‖_* = sup_{‖u‖ ≤ 1} u^T v is dual norm of ‖ · ‖

proof: follows from inf_x ( ‖x‖ − y^T x ) = 0 if ‖y‖_* ≤ 1, −∞ otherwise

• if ‖y‖_* ≤ 1, then ‖x‖ − y^T x ≥ 0 for all x, with equality if x = 0
• if ‖y‖_* > 1, choose x = tu where ‖u‖ ≤ 1, u^T y = ‖y‖_* > 1:

    ‖x‖ − y^T x = t(‖u‖ − ‖y‖_*) → −∞   as t → ∞

lower bound property: p⋆ ≥ b^T ν if ‖A^T ν‖_* ≤ 1

Duality 5–6
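A small numerical illustration (not from the slides; assumes NumPy) using the Euclidean norm, which is its own dual norm: any ν with ‖A^T ν‖_2 ≤ 1 gives the bound b^T ν ≤ p⋆, and the bound is tight for a suitably scaled ν.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

x_star = A.T @ np.linalg.solve(A @ A.T, b)   # minimum Euclidean-norm solution of Ax = b
p_star = np.linalg.norm(x_star)

for _ in range(5):
    nu = rng.standard_normal(3)
    nu /= max(1.0, np.linalg.norm(A.T @ nu))  # rescale so that ||A^T nu||_2 <= 1
    assert b @ nu <= p_star + 1e-9            # lower bound property

# the bound is tight at nu = (A A^T)^{-1} b / ||x*||_2
nu_star = np.linalg.solve(A @ A.T, b) / p_star
print(p_star, b @ nu_star)                    # equal up to rounding
```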
Two-way partitioning

    minimize    x^T W x
    subject to  x_i^2 = 1,  i = 1, ..., n

• a nonconvex problem; feasible set contains 2^n discrete points
• interpretation: partition {1, ..., n} in two sets; W_ij is cost of assigning i, j to the same set; −W_ij is cost of assigning to different sets

dual function

    g(ν) = inf_x ( x^T W x + ∑_i ν_i (x_i^2 − 1) ) = inf_x x^T (W + diag(ν)) x − 1^T ν
         = { −1^T ν   if W + diag(ν) ⪰ 0
           { −∞       otherwise

lower bound property: p⋆ ≥ −1^T ν if W + diag(ν) ⪰ 0

example: ν = −λ_min(W) 1 gives bound p⋆ ≥ n λ_min(W)

Duality 5–7
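The eigenvalue bound can be checked directly for small n (an illustrative sketch, not from the slides; assumes NumPy): enumerate all 2^n feasible points and compare p⋆ with n λ_min(W).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n = 10
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                      # symmetric cost matrix

# brute force over all 2^n points x in {-1, +1}^n (only feasible for small n)
p_star = min(np.array(x) @ W @ np.array(x) for x in product([-1, 1], repeat=n))

bound = n * np.linalg.eigvalsh(W).min()   # dual bound from nu = -lambda_min(W) * 1
print(bound, p_star)
assert bound <= p_star + 1e-9
```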
Lagrange dual and conjugate function

    minimize    f_0(x)
    subject to  Ax ⪯ b,  Cx = d

dual function

    g(λ, ν) = inf_{x ∈ dom f_0} ( f_0(x) + (A^T λ + C^T ν)^T x − b^T λ − d^T ν )
            = −f_0^*(−A^T λ − C^T ν) − b^T λ − d^T ν

• recall definition of conjugate f^*(y) = sup_{x ∈ dom f} (y^T x − f(x))
• simplifies derivation of dual if conjugate of f_0 is known

example: entropy maximization

    f_0(x) = ∑_{i=1}^n x_i log x_i,    f_0^*(y) = ∑_{i=1}^n e^{y_i − 1}

Duality 5–8
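A numerical check of the entropy conjugate pair (illustrative, not from the slides; assumes SciPy's minimize): compute sup_x ( y^T x − ∑_i x_i log x_i ) numerically and compare it with ∑_i e^{y_i − 1}.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n = 4
y = rng.standard_normal(n)

def neg_objective(x):
    # negative of y^T x - sum_i x_i log x_i, so that minimizing it computes the sup
    return -(y @ x - np.sum(x * np.log(x)))

res = minimize(neg_objective, x0=np.ones(n), bounds=[(1e-9, None)] * n)
conjugate_numeric = -res.fun
conjugate_closed_form = np.sum(np.exp(y - 1))    # maximizer is x_i = exp(y_i - 1)
print(conjugate_numeric, conjugate_closed_form)  # agree up to solver tolerance
```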
The dual problem

Lagrange dual problem

    maximize    g(λ, ν)
    subject to  λ ⪰ 0

• finds best lower bound on p⋆, obtained from Lagrange dual function
• a convex optimization problem; optimal value denoted d⋆
• λ, ν are dual feasible if λ ⪰ 0, (λ, ν) ∈ dom g
• often simplified by making implicit constraint (λ, ν) ∈ dom g explicit

example: standard form LP and its dual (page 5–5)

    primal:  minimize    c^T x              dual:  maximize    −b^T ν
             subject to  Ax = b, x ⪰ 0             subject to  A^T ν + c ⪰ 0

Duality 5–9
Weak and strong duality

weak duality: d⋆ ≤ p⋆

• always holds (for convex and nonconvex problems)
• can be used to find nontrivial lower bounds for difficult problems

  for example, solving the SDP

      maximize    −1^T ν
      subject to  W + diag(ν) ⪰ 0

  gives a lower bound for the two-way partitioning problem on page 5–7

strong duality: d⋆ = p⋆

• does not hold in general
• (usually) holds for convex problems
• conditions that guarantee strong duality in convex problems are called constraint qualifications

Duality 5–10
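The SDP bound above can be computed with any semidefinite programming solver; the sketch below (not from the slides) assumes the cvxpy modeling package with an SDP-capable solver such as SCS, and compares the bound with the brute-force optimum for a small instance.

```python
import numpy as np
import cvxpy as cp
from itertools import product

rng = np.random.default_rng(5)
n = 8
W = rng.standard_normal((n, n))
W = (W + W.T) / 2

# SDP:  maximize -1^T nu  s.t.  W + diag(nu) >= 0  (positive semidefinite)
nu = cp.Variable(n)
sdp = cp.Problem(cp.Maximize(-cp.sum(nu)), [W + cp.diag(nu) >> 0])
sdp.solve()

# brute-force p* for comparison (only possible because n is tiny)
p_star = min(np.array(x) @ W @ np.array(x) for x in product([-1, 1], repeat=n))
print(sdp.value, p_star)               # sdp.value <= p_star by weak duality
```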
Slater’s constraint qualification

strong duality holds for a convex problem

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,  i = 1, ..., m
                Ax = b

if it is strictly feasible, i.e.,

    ∃ x ∈ int D :  f_i(x) < 0,  i = 1, ..., m,   Ax = b

• also guarantees that the dual optimum is attained (if p⋆ > −∞)
• can be sharpened: e.g., can replace int D with relint D (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .
• there exist many other types of constraint qualifications

Duality 5–11
Inequality form LP

primal problem

    minimize    c^T x
    subject to  Ax ⪯ b

dual function

    g(λ) = inf_x ( (c + A^T λ)^T x − b^T λ ) = { −b^T λ   if A^T λ + c = 0
                                               { −∞       otherwise

dual problem

    maximize    −b^T λ
    subject to  A^T λ + c = 0,  λ ⪰ 0

• from Slater’s condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
• in fact, p⋆ = d⋆ except when primal and dual are infeasible

Duality 5–12
Quadratic program

primal problem (assume P ∈ S^n_{++})

    minimize    x^T P x
    subject to  Ax ⪯ b

dual function

    g(λ) = inf_x ( x^T P x + λ^T (Ax − b) ) = −(1/4) λ^T A P^{-1} A^T λ − b^T λ

dual problem

    maximize    −(1/4) λ^T A P^{-1} A^T λ − b^T λ
    subject to  λ ⪰ 0

• from Slater’s condition: p⋆ = d⋆ if Ax̃ ≺ b for some x̃
• in fact, p⋆ = d⋆ always

Duality 5–13
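The claim p⋆ = d⋆ can be observed numerically (illustrative sketch, not from the slides; assumes cvxpy): solve the primal QP and the dual problem above and compare the optimal values.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
n, m = 4, 8
G = rng.standard_normal((n, n))
P = G @ G.T + np.eye(n)               # P positive definite
A = rng.standard_normal((m, n))
b = rng.uniform(0.5, 1.5, m)          # b > 0, so x = 0 is strictly feasible (Slater)

# primal: minimize x^T P x  s.t.  Ax <= b
R = np.linalg.cholesky(P)             # P = R R^T, so x^T P x = ||R^T x||_2^2
x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(cp.sum_squares(R.T @ x)), [A @ x <= b])
primal.solve()

# dual: maximize -(1/4) lam^T A P^{-1} A^T lam - b^T lam  s.t.  lam >= 0
L = np.linalg.cholesky(np.linalg.inv(P))   # P^{-1} = L L^T
lam = cp.Variable(m)
dual = cp.Problem(cp.Maximize(-0.25 * cp.sum_squares(L.T @ A.T @ lam) - b @ lam),
                  [lam >= 0])
dual.solve()

print(primal.value, dual.value)       # equal up to solver tolerance
```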
A nonconvex problem with strong duality

    minimize    x^T A x + 2 b^T x
    subject to  x^T x ≤ 1

A ⋡ 0, hence nonconvex

dual function: g(λ) = inf_x ( x^T (A + λI) x + 2 b^T x − λ )

• unbounded below if A + λI ⋡ 0 or if A + λI ⪰ 0 and b ∉ R(A + λI)
• minimized by x = −(A + λI)† b otherwise: g(λ) = −b^T (A + λI)† b − λ

dual problem and equivalent SDP:

    maximize    −b^T (A + λI)† b − λ
    subject to  A + λI ⪰ 0
                b ∈ R(A + λI)

    maximize    −t − λ
    subject to  [ A + λI   b ]
                [ b^T      t ]  ⪰ 0

strong duality although primal problem is not convex (not easy to show)

Duality 5–14
Geometric interpretation

for simplicity, consider problem with one constraint f_1(x) ≤ 0

interpretation of dual function:

    g(λ) = inf_{(u,t) ∈ G} (t + λu),    where G = {(f_1(x), f_0(x)) | x ∈ D}

[figure: two plots of the set G in the (u, t)-plane, marking p⋆, d⋆, g(λ), and the supporting line λu + t = g(λ)]

• λu + t = g(λ) is (non-vertical) supporting hyperplane to G
• hyperplane intersects t-axis at t = g(λ)

Duality 5–15
epigraph variation: same interpretation if G is replaced with

    A = {(u, t) | f_1(x) ≤ u, f_0(x) ≤ t for some x ∈ D}

[figure: the set A in the (u, t)-plane, marking p⋆, g(λ), and the supporting line λu + t = g(λ)]

strong duality

• holds if there is a non-vertical supporting hyperplane to A at (0, p⋆)
• for convex problem, A is convex, hence has supp. hyperplane at (0, p⋆)
• Slater’s condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then supporting hyperplanes at (0, p⋆) must be non-vertical

Duality 5–16
Complementary slackness

assume strong duality holds, x⋆ is primal optimal, (λ⋆, ν⋆) is dual optimal

    f_0(x⋆) = g(λ⋆, ν⋆) = inf_x ( f_0(x) + ∑_{i=1}^m λ⋆_i f_i(x) + ∑_{i=1}^p ν⋆_i h_i(x) )
            ≤ f_0(x⋆) + ∑_{i=1}^m λ⋆_i f_i(x⋆) + ∑_{i=1}^p ν⋆_i h_i(x⋆)
            ≤ f_0(x⋆)

hence, the two inequalities hold with equality

• x⋆ minimizes L(x, λ⋆, ν⋆)
• λ⋆_i f_i(x⋆) = 0 for i = 1, ..., m (known as complementary slackness):

    λ⋆_i > 0 ⟹ f_i(x⋆) = 0,    f_i(x⋆) < 0 ⟹ λ⋆_i = 0

Duality 5–17
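Complementary slackness is easy to observe numerically (illustrative sketch, not from the slides; assumes cvxpy, whose constraint.dual_value attribute returns an optimal dual variable): for an inequality form LP, the elementwise products λ⋆_i (a_i^T x⋆ − b_i) come out (numerically) zero.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
m, n = 10, 4
A = rng.standard_normal((m, n))
b = rng.uniform(0.5, 1.5, m)          # b > 0, so x = 0 is strictly feasible
lam0 = rng.uniform(0.1, 1.0, m)
c = -A.T @ lam0                       # dual feasible point exists, so the LP is bounded

x = cp.Variable(n)
ineq = A @ x <= b
prob = cp.Problem(cp.Minimize(c @ x), [ineq])
prob.solve()

lam = ineq.dual_value                 # optimal lambda* >= 0
slack = b - A @ x.value               # f_i(x*) <= 0 corresponds to slack >= 0
print(lam * slack)                    # elementwise products are ~0: complementary slackness
```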
Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable f_i, h_i):

1. primal constraints: f_i(x) ≤ 0, i = 1, ..., m,  h_i(x) = 0, i = 1, ..., p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λ_i f_i(x) = 0, i = 1, ..., m
4. gradient of Lagrangian with respect to x vanishes:

    ∇f_0(x) + ∑_{i=1}^m λ_i ∇f_i(x) + ∑_{i=1}^p ν_i ∇h_i(x) = 0

from page 5–17: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions

Duality 5–18
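For a problem with only equality constraints and a convex quadratic objective, the KKT conditions reduce to a set of linear equations that can be solved directly (illustrative example, not from the slides; assumes NumPy): minimizing x^T P x subject to Ax = b gives 2Px + A^T ν = 0 and Ax = b.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 6, 3
G = rng.standard_normal((n, n))
P = G @ G.T + np.eye(n)               # P positive definite
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

# assemble and solve the KKT system  [2P  A^T] [x ]   [0]
#                                    [A    0 ] [nu] = [b]
KKT = np.block([[2 * P, A.T],
                [A, np.zeros((p, p))]])
rhs = np.concatenate([np.zeros(n), b])
sol = np.linalg.solve(KKT, rhs)
x_kkt, nu_kkt = sol[:n], sol[n:]

# cross-check against the closed-form minimizer x = P^{-1} A^T (A P^{-1} A^T)^{-1} b
Pinv = np.linalg.inv(P)
x_closed = Pinv @ A.T @ np.linalg.solve(A @ Pinv @ A.T, b)
print(np.allclose(x_kkt, x_closed), np.allclose(A @ x_kkt, b))   # True True
```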