

  1. Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual method for functional constraints 1/20

  2. Outline
  1 Constrained optimization problem
  2 Lagrange multipliers
  3 Dual function and dual problem
  4 Augmented Lagrangian
  5 Switching subgradient method
  6 Finding the dual multipliers
  7 Complexity analysis

  3. Optimization problem: simple constraints
  Consider the problem min_{x ∈ Q} f(x),
  where Q is a closed convex set (x, y ∈ Q ⇒ [x, y] ⊆ Q) and f is convex and subdifferentiable on Q:
  f(y) ≥ f(x) + ⟨∇f(x), y − x⟩, x, y ∈ Q, with ∇f(x) ∈ ∂f(x).
  Optimality condition: a point x* ∈ Q is optimal iff
  ⟨∇f(x*), x − x*⟩ ≥ 0 for all x ∈ Q.
  Interpretation: the function increases along any feasible direction.
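The variational inequality above can be checked numerically. A minimal sketch on a toy instance of our own choosing (not from the slides): f(x) = (x − 2)² over Q = [0, 1], whose minimizer over Q is x* = 1 with a nonzero gradient.

```python
# Toy check of the optimality condition <grad f(x*), x - x*> >= 0 on Q.
# Here f(x) = (x - 2)^2, Q = [0, 1], x* = 1; grad f(x*) = -2 is nonzero,
# yet -2 * (x - 1) >= 0 for every feasible x in [0, 1].
x_star = 1.0
grad = 2 * (x_star - 2)                                      # = -2
ok = all(grad * (x / 10 - x_star) >= 0 for x in range(11))   # x in {0.0, ..., 1.0}
print(ok)                             # True: f increases along feasible directions
```

The point of the example is that at a boundary minimizer the gradient need not vanish; only the inner product with feasible directions must be nonnegative.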

  4. Examples
  1. Interior solution. Let x* ∈ int Q. Then ⟨∇f(x*), x − x*⟩ ≥ 0 for all x ∈ Q implies ∇f(x*) = 0.
  2. Optimization over the positive orthant. Let Q ≡ R^n_+ = {x ∈ R^n : x^(i) ≥ 0, i = 1, ..., n}.
  Optimality condition: ⟨∇f(x*), x − x*⟩ ≥ 0 for all x ∈ R^n_+.
  Coordinate form: ∇_i f(x*) (x^(i) − x*^(i)) ≥ 0 for all x^(i) ≥ 0.
  This means that ∇_i f(x*) ≥ 0, i = 1, ..., n (tend x^(i) → ∞),
  and x*^(i) ∇_i f(x*) = 0, i = 1, ..., n (set x^(i) = 0).

  5. Optimization problem: functional constraints
  Problem: min_{x ∈ Q} { f_0(x) : f_i(x) ≤ 0, i = 1, ..., m },
  where Q is a closed convex set and all f_i, i = 0, ..., m, are convex and subdifferentiable on Q:
  f_i(y) ≥ f_i(x) + ⟨∇f_i(x), y − x⟩, x, y ∈ Q, with ∇f_i(x) ∈ ∂f_i(x).
  Optimality condition (KKT, 1951): a point x* ∈ Q is optimal iff there exist Lagrange multipliers λ*^(i) ≥ 0, i = 1, ..., m, such that
  (1) ⟨∇f_0(x*) + Σ_{i=1}^m λ*^(i) ∇f_i(x*), x − x*⟩ ≥ 0 for all x ∈ Q,
  (2) f_i(x*) ≤ 0, i = 1, ..., m (feasibility),
  (3) λ*^(i) f_i(x*) = 0, i = 1, ..., m (complementary slackness).
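The three KKT conditions can be verified by hand on a one-dimensional toy instance of our own (not from the slides): min x² subject to 1 − x ≤ 0 with Q = R, whose solution is x* = 1 with multiplier λ* = 2.

```python
# KKT check: f0(x) = x^2, f1(x) = 1 - x, Q = R. Since Q = R, KKT(1) reduces
# to grad f0(x*) + lam* grad f1(x*) = 0, i.e. 2 x* - lam* = 0 at x* = 1.
x_star, lam_star = 1.0, 2.0
grad_f0 = 2 * x_star                          # gradient of f0(x) = x^2
grad_f1 = -1.0                                # gradient of f1(x) = 1 - x
stationarity = grad_f0 + lam_star * grad_f1   # KKT(1): must vanish
feasibility = 1 - x_star                      # KKT(2): must be <= 0
slackness = lam_star * (1 - x_star)           # KKT(3): must vanish
print(stationarity, feasibility, slackness)
```

All three quantities are zero here, since the constraint is active at the optimum.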

  6. Lagrange multipliers: interpretation
  Let I ⊆ {1, ..., m} be an arbitrary set of indices. Denote f_I(x) = f_0(x) + Σ_{i ∈ I} λ*^(i) f_i(x), and consider the problem
  P_I: min_{x ∈ Q} { f_I(x) : f_i(x) ≤ 0, i ∉ I }.
  Observation: in any case, x* is the optimal solution of problem P_I.
  Interpretation: the λ*^(i) are the shadow prices for resources (Kantorovich, 1939).
  Application examples:
  Traffic congestion: car flows on roads ⇔ size of queues.
  Electrical networks: currents in the wires ⇔ voltage potentials, etc.
  Main question: how to compute (x*, λ*)?

  7. Algebraic interpretation
  Consider the Lagrangian L(x, λ) = f_0(x) + Σ_{i=1}^m λ^(i) f_i(x).
  Condition KKT(1), ⟨∇f_0(x*) + Σ_{i=1}^m λ*^(i) ∇f_i(x*), x − x*⟩ ≥ 0 for all x ∈ Q, implies x* ∈ Argmin_{x ∈ Q} L(x, λ*).
  Define the dual function φ(λ) = min_{x ∈ Q} L(x, λ), λ ≥ 0. It is concave!
  By Danskin's theorem, ∇φ(λ) = (f_1(x(λ)), ..., f_m(x(λ))), with x(λ) ∈ Argmin_{x ∈ Q} L(x, λ).
  Conditions KKT(2,3), f_i(x*) ≤ 0 and λ*^(i) f_i(x*) = 0, i = 1, ..., m, imply (with x* = x(λ*)) that λ* ∈ Argmax_{λ ≥ 0} φ(λ).
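The Danskin formula can be sanity-checked by finite differences on a toy problem of our own construction (not from the slides): f_0(x) = ‖x‖², Q = R², one constraint f_1(x) = 1 − x_1 − x_2, for which x(λ) = (λ/2, λ/2) in closed form.

```python
# Danskin check: phi(lam) = min_x L(x, lam) with x(lam) = (lam/2, lam/2),
# so phi(lam) = lam - lam^2/2 and phi'(lam) should equal f1(x(lam)) = 1 - lam.
def phi(lam):
    x = (lam / 2, lam / 2)                           # minimizer of L(x, lam)
    return x[0]**2 + x[1]**2 + lam * (1 - x[0] - x[1])

lam, eps = 0.3, 1e-6
fd = (phi(lam + eps) - phi(lam - eps)) / (2 * eps)   # finite-difference phi'(lam)
danskin = 1 - lam                                    # f1(x(lam))
print(round(fd, 4), round(danskin, 4))               # both equal 0.7
```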

  8. Algorithmic aspects
  Main idea: solve the dual problem max_{λ ≥ 0} φ(λ) by the subgradient method:
  1. Compute x(λ_k) and define ∇φ(λ_k) = (f_1(x(λ_k)), ..., f_m(x(λ_k))).
  2. Update λ_{k+1} = Project_{R^m_+}(λ_k + h_k ∇φ(λ_k)).
  The step sizes h_k > 0 are defined in the usual way.
  Main difficulties:
  Each iteration is time consuming.
  Unclear termination criterion.
  Low rate of convergence (O(1/ε²) upper-level iterations).
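The two steps above can be sketched on the same toy instance as before (our construction, not from the slides): f_0(x) = ‖x‖², Q = R², f_1(x) = 1 − x_1 − x_2, where x(λ) = (λ/2, λ/2) and the optimal multiplier is λ* = 1.

```python
# Projected subgradient ascent on the dual: lam_{k+1} = (lam_k + h * grad phi(lam_k))_+,
# with grad phi(lam) = f1(x(lam)) = 1 - lam for this toy instance.
lam = 0.0
for k in range(100):
    x = (lam / 2, lam / 2)            # step 1: compute x(lam_k)
    g = 1 - x[0] - x[1]               # subgradient of phi at lam_k
    lam = max(0.0, lam + 0.5 * g)     # step 2: project onto R_+
print(round(lam, 6))                  # converges to lam* = 1
```

Note that each iteration requires the exact inner minimizer x(λ_k); for nontrivial problems this is the "time consuming" part the slide warns about.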

  9. Augmented Lagrangian (1970's) [Hestenes, Powell, Rockafellar, Polyak, Bertsekas, ...]
  Define the augmented Lagrangian
  L_K(x, λ) = f_0(x) + 1/(2K) Σ_{i=1}^m [ ((λ^(i) + K f_i(x))_+)² − (λ^(i))² ], λ ∈ R^m,
  where K > 0 is a penalty parameter. Consider the dual function φ̂(λ) = min_{x ∈ Q} L_K(x, λ).
  Main properties:
  The function φ̂ is concave.
  Its gradient is Lipschitz continuous with constant 1/K.
  Its unconstrained maximum is attained at the optimal dual solution.
  The corresponding point x̂(λ*) is the optimal primal solution.
  Hint: check that the equation (λ^(i) + K f_i(x))_+ = λ^(i) is equivalent to KKT(2,3).

  10. Method of Augmented Lagrangians
  Note that ∇φ̂(λ) = (1/K) [(λ + K f(x̂(λ)))_+ − λ].
  Therefore, the usual gradient method λ_{k+1} = λ_k + K ∇φ̂(λ_k) is exactly as follows:
  Method: λ_{k+1} = (λ_k + K f(x̂(λ_k)))_+.
  Advantage: fast convergence of the dual process.
  Disadvantages: difficult iteration; unclear termination; no global complexity analysis.
  Do we have an alternative?
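A minimal sketch of the multiplier update on the toy instance used earlier (our construction, not from the slides): min ‖x‖² s.t. 1 − x_1 − x_2 ≤ 0. By symmetry, the inner minimizer of L_K is x̂_1 = x̂_2 = t with t = (λ + K)/(2(1 + K)), so the iteration is available in closed form.

```python
# Method of multipliers: lam_{k+1} = (lam_k + K * f1(x_hat(lam_k)))_+.
# The iterate contracts toward lam* = 1 with factor 1/(1 + K) per step.
K = 9.0
lam = 0.0
for _ in range(20):
    t = (lam + K) / (2 * (1 + K))     # x_hat(lam): inner minimizer of L_K
    f1 = 1 - 2 * t                    # constraint value at x_hat
    lam = max(0.0, lam + K * f1)      # multiplier update
print(round(lam, 6))                  # -> 1.0, the optimal dual multiplier
```

With K = 9 the dual error shrinks by a factor of 10 per iteration, illustrating the "fast convergence of the dual process" at the price of an exact inner minimization per step.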

  11. Problem formulation
  Problem: f_* = inf_{x ∈ Q} { f_0(x) : f_i(x) ≤ 0, i = 1, ..., m },
  where the f_i(x), i = 0, ..., m, are closed convex functions on Q endowed with first-order black-box oracles, and Q ⊂ E is a bounded simple closed convex set. (Simple: we can solve some auxiliary optimization problems over Q.)
  Defining the Lagrangian
  L(x, λ) = f_0(x) + Σ_{i=1}^m λ^(i) f_i(x), x ∈ Q, λ ∈ R^m_+,
  we can introduce the Lagrangian dual problem f^* := sup_{λ ∈ R^m_+} φ(λ), where φ(λ) := inf_{x ∈ Q} L(x, λ).
  Clearly, f_* ≥ f^* (weak duality). Later, we will show f_* = f^* algorithmically.
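Weak duality f_* ≥ f^* can be seen numerically on the toy pair used above (our construction; note Q is unbounded here, unlike on the slide, but the inequality still holds): f_0(x) = ‖x‖², f_1(x) = 1 − x_1 − x_2, for which φ(λ) = λ − λ²/2.

```python
# Primal optimum: x = (1/2, 1/2) on the constraint boundary, f_* = 1/2.
# Dual optimum: phi(lam) = lam - lam^2/2, maximized at lam = 1 with value 1/2.
primal = 0.5**2 + 0.5**2                                     # f_* = 1/2
dual = max(l / 100 - (l / 100)**2 / 2 for l in range(301))   # grid max of phi on [0, 3]
print(primal, round(dual, 2))         # 0.5 0.5 -- weak duality holds with no gap
```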

  12. Bregman distances
  Prox-function: d(·) is strongly convex on Q with parameter one:
  d(y) ≥ d(x) + ⟨∇d(x), y − x⟩ + ½ ‖y − x‖², x, y ∈ Q.
  Denote by x_0 the prox-center of the set Q: x_0 = argmin_{x ∈ Q} d(x). Assume d(x_0) = 0.
  Bregman distance: β(x, y) = d(y) − d(x) − ⟨∇d(x), y − x⟩, x, y ∈ Q.
  Clearly, β(x, y) ≥ ½ ‖x − y‖² for all x, y ∈ Q.
  Bregman mapping: for x ∈ Q, g ∈ E* and h > 0, define
  B_h(x, g) = argmin_{y ∈ Q} { h ⟨g, y − x⟩ + β(x, y) }.
  The first-order optimality condition for the point x_+ := B_h(x, g) is
  ⟨h g + ∇d(x_+) − ∇d(x), y − x_+⟩ ≥ 0, y ∈ Q.

  13. Examples � n � 1 / 2 � ( x ( i ) ) 2 We choose � x � = 1. Euclidean distance. and i =1 d ( x ) = 1 2 � x � 2 . Then β ( x , y ) = 1 2 � x − y � 2 , and we have B h ( x , g ) = Projection Q ( x − hg ). � n | x ( i ) | and 2. Entropy distance. We choose � x � = i =1 � n x ( i ) ln x ( i ) . d ( x ) = ln n + Then i =1 � n y ( i ) [ln y ( i ) − ln x ( i ) ]. β ( x , y ) = i =1 � n x ( i ) = 1 } , then If Q = { x ∈ R n + : i =1 � � � n B ( i ) h ( x , g ) = x ( i ) e − hg ( i ) / x ( j ) e − hg ( j ) , i = 1 , . . . , n . j =1 Yu. Nesterov Primal-dual method for functional constraints 13/20

  14. Switching subgradient method
  Input parameter: the step size h > 0.
  Initialization: compute the prox-center x_0.
  Iteration k ≥ 0:
  a) Define I_k = { i ∈ {1, ..., m} : f_i(x_k) > h ‖∇f_i(x_k)‖_* }.
  b) If I_k = ∅, then compute x_{k+1} = B_h(x_k, ∇f_0(x_k)/‖∇f_0(x_k)‖_*).
  c) If I_k ≠ ∅, then choose an arbitrary i_k ∈ I_k, define h_k = f_{i_k}(x_k)/‖∇f_{i_k}(x_k)‖_*², and compute x_{k+1} = B_{h_k}(x_k, ∇f_{i_k}(x_k)).
  After t ≥ 0 iterations, define F_t = { k ∈ {0, ..., t} : I_k = ∅ }. Denote N(t) = |F_t|. It is possible that N(t) = 0.
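A runnable sketch of this scheme on a toy instance of our own (not from the slides): min f_0(x) = x_1 + x_2 over Q = [−2, 2]² with f_1(x) = 1 − x_1 ≤ 0, so the optimum is x* = (1, −2) with f* = −1. With the Euclidean prox-function, the Bregman mapping is simply B_h(x, g) = Proj_Q(x − h g).

```python
import math

def proj_box(x, lo=-2.0, hi=2.0):
    """Euclidean projection onto the box Q = [lo, hi]^2."""
    return [min(hi, max(lo, v)) for v in x]

h = 0.01
x = [0.0, 0.0]                        # prox-center of Q for d(x) = ||x||^2 / 2
best = float("inf")                   # best f0 value over productive iterates
for k in range(1000):
    f1 = 1 - x[0]                     # constraint value; grad f1 = (-1, 0)
    if f1 > h * 1.0:                  # I_k nonempty (||grad f1||_* = 1): constraint step
        hk = f1 / 1.0**2              # h_k = f_i(x_k) / ||grad f_i(x_k)||_*^2
        x = proj_box([x[0] + hk, x[1]])
    else:                             # I_k empty: productive step on f0
        best = min(best, x[0] + x[1])
        n0 = math.sqrt(2.0)           # ||grad f0||_* for f0(x) = x1 + x2
        x = proj_box([x[0] - h / n0, x[1] - h / n0])
print(round(best, 2))                 # close to f* = -1
```

The iterates oscillate within an O(h) band around the constraint boundary while x_2 slides to its bound, so the productive iterates are feasible up to O(h) and nearly optimal.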

  15. Finding the dual multipliers
  If N(t) > 0, define the dual multipliers as follows:
  λ_t^(0) = h Σ_{k ∈ F_t} 1/‖∇f_0(x_k)‖_*,
  λ_t^(i) = (1/λ_t^(0)) Σ_{k ∈ A_i(t)} h_k, i = 1, ..., m,
  where A_i(t) = { k ∈ {0, ..., t} : i_k = i }, 0 ≤ i ≤ m.
  Denote S_t = Σ_{k ∈ F_t} 1/‖∇f_0(x_k)‖_*. If F_t = ∅, then we define S_t = 0.
  For proving convergence of the switching strategy, we find an upper bound for the gap
  δ_t = (1/S_t) Σ_{k ∈ F_t} f_0(x_k)/‖∇f_0(x_k)‖_* − φ(λ_t),
  assuming that N(t) > 0.
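These averaged multipliers can be computed alongside the switching run. Continuing the toy instance from the previous slide (our construction, not from the slides): min x_1 + x_2 over Q = [−2, 2]² with f_1(x) = 1 − x_1 ≤ 0; at x* = (1, −2) the interior coordinate x_1 forces 1 − λ* = 0, so λ* = 1, which the ratio of accumulated step sizes recovers approximately.

```python
import math

def proj_box(x, lo=-2.0, hi=2.0):
    return [min(hi, max(lo, v)) for v in x]

h = 0.01
x = [0.0, 0.0]
S, sum_h1 = 0.0, 0.0                  # S_t and the sum of h_k over A_1(t)
for k in range(10000):
    f1 = 1 - x[0]
    if f1 > h * 1.0:                  # non-productive: step on f1, grad f1 = (-1, 0)
        hk = f1 / 1.0**2
        sum_h1 += hk
        x = proj_box([x[0] + hk, x[1]])
    else:                             # productive: step on f0, grad f0 = (1, 1)
        S += 1.0 / math.sqrt(2.0)     # accumulate 1 / ||grad f0(x_k)||_*
        x = proj_box([x[0] - h / math.sqrt(2.0), x[1] - h / math.sqrt(2.0)])
lam0 = h * S                          # lambda_t^(0)
lam1 = sum_h1 / lam0                  # lambda_t^(1): estimate of lambda*
print(round(lam1, 1))                 # close to lambda* = 1
```

The approximation error comes from the large first constraint step and the O(h) oscillation band; it shrinks as h decreases and t grows, which is what the bound on δ_t quantifies.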
