New primal-dual subgradient methods for Convex Problems with Functional Constraints
Yurii Nesterov, CORE/INMA (UCL)
January 12, 2015 (Les Houches)

Outline: 1. Constrained


Lagrange multipliers: interpretation

Let I ⊆ {1, …, m} be an arbitrary set of indices.
Denote f_I(x) = f_0(x) + Σ_{i∈I} λ_*^{(i)} f_i(x). Consider the problem

    P_I:  min_{x∈Q} { f_I(x) : f_i(x) ≤ 0, i ∉ I }.

Observation: in any case, x_* is the optimal solution of problem P_I.
Interpretation: λ_*^{(i)} are the shadow prices for resources (Kantorovich, 1939).
Application examples:
Traffic congestion: car flows on roads ⇔ size of queues.
Electrical networks: currents in the wires ⇔ voltage potentials, etc.
Main question: How to compute (x_*, λ_*)?
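The observation above can be checked numerically. The instance below is a hypothetical 1-D problem of my own choosing (not from the talk): minimize f_0(x) = x² subject to f_1(x) = 1 − x ≤ 0, whose optimal multiplier is λ_* = 2; moving the constraint into the objective with that multiplier should leave x_* = 1 as the minimizer of the now-unconstrained problem P_I.

```python
import numpy as np

# Hypothetical 1-D instance (an assumption, not from the slides):
# minimize f0(x) = x^2 subject to f1(x) = 1 - x <= 0, with Q = R.
# Optimal solution x* = 1; KKT stationarity 2*x* - lam* = 0 gives lam* = 2.
lam_star = 2.0

def f_I(x):
    # f_I(x) = f0(x) + lam*^(1) * f1(x), with I = {1}: all constraints are
    # moved into the objective, so P_I is unconstrained.
    return x**2 + lam_star * (1.0 - x)

# The observation says x* = 1 still solves P_I: check on a fine grid.
xs = np.linspace(-3.0, 3.0, 600001)
x_min = xs[np.argmin(f_I(xs))]
print(x_min)  # approximately 1.0
```

With the wrong multiplier (say λ = 1) the unconstrained minimizer would drift away from x_* = 1, which is what makes the shadow-price value special.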

Algebraic interpretation

Consider the Lagrangian L(x, λ) = f_0(x) + Σ_{i=1}^m λ^{(i)} f_i(x).

Condition KKT(1): ⟨∇f_0(x_*) + Σ_{i=1}^m λ_*^{(i)} ∇f_i(x_*), x − x_*⟩ ≥ 0 for all x ∈ Q,
implies x_* ∈ Argmin_{x∈Q} L(x, λ_*).

Define the dual function φ(λ) = min_{x∈Q} L(x, λ), λ ≥ 0. It is concave!
By Danskin's Theorem, ∇φ(λ) = (f_1(x(λ)), …, f_m(x(λ))), with x(λ) ∈ Argmin_{x∈Q} L(x, λ).

Conditions KKT(2,3): f_i(x_*) ≤ 0, λ_*^{(i)} f_i(x_*) = 0, i = 1, …, m,
imply (with x_* = x(λ_*)) λ_* ∈ Argmax_{λ≥0} φ(λ).
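The Danskin formula for ∇φ can be sanity-checked against a finite difference. The toy instance is an assumption of mine (not from the talk): Q = [−2, 2], f_0(x) = x², one constraint f_1(x) = 1 − x, for which the inner minimizer x(λ) has the closed form clip(λ/2).

```python
import numpy as np

# Toy instance (an assumption): Q = [-2, 2], f0(x) = x^2, f1(x) = 1 - x.
def L(x, lam):
    return x**2 + lam * (1.0 - x)

def x_of_lam(lam):
    # Minimizer of L(., lam) over Q: unconstrained minimum lam/2, clipped to Q.
    return np.clip(lam / 2.0, -2.0, 2.0)

def phi(lam):
    return L(x_of_lam(lam), lam)

lam = 0.8
# Danskin: the gradient of the dual function is the constraint value at x(lam).
danskin = 1.0 - x_of_lam(lam)                          # f1(x(lam))
numeric = (phi(lam + 1e-6) - phi(lam - 1e-6)) / 2e-6   # central difference
print(danskin, numeric)  # the two values should agree to high accuracy
```

Here φ(λ) = λ − λ²/4 for λ ∈ [0, 4], so both quantities equal 1 − λ/2 = 0.6 at λ = 0.8, and the concavity of φ is visible from the negative quadratic term.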

Algorithmic aspects

Main idea: solve the dual problem max_{λ≥0} φ(λ) by the subgradient method:

1. Compute x(λ_k) and define ∇φ(λ_k) = (f_1(x(λ_k)), …, f_m(x(λ_k))).
2. Update λ_{k+1} = Proj_{R^m_+}(λ_k + h_k ∇φ(λ_k)).

Step sizes h_k > 0 are defined in the usual way.

Main difficulties:
Each iteration is time consuming.
Unclear termination criterion.
Low rate of convergence (O(1/ε²) upper-level iterations).
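The two-step scheme above can be sketched in a few lines. The instance is again my toy problem (an assumption, not from the talk): Q = [−2, 2], f_0(x) = x², f_1(x) = 1 − x ≤ 0, so φ(λ) = λ − λ²/4 with maximizer λ_* = 2.

```python
import numpy as np

# Dual subgradient ascent on a toy instance (an assumption):
# Q = [-2, 2], f0(x) = x^2, f1(x) = 1 - x <= 0.
# Then phi(lam) = lam - lam^2/4, maximized at lam* = 2 with phi(lam*) = 1.
def x_of_lam(lam):
    return np.clip(lam / 2.0, -2.0, 2.0)   # argmin_x L(x, lam) over Q

lam = 0.0
for k in range(2000):
    g = 1.0 - x_of_lam(lam)                # grad phi(lam) = f1(x(lam)) (Danskin)
    h = 1.0 / np.sqrt(k + 1)               # classical diminishing step size
    lam = max(0.0, lam + h * g)            # projection onto R_+

print(lam)  # approaches lam* = 2
```

Each iteration hides a full inner minimization inside `x_of_lam` (trivial here, expensive in general), which illustrates the first difficulty; the diminishing steps illustrate the slow O(1/ε²) rate.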

Augmented Lagrangian (1970's) [Hestenes, Powell, Rockafellar, Polyak, Bertsekas, …]

Define the Augmented Lagrangian

    L_K(x, λ) = f_0(x) + (1/(2K)) Σ_{i=1}^m [ (λ^{(i)} + K f_i(x))_+ ]² − (1/(2K)) ‖λ‖_2²,  λ ∈ R^m,

where K > 0 is a penalty parameter.
Consider the dual function φ̂(λ) = min_{x∈Q} L_K(x, λ).

Main properties.
Function φ̂ is concave.
Its gradient is Lipschitz continuous with constant 1/K.
Its unconstrained maximum is attained at the optimal dual solution.
The corresponding point x̂(λ_*) is the optimal primal solution.

Hint: check that the equation (λ^{(i)} + K f_i(x))_+ = λ^{(i)} is equivalent to KKT(2,3).
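The hint can be verified by brute force in the scalar case: for λ ≥ 0, the fixed-point equation (λ + K f)_+ = λ holds exactly when f ≤ 0 and λ·f = 0, i.e. KKT(2,3). The value K = 1.5 below is an arbitrary choice, and the test grid is mine.

```python
# Brute-force check of the hint (scalar case; K = 1.5 is arbitrary):
# for lam >= 0, max(lam + K*f, 0) == lam iff f <= 0 and lam*f == 0.
K = 1.5
lams = [0.0, 0.5, 1.0, 2.0]
fs = [-1.0, -0.5, 0.0, 0.5, 1.0]
for lam in lams:
    for f in fs:
        fixed_point = (max(lam + K * f, 0.0) == lam)   # the hint's equation
        kkt23 = (f <= 0.0) and (lam * f == 0.0)        # KKT(2,3), scalar form
        assert fixed_point == kkt23, (lam, f)
print("fixed-point equation <=> KKT(2,3) on all test pairs")
```

The two directions are easy to see: if λ > 0 the equation forces f = 0 (complementarity), and if λ = 0 it forces f ≤ 0 (feasibility).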

Method of Augmented Lagrangians

Note that ∇φ̂(λ) = (1/K)(λ + K f(x̂(λ)))_+ − (1/K) λ.

Therefore, the usual gradient method λ_{k+1} = λ_k + K ∇φ̂(λ_k) is exactly as follows:

Method: λ_{k+1} = (λ_k + K f(x̂(λ_k)))_+.

Advantage: Fast convergence of the dual process.
Disadvantages:
Difficult iteration.
Unclear termination.
No global complexity analysis.

Do we have an alternative?
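The method can be sketched end to end on the same toy instance used earlier (an assumption, not from the talk): Q = [−2, 2], f_0(x) = x², f_1(x) = 1 − x ≤ 0, with (x_*, λ_*) = (1, 2). The "difficult iteration" is the inner minimization of L_K, stood in for here by a grid search.

```python
import numpy as np

# Method of multipliers on a toy instance (an assumption):
# Q = [-2, 2], f0(x) = x^2, f1(x) = 1 - x <= 0, so x* = 1, lam* = 2.
K = 1.0                                   # penalty parameter
xs = np.linspace(-2.0, 2.0, 400001)       # grid stand-in for the inner solver

def aug_lagrangian(x, lam):
    # L_K(x, lam) = f0(x) + (1/(2K)) * [ ((lam + K*f1(x))_+)^2 - lam^2 ]
    f1 = 1.0 - x
    return x**2 + (np.maximum(lam + K * f1, 0.0) ** 2 - lam**2) / (2.0 * K)

lam = 0.0
for _ in range(60):
    x_hat = xs[np.argmin(aug_lagrangian(xs, lam))]   # difficult inner step
    lam = max(0.0, lam + K * (1.0 - x_hat))          # multiplier update

print(x_hat, lam)  # both approach (x*, lam*) = (1, 2)
```

Unlike the plain dual subgradient method, no diminishing step sizes are needed: the dual iteration here contracts at a fixed linear rate, illustrating the "fast convergence of the dual process".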

Problem formulation

Problem:  f^* = inf_{x∈Q} { f_0(x) : f_i(x) ≤ 0, i = 1, …, m },

where f_i(x), i = 0, …, m, are closed convex functions on Q endowed with first-order black-box oracles, and Q ⊂ E is a bounded simple closed convex set.
(We can solve some auxiliary optimization problems over Q.)

Defining the Lagrangian

    L(x, λ) = f_0(x) + Σ_{i=1}^m λ^{(i)} f_i(x),  x ∈ Q, λ ∈ R^m_+,

we can introduce the Lagrangian dual problem f_* := sup_{λ∈R^m_+} φ(λ), where φ(λ) := inf_{x∈Q} L(x, λ).
Clearly, f^* ≥ f_*. Later, we will show f^* = f_* algorithmically.
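The weak-duality inequality f^* ≥ f_* can be observed numerically. The instance below is my toy problem again (an assumption, not from the talk), with both primal value and dual values approximated on a grid.

```python
import numpy as np

# Weak duality f* >= f_* on a toy instance (an assumption):
# Q = [-2, 2], f0(x) = x^2, f1(x) = 1 - x <= 0.
xs = np.linspace(-2.0, 2.0, 400001)

# Primal optimal value: minimize f0 over the feasible part of the grid.
feasible = xs[1.0 - xs <= 0.0]
f_star = np.min(feasible**2)              # f* = 1, attained at x* = 1

def phi(lam):
    # phi(lam) = inf_x L(x, lam) over Q, approximated on the grid.
    return np.min(xs**2 + lam * (1.0 - xs))

# Every dual value lower-bounds f*; here the bound is tight at lam = 2.
for lam in [0.0, 0.5, 1.0, 2.0, 3.0]:
    assert phi(lam) <= f_star + 1e-9
print(f_star, phi(2.0))
```

Since φ(2) matches f^* on this instance, there is no duality gap here; the talk's point is that f^* = f_* will be established algorithmically in the general setting.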

Bregman distances

Prox-function: d(·) is strongly convex on Q with parameter one:

    d(y) ≥ d(x) + ⟨∇d(x), y − x⟩ + (1/2)‖y − x‖²,  x, y ∈ Q.

Denote by x_0 the prox-center of the set Q: x_0 = argmin_{x∈Q} d(x). Assume d(x_0) = 0.

Bregman distance: β(x, y) = d(y) − d(x) − ⟨∇d(x), y − x⟩, x, y ∈ Q.
Clearly, β(x, y) ≥ (1/2)‖x − y‖² for all x, y ∈ Q.

Bregman mapping: for x ∈ Q, g ∈ E^*, and h > 0 define

    B_h(x, g) = argmin_{y∈Q} { h⟨g, y − x⟩ + β(x, y) }.

Examples: Euclidean distance, entropy distance, etc.
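For the entropy example, the Bregman mapping has a well-known closed form; the sketch below states it as an assumption (the classical multiplicative update, not taken from the slides) and checks it by brute force for n = 2. With d(y) = Σ_i y_i ln y_i + ln n on the simplex, β(x, y) is the KL divergence and B_h(x, g) has components proportional to x_i · exp(−h g_i).

```python
import numpy as np

# Entropy Bregman mapping on the probability simplex (assumed closed form:
# the classical multiplicative / mirror-descent update).
def bregman_map_entropy(x, g, h):
    y = x * np.exp(-h * g)
    return y / y.sum()

# Sanity check for n = 2 by brute force over the 1-parameter simplex.
x = np.array([0.3, 0.7])
g = np.array([1.0, -0.5])
h = 0.8

ts = np.linspace(1e-6, 1.0 - 1e-6, 200001)
ys = np.stack([ts, 1.0 - ts])                       # candidate points y in Q
kl = (ys * np.log(ys / x[:, None])).sum(axis=0)     # beta(x, y) = KL(y, x)
obj = h * (g[:, None] * (ys - x[:, None])).sum(axis=0) + kl
y_grid = ys[:, np.argmin(obj)]

print(bregman_map_entropy(x, g, h), y_grid)  # the two should nearly coincide
```

The Euclidean case d(x) = (1/2)‖x‖² recovers the ordinary projected gradient step, so the Bregman mapping generalizes the projection used by the dual subgradient method earlier in the talk.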

Switching subgradient methods: Primal Method

Input parameter: the step size h > 0.
Initialization: Compute the prox-center x_0.
Iteration k ≥ 0:
