  1. Limited memory Kelley's Method Converges for Composite Convex and Submodular Objectives
     Madeleine Udell, Operations Research and Information Engineering, Cornell University
     Song Zhou (Cornell), Swati Gupta (Georgia Tech)
     NeurIPS, December 2018

  2. Problem to solve
     minimize $g(x) + f(x)$
     ◮ $g : \mathbb{R}^n \to \mathbb{R}$ strongly convex
     ◮ $f : \mathbb{R}^n \to \mathbb{R}$ the Lovász extension of a submodular function $F$
       ◮ piecewise linear
       ◮ convex envelope of $F$
       ◮ generically, exponentially many linear pieces
     L-KM solves composite convex + submodular problems, whose natural size is exponential, using only linear memory.
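For concreteness, one standard instance of this problem class (our example, not from the slides): take $g(x) = \tfrac12\|x - a\|_2^2$ and $F$ the cut function of a graph $G = ([n], E)$, whose Lovász extension is the total variation seminorm, giving a graph total variation denoising problem:

    $$\min_{x \in \mathbb{R}^n}\ \tfrac12 \|x - a\|_2^2 \;+\; \sum_{(i,j) \in E} |x_i - x_j|.$$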

  3. Submodular optimization background
     ◮ Ground set $V = \{1, \ldots, n\}$.
     ◮ $F : 2^V \to \mathbb{R}$ is submodular if for all $A, B \subseteq V$,
       $F(A) + F(B) \ge F(A \cup B) + F(A \cap B)$
     ◮ the base polytope of $F$ is
       $B(F) = \{ w \in \mathbb{R}^n : w(V) = F(V),\ w(A) \le F(A)\ \forall A \subseteq V \}$
     ◮ the Lovász extension of $F$ is the homogeneous piecewise linear convex function
       $f(x) = \max_{w \in B(F)} w^\top x$
     ◮ linear optimization over $B(F)$ is easy
     ◮ $\Rightarrow$ evaluating $f(x)$ and $\partial f(x)$ is easy
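The linear oracle above is Edmonds' greedy algorithm: sort the coordinates of $x$ in decreasing order and take marginal gains of $F$ along that order. A minimal sketch in Python (the helper name greedy_subgradient and the toy $F$ are ours, for illustration):

    import numpy as np

    def greedy_subgradient(F, x):
        # Edmonds' greedy algorithm: returns argmax_{w in B(F)} x^T w, which
        # is also a subgradient of the Lovasz extension f at x, with f(x) = w^T x.
        n = len(x)
        w = np.zeros(n)
        S = []
        for k in np.argsort(-x):        # coordinates of x in decreasing order
            w[k] = F(S + [k]) - F(S)    # marginal gain of adding element k
            S.append(k)
        return w

    # toy submodular function: F(S) = sqrt(|S|) (concave in |S|, hence submodular)
    F = lambda S: np.sqrt(len(S))
    x = np.array([0.9, -0.2, 0.4])
    w = greedy_subgradient(F, x)
    print(w @ x)                        # = f(x), the Lovasz extension at x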

  4. Original Simplicial Method (OSM) [Bach 2013]
     Intuition:
     ◮ approximate $f$ with a piecewise linear function whose values and (sub)gradients match $f$ at all previous iterates
     ◮ minimize the approximation to determine the next iterate
     Advantages: finite convergence [Bach 2013]
     Drawbacks:
     ◮ Memory: memory $|\mathcal{V}^{(i)}| = i$ grows with the iteration counter $i$
     ◮ Computation: subproblem size grows with memory
     ◮ Convergence rate: no known rate of convergence [Bach 2013]
     (The piecewise linear model is written out below.)
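Concretely, since the Lovász extension is positively homogeneous, every cut generated from a subgradient $w^{(j)} \in \partial f(x^{(j)})$ is exactly the linear function $w^{(j)\top} x$, so the model OSM minimizes at iteration $i$ is (our notation, matching the algorithm on the next slide):

    $$\hat f^{(i)}(x) = \max_{w \in \mathcal{V}^{(i)}} w^\top x, \qquad \mathcal{V}^{(i)} = \{ w^{(1)}, \ldots, w^{(i)} \},$$

which satisfies $\hat f^{(i)}(x^{(j)}) = f(x^{(j)})$ at every previous iterate and $\hat f^{(i)} \le f$ everywhere, since each $w^{(j)} \in B(F)$.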

  5. Limited Memory Kelley's Method (L-KM)
     Algorithm 1: L-KM (to minimize $g(x) + f(x)$)
     initialize $\mathcal{V} \neq \emptyset$ affinely independent.
     repeat
       1. define $\hat f(x) = \max_{w \in \mathcal{V}} w^\top x$
       2. solve subproblem $\hat x \leftarrow \mathrm{argmin}_x\ g(x) + \hat f(x)$
       3. compute $v \in \partial f(\hat x) = \mathrm{argmax}_{w \in B(F)} \hat x^\top w$
       4. $\mathcal{V} \leftarrow \{ w \in \mathcal{V} : w^\top \hat x = \hat f(\hat x) \} \cup \{v\}$
     Unlike OSM, L-KM drops subgradients $w \in \mathcal{V}$ that are not tight at the current iterate.
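A minimal runnable sketch of Algorithm 1, assuming $g(x) = \tfrac12\|x - a\|^2$ and reusing greedy_subgradient and the toy F from the earlier block; the epigraph formulation of the subproblem and all names are ours, not the paper's:

    import numpy as np
    from scipy.optimize import minimize

    def solve_subproblem(V, a):
        # min_x 0.5*||x - a||^2 + max_{w in V} w^T x, written as an epigraph
        # problem in (x, t): minimize 0.5*||x - a||^2 + t s.t. w^T x <= t
        n = len(a)
        obj = lambda z: 0.5 * np.sum((z[:n] - a) ** 2) + z[n]
        cons = [{"type": "ineq", "fun": (lambda z, w=w: z[n] - w @ z[:n])} for w in V]
        return minimize(obj, np.zeros(n + 1), constraints=cons).x[:n]

    def lkm(F, a, iters=20, tol=1e-6):
        n = len(a)
        V = [greedy_subgradient(F, np.zeros(n))]   # nonempty initial set of cuts
        for _ in range(iters):
            x = solve_subproblem(V, a)             # step 2: minimize g + model
            fhat = max(w @ x for w in V)           # model value at the new iterate
            v = greedy_subgradient(F, x)           # step 3: subgradient of f at x
            # step 4: drop cuts not tight at x, then add the new subgradient
            V = [w for w in V if w @ x > fhat - tol] + [v]
        return x

    print(lkm(F, np.array([1.0, 0.5, -0.3])))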

  6.–9. L-KM: example
     [Figure, animated over four slides: L-KM run on a one-dimensional objective. Each panel shows the model $g + \hat f^{(i)}$ of $g + f$, the iterate $x^{(i)}$, and the lower bound $z^{(i-1)}$, for $i = 1, 2, 3$.]

  10. Properties of L-KM
     ◮ Limited memory: in L-KM, for all $i \ge 0$, the vectors in $\mathcal{V}^{(i)}$ are affinely independent; moreover, $|\mathcal{V}^{(i)}| \le n + 1$.
     ◮ Finite convergence: when $g$ is strongly convex, L-KM converges finitely.
     ◮ Linear convergence: when $g$ is smooth and strongly convex, the duality gap of L-KM (and OSM) converges linearly to 0.

  11. Limited-memory Fully Corrective Frank-Wolfe (L-FCFW)
     Algorithm 2: L-FCFW (to maximize $-g^*(-y)$ over $y \in B(F)$)
     initialize $\mathcal{V} \neq \emptyset$ affinely independent.
     repeat
       1. solve subproblem $\hat y \leftarrow \mathrm{argmax}_{y \in \mathrm{conv}(\mathcal{V})} -g^*(-y)$ and compute a convex decomposition of the solution, $\hat y = \sum_{w \in \mathcal{V}} \lambda_w w$ with $\lambda_w \ge 0$ and $\sum_{w \in \mathcal{V}} \lambda_w = 1$
       2. compute gradient $\hat x = \nabla(-g^*(-\hat y))$
       3. solve linear optimization $v = \mathrm{argmax}_{w \in B(F)} \hat x^\top w$
       4. $\mathcal{V} \leftarrow \{ w \in \mathcal{V} : \lambda_w > 0 \} \cup \{v\}$
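A minimal sketch of Algorithm 2 under the same assumption $g(x) = \tfrac12\|x - a\|^2$, for which $-g^*(-y) = a^\top y - \tfrac12\|y\|^2$ and the primal iterate is recovered as $\hat x = a - \hat y$. It reuses greedy_subgradient and the toy F from above; the simplex parametrization of the subproblem and all names are ours:

    import numpy as np
    from scipy.optimize import minimize

    def lfcfw(F, a, iters=20, tol=1e-8):
        n = len(a)
        V = [greedy_subgradient(F, np.zeros(n))]    # one vertex of B(F) to start
        for _ in range(iters):
            W = np.array(V)                         # rows are the kept vertices
            # step 1: maximize a^T y - 0.5*||y||^2 over y = W^T lam, lam in simplex
            obj = lambda lam: 0.5 * np.sum((W.T @ lam) ** 2) - a @ (W.T @ lam)
            cons = [{"type": "eq", "fun": lambda lam: np.sum(lam) - 1}]
            lam = minimize(obj, np.ones(len(V)) / len(V),
                           bounds=[(0, 1)] * len(V), constraints=cons).x
            y = W.T @ lam
            x = a - y                                # step 2: gradient of -g*(-y)
            v = greedy_subgradient(F, x)             # step 3: Frank-Wolfe oracle
            V = [w for w, l in zip(V, lam) if l > tol] + [v]   # step 4: keep active
        return y, x

    y, x = lfcfw(F, np.array([1.0, 0.5, -0.3]))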

  12.–21. Fully corrective Frank-Wolfe
     [Figure, animated over ten slides: fully corrective Frank-Wolfe on the dual problem over $B(F)$, drawn as a polytope with vertices $v_1, \ldots, v_5$. Successive iterates $w^{(0)}, w^{(1)}, w^{(2)}$ move toward the vertices returned by the linear oracle, along the directions labeled $-\nabla g(w^{(i)})$ in the figure.]

  22. Properties of L-FCFW
     ◮ Limited memory: by Carathéodory's theorem, we can choose $\le n + 1$ active vertices to represent the current iterate.
     ◮ Linear convergence [Lacoste-Julien and Jaggi, 2015]: when $g$ is smooth and strongly convex, the duality gap of L-FCFW converges linearly to 0.
     ◮ Duality: two algorithms are dual if their iterates solve dual subproblems. If $g$ is smooth and strongly convex and
       ◮ $\mathcal{B}^{(i)} = \{ w \in \mathcal{V}^{(i-1)} : \lambda_w > 0 \}$, then L-FCFW is dual to L-KM;
       ◮ $\mathcal{B}^{(i)} = \mathcal{V}^{(i-1)}$, then L-FCFW is dual to OSM.
     (A one-line derivation of the dual problem follows below.)
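The duality underlying this correspondence is standard (our derivation, using the convex conjugate $g^*(y) = \sup_x\, y^\top x - g(x)$ and the fact that $f$ is the support function of $B(F)$):

    $$\min_{x}\; g(x) + f(x) \;=\; \min_{x} \max_{w \in B(F)}\; g(x) + w^\top x \;=\; \max_{w \in B(F)} \min_{x}\; g(x) + w^\top x \;=\; \max_{w \in B(F)}\; -g^*(-w),$$

where the exchange of min and max is justified because $B(F)$ is compact and convex and $g$ is convex.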

  23. Summary
     L-KM solves composite convex + submodular problems, whose natural size is exponential, using only linear memory.
     ◮ S. Zhou, S. Gupta, and M. Udell. Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives. NeurIPS 2018.
     ◮ Poster #16, Room 210, 5–7pm
