1. Accelerated Douglas–Rachford splitting and ADMM for structured nonconvex optimization
Panos Patrinos, KU Leuven (ESAT-STADIUS)
joint work with Andreas Themelis and Lorenzo Stella
LCCC Workshop on Large-Scale and Distributed Optimization, Lund, Sweden, June 14, 2017

A. Themelis, L. Stella and P. Patrinos, Douglas–Rachford splitting and ADMM for nonconvex optimization: new convergence results and accelerated versions, https://arxiv.org/abs/1709.05747

2. Structured nonconvex optimization

composite problem:  minimize $\varphi_1(s) + \varphi_2(s)$
separable problem:  minimize $f(x) + g(z)$ subject to $Ax + Bz = b$

◮ templates for large-scale structured optimization
◮ $\varphi_1$, $\varphi_2$, $f$, $g$ can be nonsmooth
◮ numerous applications: machine learning, statistics, signal/image processing, control, ...
◮ traditional algorithms usually do not apply
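[Editor's note: to make the two templates concrete, one illustrative instance (my example, not from the deck): sparse least squares with an $\ell_0$ penalty fits the composite form, and its consensus reformulation fits the separable one:

  $\underset{s}{\text{minimize}}\ \tfrac{1}{2}\|Cs - d\|^2 + \mu\|s\|_0 \qquad\Longleftrightarrow\qquad \underset{x,z}{\text{minimize}}\ \tfrac{1}{2}\|Cx - d\|^2 + \mu\|z\|_0 \ \ \text{subject to}\ x - z = 0,$

i.e. the separable template with $f(x) = \tfrac{1}{2}\|Cx - d\|^2$, $g(z) = \mu\|z\|_0$, $A = I$, $B = -I$, $b = 0$. This instance is reused in the numerical sketches further down.]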

3. Structured nonconvex optimization

composite problem:  minimize $\varphi_1(s) + \varphi_2(s)$
separable problem:  minimize $f(x) + g(z)$ subject to $Ax + Bz = b$

◮ resurgence of proximal algorithms (or operator splitting methods)
◮ reduce a complex problem into a series of simpler subproblems
◮ perhaps the most popular proximal algorithms:
  Douglas-Rachford Splitting (DRS)
  Alternating Direction Method of Multipliers (ADMM)
◮ elegant, complete theory for convex problems (monotone operators, fixed-point iterations, Fejér sequences, ... [1])

[1] Bauschke H.H. and Combettes P.L., Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer 2011

4. Contribution

composite problem:  minimize $\varphi_1(s) + \varphi_2(s)$
separable problem:  minimize $f(x) + g(z)$ subject to $Ax + Bz = b$

DRS & ADMM
◮ being fixed-point iterations, DRS & ADMM can be agonizingly slow
◮ nonconvex problems: incomplete theory, results empirical or local [1,2]
◮ global results have recently emerged (see next slides)

this talk
◮ a global convergence theory for nonconvex problems based on the Douglas-Rachford Envelope (DRE)
◮ more importantly: new, robust, faster algorithms

[1] R. Hesse and R. Luke, Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems, SIAM J. Opt. 23(4), 2013
[2] F. Artacho, J. Borwein and M. Tam, Recent results on Douglas–Rachford methods for combinatorial optimization problems, JOTA 163(1), 2014

5. Many applications...

◮ ADMM: amenable to distributed formulations (via consensus)
◮ nonconvex problems: no need for convex relaxation
  rank constraints, $\ell_0$/Schatten norms, (mixed-)integer programming

Some examples:
◮ hybrid system MPC [1]
◮ distributed sparse principal component analysis (SPCA) [2]
◮ dictionary learning [3]
◮ background-foreground extraction [4,5]
◮ sparse representations (signal processing) [6]

[1] Takapoui R., Moehle N., Boyd S. and Bemporad A., A simple effective heuristic for embedded mixed-integer quadratic programming, IEEE ACC 2016
[2] Hajinezhad D. and Hong M., Nonconvex ADMM for distributed sparse principal component analysis, GlobalSIP 2015
[3] Wai H. T., Chang T. H. and Scaglione A., A consensus-based decentralized algorithm for non-convex optimization with application to dictionary learning, ICASSP 2015
[4] Chartrand R., Nonconvex splitting for regularized low-rank + sparse decomposition, IEEE TSP 2012
[5] Yang L., Pong T. K. and Chen X., ADMM for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction, SIAM 2017
[6] Chartrand R. and Wohlberg B., A nonconvex ADMM algorithm for group sparsity with sparse groups, ICASSP 2013

6. DRS for nonconvex problems

to solve $\underset{s \in \mathbb{R}^n}{\text{minimize}}\ \varphi_1(s) + \varphi_2(s)$, iterate, starting from $s \in \mathbb{R}^n$:

  $u = \mathrm{prox}_{\gamma\varphi_1}(s)$
  $v \in \mathrm{prox}_{\gamma\varphi_2}(2u - s)$
  $s^+ = s + \lambda(v - u)$

standing assumptions
1. $\varphi_1$ and $\varphi_2$ are prox-friendly; however, both can be nonconvex
2. $\operatorname{dom}\varphi_1$ is affine and $\nabla\varphi_1$ is Lipschitz on $\operatorname{dom}\varphi_1$
3. $\varphi_2 + \tfrac{1}{2\gamma}\|\cdot\|^2$ is bounded below for some $\gamma > 0$ (prox-bounded)
4. $\operatorname{dom}\varphi_2 \subseteq \operatorname{dom}\varphi_1$
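[Editor's note: a minimal numerical sketch of this iteration on the $\ell_0$-penalized least-squares instance from slide 2 (an illustration under assumed data, not the authors' code): $\varphi_1(x) = \tfrac{1}{2}\|Cx - d\|^2$ is smooth with Lipschitz gradient on all of $\mathbb{R}^n$, and $\varphi_2(x) = \mu\|x\|_0$ is nonconvex, prox-bounded, and prox-friendly via hard thresholding.

import numpy as np

# Illustrative nonconvex instance (assumed data, not from the slides):
# phi1(x) = 0.5*||C x - d||^2,  phi2(x) = mu*||x||_0
rng = np.random.default_rng(0)
C = rng.standard_normal((30, 50))
d = rng.standard_normal(30)
mu = 0.1

L1 = np.linalg.norm(C.T @ C, 2)        # Lipschitz constant of grad phi1
gamma, lam = 0.5 / L1, 1.0             # step size gamma < 1/L_phi1

# prox of gamma*phi1 is a linear solve: argmin_w 0.5*||Cw-d||^2 + ||w-s||^2/(2*gamma)
M = np.linalg.inv(gamma * (C.T @ C) + np.eye(50))

def prox_phi1(s):
    return M @ (s + gamma * C.T @ d)

def prox_phi2(s):
    # hard thresholding: one selection from the (set-valued) prox of mu*||.||_0
    w = s.copy()
    w[w ** 2 <= 2 * gamma * mu] = 0.0
    return w

s = np.zeros(50)
for _ in range(500):
    u = prox_phi1(s)                   # u   = prox_{gamma phi1}(s)
    v = prox_phi2(2 * u - s)           # v  in prox_{gamma phi2}(2u - s)
    s = s + lam * (v - u)              # s^+ = s + lambda*(v - u)

print("fixed-point residual ||u - v|| =", np.linalg.norm(u - v))
]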

7. Structured Optimization Tools: proximal map

Only proximal operations on $\varphi_1$ and $\varphi_2$:

  $\mathrm{prox}_{\gamma h}(s) = \underset{w}{\operatorname{argmin}} \left\{ h(w) + \tfrac{1}{2\gamma}\|w - s\|^2 \right\}, \quad \gamma > 0$

◮ a generalized projection: for $h = \delta_C$, $\mathrm{prox}_{\gamma h} = \Pi_C$

Properties
◮ well defined for small $\gamma$
◮ Lipschitz for $\varphi_1$ (for small $\gamma$), but set-valued for $\varphi_2$
◮ "prox-friendly" (easily proximable) in many useful applications
◮ the value function is the Moreau envelope
  $h^\gamma(s) := \underset{w}{\min} \left\{ h(w) + \tfrac{1}{2\gamma}\|w - s\|^2 \right\}$
◮ $h^\gamma$ is locally Lipschitz in general, and even smooth for convex $h$
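[Editor's note: a few textbook prox maps, to make "prox-friendly" concrete (illustrative sketches in my notation, not from the slides), plus the Moreau envelope evaluated through the prox.

import numpy as np

def prox_l1(s, gamma):
    # h(w) = ||w||_1 (convex): soft thresholding, single-valued and Lipschitz
    return np.sign(s) * np.maximum(np.abs(s) - gamma, 0.0)

def prox_l0(s, gamma):
    # h(w) = ||w||_0 (nonconvex): hard thresholding; set-valued at the
    # threshold |s_i| = sqrt(2*gamma), here we return one selection
    w = s.copy()
    w[s ** 2 <= 2 * gamma] = 0.0
    return w

def prox_box(s, lo, hi):
    # h = delta_C for the box C = [lo, hi]^n: the prox is the projection Pi_C
    return np.clip(s, lo, hi)

def moreau(h, prox_h, s, gamma):
    # Moreau envelope h^gamma(s) = min_w h(w) + ||w - s||^2 / (2*gamma),
    # attained at w = prox_{gamma h}(s)
    w = prox_h(s, gamma)
    return h(w) + np.sum((w - s) ** 2) / (2 * gamma)

s = np.array([-2.0, 0.3, 1.5])
print(prox_l1(s, 0.5), prox_l0(s, 0.5), prox_box(s, -1.0, 1.0))
print(moreau(lambda w: np.abs(w).sum(), prox_l1, s, 0.5))
]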

8. Douglas-Rachford Envelope: "integrating" the fixed-point residual

minimize $\varphi = \varphi_1 + \varphi_2$ with Douglas-Rachford:
  $u = \mathrm{prox}_{\gamma\varphi_1}(s)$, $\quad v = \mathrm{prox}_{\gamma\varphi_2}(2u - s)$

convex nonsmooth case
◮ stationary points characterized by $u - v = 0$
◮ Douglas-Rachford envelope, discovered for convex problems [1]:

  $\varphi^{\mathrm{DR}}_\gamma(s) := \varphi_1^\gamma(s) - \gamma\|\nabla\varphi_1^\gamma(s)\|^2 + \varphi_2^\gamma\big(s - 2\gamma\nabla\varphi_1^\gamma(s)\big)$

  a real-valued function with gradient proportional to the DR residual (for $\varphi_1 \in C^2$, $\gamma < 1/L_{\varphi_1}$):

  $\nabla\varphi^{\mathrm{DR}}_\gamma(s) = \tfrac{1}{\gamma} M_\gamma(s)(u - v), \qquad M_\gamma(s) = I - 2\gamma\nabla^2\varphi_1^\gamma(s) \succ 0$

◮ used to devise accelerated DRS (ADMM via the dual [2])

[1] Patrinos P., Stella L. and Bemporad A., Douglas-Rachford splitting: complexity estimates and accelerated variants, CDC 2014
[2] Pejcic I. and Jones C., Accelerated ADMM based on accelerated Douglas-Rachford splitting, ECC 2016

9. Douglas-Rachford Envelope: "integrating" the fixed-point residual

  $\varphi^{\mathrm{DR}}_\gamma(s) := \varphi_1^\gamma(s) - \gamma\|\nabla\varphi_1^\gamma(s)\|^2 + \varphi_2^\gamma\big(s - 2\gamma\nabla\varphi_1^\gamma(s)\big)$

If
◮ $\varphi_1 : \operatorname{dom}\varphi_1 \to \mathbb{R}$ has $L_{\varphi_1}$-Lipschitz gradient
◮ $\operatorname{dom}\varphi_1$ is affine and contains $\operatorname{dom}\varphi_2$
◮ no convexity assumptions!

then, for $\gamma < 1/L_{\varphi_1}$,
◮ $\inf \varphi = \inf \varphi^{\mathrm{DR}}_\gamma$
◮ $s \in \operatorname{argmin} \varphi^{\mathrm{DR}}_\gamma \iff \mathrm{prox}_{\gamma\varphi_1}(s) \in \operatorname{argmin} \varphi$

Minimizing $\varphi$ is equivalent to minimizing $\varphi^{\mathrm{DR}}_\gamma$.

10. (build of slide 9, adding notation)

Notation: for $x \in \operatorname{dom}\varphi_1$, $\tilde\nabla\varphi_1(x)$ is the unique vector in $\operatorname{dom}\varphi_1$ s.t.
  $\varphi_1(y) = \varphi_1(x) + \langle \tilde\nabla\varphi_1(x), y - x \rangle + o(\|y - x\|^2)$, $\quad y \in \operatorname{dom}\varphi_1$

11. Douglas-Rachford Envelope: DRE as an Augmented Lagrangian

◮ alternative expression:

  $\varphi^{\mathrm{DR}}_\gamma(s) = \underset{w \in \mathbb{R}^n}{\inf} \left\{ \varphi_1(u) + \varphi_2(w) + \langle \tilde\nabla\varphi_1(u), w - u \rangle + \tfrac{1}{2\gamma}\|w - u\|^2 \right\}$,

  where $u = \mathrm{prox}_{\gamma\varphi_1}(s)$

◮ the minimum is attained at $v \in \mathrm{prox}_{\gamma\varphi_2}(2u - s)$:

  $\varphi^{\mathrm{DR}}_\gamma(s) = \varphi_1(u) + \varphi_2(v) + \langle \tilde\nabla\varphi_1(u), v - u \rangle + \tfrac{1}{2\gamma}\|v - u\|^2$

◮ in fact, $\varphi^{\mathrm{DR}}_\gamma(s) = \mathcal{L}_\gamma(u, v, y)$ for $y = -\tilde\nabla\varphi_1(u)$, where $\mathcal{L}_\gamma$ is the augmented Lagrangian relative to

  minimize $\varphi_1(x) + \varphi_2(z)$ subject to $x = z$
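[Editor's note: continuing the numerical sketch from slide 6 (same assumed instance, reusing C, d, mu, gamma, prox_phi1, prox_phi2 defined there), the DRE can be evaluated directly through this augmented-Lagrangian expression.

def dre(s, gamma, mu):
    # phi_DR_gamma(s) = phi1(u) + phi2(v) + <grad phi1(u), v-u> + ||v-u||^2/(2*gamma)
    u = prox_phi1(s)
    v = prox_phi2(2 * u - s)
    grad_u = C.T @ (C @ u - d)           # tilde-grad phi1(u); dom phi1 = R^n here
    phi1_u = 0.5 * np.sum((C @ u - d) ** 2)
    phi2_v = mu * np.count_nonzero(v)
    return phi1_u + phi2_v + grad_u @ (v - u) + np.sum((v - u) ** 2) / (2 * gamma)

print("DRE at s = 0:", dre(np.zeros(50), gamma, mu))
]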

12. Douglas-Rachford Envelope: a new tool for analyzing convergence

Key property: sufficient decrease after one DRS iteration:

  $u = \mathrm{prox}_{\gamma\varphi_1}(s)$
  $v \in \mathrm{prox}_{\gamma\varphi_2}(2u - s)$
  $s^+ = s + \lambda(v - u)$
  $\implies \ \exists\, c = c(\gamma, \lambda) > 0 : \quad \varphi^{\mathrm{DR}}_\gamma(s^+) \le \varphi^{\mathrm{DR}}_\gamma(s) - c\|u - v\|^2$

[figure: $\varphi$ and $\varphi^{\mathrm{DR}}_\gamma$ plotted together, marking $\varphi^{\mathrm{DR}}_\gamma(s)$]

13. (build of slide 12: the figure additionally marks $\varphi^{\mathrm{DR}}_\gamma(s^+) \le \varphi^{\mathrm{DR}}_\gamma(s)$ at $s^+$)
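[Editor's note: the decrease property is easy to check numerically; a sanity check continuing the same illustrative sketch (with the constant c left implicit):

s, prev = np.zeros(50), dre(np.zeros(50), gamma, mu)
decreased = []
for _ in range(100):
    u = prox_phi1(s)
    v = prox_phi2(2 * u - s)
    s = s + lam * (v - u)
    cur = dre(s, gamma, mu)
    decreased.append(cur <= prev + 1e-9)   # DRE should not increase
    prev = cur
print("monotone decrease:", all(decreased), " final DRE:", prev)
]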

14. Douglas-Rachford Envelope: a new tool for analyzing convergence

◮ nonconvex DRS was studied only recently, using the DRE
◮ only $\lambda = 1$ (plain DRS) and $\lambda = 2$ (Peaceman-Rachford splitting, PRS) were analyzed
◮ bounds on $\gamma$ based on enforcing $c(\gamma, \lambda) > 0$

In this work,
◮ the study is extended to $\lambda \neq 1, 2$
◮ a much less conservative upper bound on $\gamma$

15. Douglas-Rachford Envelope: a new tool for analyzing convergence

Nicer results if we can improve the quadratic lower bound:

  $\tfrac{\sigma_h}{2}\|x - y\|^2 \le h(y) - h(x) - \langle \tilde\nabla h(x), y - x \rangle \le \tfrac{L_h}{2}\|x - y\|^2$

for some $\sigma_h \in [-L_h, L_h]$.

Example: $h(x) = 4x^2 + \sin(5x)$ has $L_h = 33$, $\sigma_h = -17$.

key inequality: if $\sigma_h \le 0$, then for any $L \ge L_h$ with $L + \sigma_h > 0$,

  $h(y) \ge h(x) + \langle \tilde\nabla h(x), y - x \rangle + \tfrac{\sigma_h L}{2(L + \sigma_h)}\|y - x\|^2 + \tfrac{1}{2(L + \sigma_h)}\|\tilde\nabla h(y) - \tilde\nabla h(x)\|^2$
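[Editor's note: a quick numerical sanity check of the example and of the key inequality (my check, not from the slides): $h''(x) = 8 - 25\sin(5x)$ ranges over $[-17, 33]$, which gives $\sigma_h = -17$ and $L_h = 33$.

import numpy as np

h = lambda t: 4 * t ** 2 + np.sin(5 * t)
dh = lambda t: 8 * t + 5 * np.cos(5 * t)   # h'(t)
sigma, L = -17.0, 33.0                     # extremes of h''(t) = 8 - 25*sin(5t)

rng = np.random.default_rng(1)
x, y = rng.uniform(-3, 3, 10000), rng.uniform(-3, 3, 10000)
rhs = (h(x) + dh(x) * (y - x)
       + sigma * L / (2 * (L + sigma)) * (y - x) ** 2
       + (dh(y) - dh(x)) ** 2 / (2 * (L + sigma)))
print("key inequality holds on samples:", bool(np.all(h(y) >= rhs - 1e-9)))
]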

