1. Accelerated Douglas–Rachford splitting and ADMM for structured nonconvex optimization
Panos Patrinos, KU Leuven (ESAT-STADIUS)
joint work with Andreas Themelis and Lorenzo Stella
LCCC Workshop on Large-Scale and Distributed Optimization, Lund, Sweden, June 14, 2017

A. Themelis, L. Stella and P. Patrinos, Douglas–Rachford splitting and ADMM for nonconvex optimization: new convergence results and accelerated versions, https://arxiv.org/abs/1709.05747

2. Structured nonconvex optimization

composite problem:  minimize $\varphi_1(s) + \varphi_2(s)$
separable problem:  minimize $f(x) + g(z)$ subject to $Ax + Bz = b$

◮ templates for large-scale structured optimization
◮ $\varphi_1$, $\varphi_2$, $f$, $g$ can be nonsmooth
◮ numerous applications: machine learning, statistics, signal/image processing, control, ...
◮ traditional algorithms usually do not apply
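[Editor's note: to make the two templates concrete, one illustrative instance (my example, not from the deck): sparse least squares with an $\ell_0$ penalty fits the composite form, and its consensus reformulation fits the separable one:

  $\underset{s}{\text{minimize}}\ \tfrac{1}{2}\|Cs - d\|^2 + \mu\|s\|_0 \qquad\Longleftrightarrow\qquad \underset{x,z}{\text{minimize}}\ \tfrac{1}{2}\|Cx - d\|^2 + \mu\|z\|_0 \ \ \text{subject to}\ x - z = 0,$

i.e. the separable template with $f(x) = \tfrac{1}{2}\|Cx - d\|^2$, $g(z) = \mu\|z\|_0$, $A = I$, $B = -I$, $b = 0$. This instance is reused in the numerical sketches further down.]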

3. Structured nonconvex optimization

composite problem:  minimize $\varphi_1(s) + \varphi_2(s)$
separable problem:  minimize $f(x) + g(z)$ subject to $Ax + Bz = b$

◮ resurgence of proximal algorithms (or operator splitting methods)
◮ reduce a complex problem into a series of simpler subproblems
◮ perhaps the most popular proximal algorithms:
  Douglas-Rachford Splitting (DRS)
  Alternating Direction Method of Multipliers (ADMM)
◮ elegant, complete theory for convex problems (monotone operators, fixed-point iterations, Fejér sequences, ... [1])

[1] Bauschke H.H. and Combettes P.L., Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer 2011

4. Contribution

composite problem:  minimize $\varphi_1(s) + \varphi_2(s)$
separable problem:  minimize $f(x) + g(z)$ subject to $Ax + Bz = b$

DRS & ADMM
◮ being fixed-point iterations, DRS & ADMM can be agonizingly slow
◮ nonconvex problems: incomplete theory, results empirical or local [1,2]
◮ global results have recently emerged (see next slides)

this talk
◮ a global convergence theory for nonconvex problems based on the Douglas-Rachford Envelope (DRE)
◮ more importantly: new, robust, faster algorithms

[1] R. Hesse and R. Luke, Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems, SIAM J. Opt. 23(4), 2013
[2] F. Artacho, J. Borwein and M. Tam, Recent results on Douglas–Rachford methods for combinatorial optimization problems, JOTA 163(1), 2014

5. Many applications...

◮ ADMM: amenable to distributed formulations (via consensus)
◮ nonconvex problems: no need for convex relaxation
  rank constraints, $\ell_0$/Schatten norms, (mixed-)integer programming

Some examples:
◮ hybrid system MPC [1]
◮ distributed sparse principal component analysis (SPCA) [2]
◮ dictionary learning [3]
◮ background-foreground extraction [4,5]
◮ sparse representations (signal processing) [6]

[1] Takapoui R., Moehle N., Boyd S. and Bemporad A., A simple effective heuristic for embedded mixed-integer quadratic programming, IEEE ACC 2016
[2] Hajinezhad D. and Hong M., Nonconvex ADMM for distributed sparse principal component analysis, GlobalSIP 2015
[3] Wai H. T., Chang T. H. and Scaglione A., A consensus-based decentralized algorithm for non-convex optimization with application to dictionary learning, ICASSP 2015
[4] Chartrand R., Nonconvex splitting for regularized low-rank + sparse decomposition, IEEE TSP 2012
[5] Yang L., Pong T. K. and Chen X., ADMM for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction, SIAM 2017
[6] Chartrand R. and Wohlberg B., A nonconvex ADMM algorithm for group sparsity with sparse groups, ICASSP 2013

6. DRS for nonconvex problems

to solve $\underset{s \in \mathbb{R}^n}{\text{minimize}}\ \varphi_1(s) + \varphi_2(s)$, iterate, starting from $s \in \mathbb{R}^n$:

  $u = \mathrm{prox}_{\gamma\varphi_1}(s)$
  $v \in \mathrm{prox}_{\gamma\varphi_2}(2u - s)$
  $s^+ = s + \lambda(v - u)$

standing assumptions
1. $\varphi_1$ and $\varphi_2$ are prox-friendly; however, both can be nonconvex
2. $\operatorname{dom}\varphi_1$ is affine and $\nabla\varphi_1$ is Lipschitz on $\operatorname{dom}\varphi_1$
3. $\varphi_2 + \tfrac{1}{2\gamma}\|\cdot\|^2$ is bounded below for some $\gamma > 0$ (prox-bounded)
4. $\operatorname{dom}\varphi_2 \subseteq \operatorname{dom}\varphi_1$
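[Editor's note: a minimal numerical sketch of this iteration on the $\ell_0$-penalized least-squares instance from slide 2 (an illustration under assumed data, not the authors' code): $\varphi_1(x) = \tfrac{1}{2}\|Cx - d\|^2$ is smooth with Lipschitz gradient on all of $\mathbb{R}^n$, and $\varphi_2(x) = \mu\|x\|_0$ is nonconvex, prox-bounded, and prox-friendly via hard thresholding.

import numpy as np

# Illustrative nonconvex instance (assumed data, not from the slides):
# phi1(x) = 0.5*||C x - d||^2,  phi2(x) = mu*||x||_0
rng = np.random.default_rng(0)
C = rng.standard_normal((30, 50))
d = rng.standard_normal(30)
mu = 0.1

L1 = np.linalg.norm(C.T @ C, 2)        # Lipschitz constant of grad phi1
gamma, lam = 0.5 / L1, 1.0             # step size gamma < 1/L_phi1

# prox of gamma*phi1 is a linear solve: argmin_w 0.5*||Cw-d||^2 + ||w-s||^2/(2*gamma)
M = np.linalg.inv(gamma * (C.T @ C) + np.eye(50))

def prox_phi1(s):
    return M @ (s + gamma * C.T @ d)

def prox_phi2(s):
    # hard thresholding: one selection from the (set-valued) prox of mu*||.||_0
    w = s.copy()
    w[w ** 2 <= 2 * gamma * mu] = 0.0
    return w

s = np.zeros(50)
for _ in range(500):
    u = prox_phi1(s)                   # u   = prox_{gamma phi1}(s)
    v = prox_phi2(2 * u - s)           # v  in prox_{gamma phi2}(2u - s)
    s = s + lam * (v - u)              # s^+ = s + lambda*(v - u)

print("fixed-point residual ||u - v|| =", np.linalg.norm(u - v))
]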

7. Structured Optimization Tools: proximal map

Only proximal operations on $\varphi_1$ and $\varphi_2$:

  $\mathrm{prox}_{\gamma h}(s) = \underset{w}{\operatorname{argmin}} \left\{ h(w) + \tfrac{1}{2\gamma}\|w - s\|^2 \right\}, \quad \gamma > 0$

◮ a generalized projection: for $h = \delta_C$, $\mathrm{prox}_{\gamma h} = \Pi_C$

Properties
◮ well defined for small $\gamma$
◮ Lipschitz for $\varphi_1$ (for small $\gamma$), but set-valued for $\varphi_2$
◮ "prox-friendly" (easily proximable) in many useful applications
◮ the value function is the Moreau envelope
  $h^\gamma(s) := \underset{w}{\min} \left\{ h(w) + \tfrac{1}{2\gamma}\|w - s\|^2 \right\}$
◮ $h^\gamma$ is locally Lipschitz in general, and even smooth for convex $h$
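[Editor's note: a few textbook prox maps, to make "prox-friendly" concrete (illustrative sketches in my notation, not from the slides), plus the Moreau envelope evaluated through the prox.

import numpy as np

def prox_l1(s, gamma):
    # h(w) = ||w||_1 (convex): soft thresholding, single-valued and Lipschitz
    return np.sign(s) * np.maximum(np.abs(s) - gamma, 0.0)

def prox_l0(s, gamma):
    # h(w) = ||w||_0 (nonconvex): hard thresholding; set-valued at the
    # threshold |s_i| = sqrt(2*gamma), here we return one selection
    w = s.copy()
    w[s ** 2 <= 2 * gamma] = 0.0
    return w

def prox_box(s, lo, hi):
    # h = delta_C for the box C = [lo, hi]^n: the prox is the projection Pi_C
    return np.clip(s, lo, hi)

def moreau(h, prox_h, s, gamma):
    # Moreau envelope h^gamma(s) = min_w h(w) + ||w - s||^2 / (2*gamma),
    # attained at w = prox_{gamma h}(s)
    w = prox_h(s, gamma)
    return h(w) + np.sum((w - s) ** 2) / (2 * gamma)

s = np.array([-2.0, 0.3, 1.5])
print(prox_l1(s, 0.5), prox_l0(s, 0.5), prox_box(s, -1.0, 1.0))
print(moreau(lambda w: np.abs(w).sum(), prox_l1, s, 0.5))
]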

8. Douglas-Rachford Envelope: "integrating" the fixed-point residual

minimize $\varphi = \varphi_1 + \varphi_2$ with Douglas-Rachford:
  $u = \mathrm{prox}_{\gamma\varphi_1}(s)$, $\quad v = \mathrm{prox}_{\gamma\varphi_2}(2u - s)$

convex nonsmooth case
◮ stationary points characterized by $u - v = 0$
◮ Douglas-Rachford envelope, discovered for convex problems [1]:

  $\varphi^{\mathrm{DR}}_\gamma(s) := \varphi_1^\gamma(s) - \gamma\|\nabla\varphi_1^\gamma(s)\|^2 + \varphi_2^\gamma\big(s - 2\gamma\nabla\varphi_1^\gamma(s)\big)$

  a real-valued function with gradient proportional to the DR residual (for $\varphi_1 \in C^2$, $\gamma < 1/L_{\varphi_1}$):

  $\nabla\varphi^{\mathrm{DR}}_\gamma(s) = \tfrac{1}{\gamma} M_\gamma(s)(u - v), \qquad M_\gamma(s) = I - 2\gamma\nabla^2\varphi_1^\gamma(s) \succ 0$

◮ used to devise accelerated DRS (ADMM via the dual [2])

[1] Patrinos P., Stella L. and Bemporad A., Douglas-Rachford splitting: complexity estimates and accelerated variants, CDC 2014
[2] Pejcic I. and Jones C., Accelerated ADMM based on accelerated Douglas-Rachford splitting, ECC 2016

9. Douglas-Rachford Envelope: "integrating" the fixed-point residual

  $\varphi^{\mathrm{DR}}_\gamma(s) := \varphi_1^\gamma(s) - \gamma\|\nabla\varphi_1^\gamma(s)\|^2 + \varphi_2^\gamma\big(s - 2\gamma\nabla\varphi_1^\gamma(s)\big)$

If
◮ $\varphi_1 : \operatorname{dom}\varphi_1 \to \mathbb{R}$ has $L_{\varphi_1}$-Lipschitz gradient
◮ $\operatorname{dom}\varphi_1$ is affine and contains $\operatorname{dom}\varphi_2$
◮ no convexity assumptions!

then, for $\gamma < 1/L_{\varphi_1}$,
◮ $\inf \varphi = \inf \varphi^{\mathrm{DR}}_\gamma$
◮ $s \in \operatorname{argmin} \varphi^{\mathrm{DR}}_\gamma \iff \mathrm{prox}_{\gamma\varphi_1}(s) \in \operatorname{argmin} \varphi$

Minimizing $\varphi$ is equivalent to minimizing $\varphi^{\mathrm{DR}}_\gamma$.

10. (build of slide 9, adding notation)

Notation: for $x \in \operatorname{dom}\varphi_1$, $\tilde\nabla\varphi_1(x)$ is the unique vector in $\operatorname{dom}\varphi_1$ s.t.
  $\varphi_1(y) = \varphi_1(x) + \langle \tilde\nabla\varphi_1(x), y - x \rangle + o(\|y - x\|^2)$, $\quad y \in \operatorname{dom}\varphi_1$

11. Douglas-Rachford Envelope: DRE as an Augmented Lagrangian

◮ alternative expression:

  $\varphi^{\mathrm{DR}}_\gamma(s) = \underset{w \in \mathbb{R}^n}{\inf} \left\{ \varphi_1(u) + \varphi_2(w) + \langle \tilde\nabla\varphi_1(u), w - u \rangle + \tfrac{1}{2\gamma}\|w - u\|^2 \right\}$,

  where $u = \mathrm{prox}_{\gamma\varphi_1}(s)$

◮ the minimum is attained at $v \in \mathrm{prox}_{\gamma\varphi_2}(2u - s)$:

  $\varphi^{\mathrm{DR}}_\gamma(s) = \varphi_1(u) + \varphi_2(v) + \langle \tilde\nabla\varphi_1(u), v - u \rangle + \tfrac{1}{2\gamma}\|v - u\|^2$

◮ in fact, $\varphi^{\mathrm{DR}}_\gamma(s) = \mathcal{L}_\gamma(u, v, y)$ for $y = -\tilde\nabla\varphi_1(u)$, where $\mathcal{L}_\gamma$ is the augmented Lagrangian relative to

  minimize $\varphi_1(x) + \varphi_2(z)$ subject to $x = z$
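[Editor's note: continuing the numerical sketch from slide 6 (same assumed instance, reusing C, d, mu, gamma, prox_phi1, prox_phi2 defined there), the DRE can be evaluated directly through this augmented-Lagrangian expression.

def dre(s, gamma, mu):
    # phi_DR_gamma(s) = phi1(u) + phi2(v) + <grad phi1(u), v-u> + ||v-u||^2/(2*gamma)
    u = prox_phi1(s)
    v = prox_phi2(2 * u - s)
    grad_u = C.T @ (C @ u - d)           # tilde-grad phi1(u); dom phi1 = R^n here
    phi1_u = 0.5 * np.sum((C @ u - d) ** 2)
    phi2_v = mu * np.count_nonzero(v)
    return phi1_u + phi2_v + grad_u @ (v - u) + np.sum((v - u) ** 2) / (2 * gamma)

print("DRE at s = 0:", dre(np.zeros(50), gamma, mu))
]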

12. Douglas-Rachford Envelope: a new tool for analyzing convergence

Key property: sufficient decrease after one DRS iteration:

  $u = \mathrm{prox}_{\gamma\varphi_1}(s)$
  $v \in \mathrm{prox}_{\gamma\varphi_2}(2u - s)$
  $s^+ = s + \lambda(v - u)$
  $\implies \ \exists\, c = c(\gamma, \lambda) > 0 : \quad \varphi^{\mathrm{DR}}_\gamma(s^+) \le \varphi^{\mathrm{DR}}_\gamma(s) - c\|u - v\|^2$

[figure: $\varphi$ and $\varphi^{\mathrm{DR}}_\gamma$ plotted together, marking $\varphi^{\mathrm{DR}}_\gamma(s)$]

13. (build of slide 12: the figure additionally marks $\varphi^{\mathrm{DR}}_\gamma(s^+) \le \varphi^{\mathrm{DR}}_\gamma(s)$ at $s^+$)
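[Editor's note: the decrease property is easy to check numerically; a sanity check continuing the same illustrative sketch (with the constant c left implicit):

s, prev = np.zeros(50), dre(np.zeros(50), gamma, mu)
decreased = []
for _ in range(100):
    u = prox_phi1(s)
    v = prox_phi2(2 * u - s)
    s = s + lam * (v - u)
    cur = dre(s, gamma, mu)
    decreased.append(cur <= prev + 1e-9)   # DRE should not increase
    prev = cur
print("monotone decrease:", all(decreased), " final DRE:", prev)
]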

14. Douglas-Rachford Envelope: a new tool for analyzing convergence

◮ nonconvex DRS was studied only recently, using the DRE
◮ only $\lambda = 1$ (plain DRS) and $\lambda = 2$ (Peaceman-Rachford splitting, PRS) were analyzed
◮ bounds on $\gamma$ based on enforcing $c(\gamma, \lambda) > 0$

In this work,
◮ the study is extended to $\lambda \neq 1, 2$
◮ a much less conservative upper bound on $\gamma$

15. Douglas-Rachford Envelope: a new tool for analyzing convergence

Nicer results if we can improve the quadratic lower bound:

  $\tfrac{\sigma_h}{2}\|x - y\|^2 \le h(y) - h(x) - \langle \tilde\nabla h(x), y - x \rangle \le \tfrac{L_h}{2}\|x - y\|^2$

for some $\sigma_h \in [-L_h, L_h]$.

Example: $h(x) = 4x^2 + \sin(5x)$ has $L_h = 33$, $\sigma_h = -17$.

key inequality: if $\sigma_h \le 0$, then for any $L \ge L_h$ with $L + \sigma_h > 0$,

  $h(y) \ge h(x) + \langle \tilde\nabla h(x), y - x \rangle + \tfrac{\sigma_h L}{2(L + \sigma_h)}\|y - x\|^2 + \tfrac{1}{2(L + \sigma_h)}\|\tilde\nabla h(y) - \tilde\nabla h(x)\|^2$
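[Editor's note: a quick numerical sanity check of the example and of the key inequality (my check, not from the slides): $h''(x) = 8 - 25\sin(5x)$ ranges over $[-17, 33]$, which gives $\sigma_h = -17$ and $L_h = 33$.

import numpy as np

h = lambda t: 4 * t ** 2 + np.sin(5 * t)
dh = lambda t: 8 * t + 5 * np.cos(5 * t)   # h'(t)
sigma, L = -17.0, 33.0                     # extremes of h''(t) = 8 - 25*sin(5t)

rng = np.random.default_rng(1)
x, y = rng.uniform(-3, 3, 10000), rng.uniform(-3, 3, 10000)
rhs = (h(x) + dh(x) * (y - x)
       + sigma * L / (2 * (L + sigma)) * (y - x) ** 2
       + (dh(y) - dh(x)) ** 2 / (2 * (L + sigma)))
print("key inequality holds on samples:", bool(np.all(h(y) >= rhs - 1e-9)))
]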

