Anderson Accelerated Douglas-Rachford Splitting
Anqi Fu, Junzi Zhang, Stephen Boyd


  1. Anderson Accelerated Douglas-Rachford Splitting Anqi Fu Junzi Zhang Stephen Boyd EE & ICME Departments Stanford University March 10, 2020 1

  2. Problem Overview Douglas-Rachford Splitting Anderson Acceleration Numerical Experiments Conclusion 2

  3. Outline Problem Overview Douglas-Rachford Splitting Anderson Acceleration Numerical Experiments Conclusion Problem Overview 3

  4. Prox-Affine Problem
     Prox-affine convex optimization problem:
     $$\begin{array}{ll} \text{minimize} & \sum_{i=1}^N f_i(x_i) \\ \text{subject to} & \sum_{i=1}^N A_i x_i = b \end{array}$$
     with variables $x_i \in \mathbf{R}^{n_i}$ for $i = 1, \dots, N$
     ◮ $A_i \in \mathbf{R}^{m \times n_i}$ and $b \in \mathbf{R}^m$ are given data
     ◮ $f_i : \mathbf{R}^{n_i} \to \mathbf{R} \cup \{+\infty\}$ are closed, convex, and proper
     ◮ Each $f_i$ can only be accessed via its proximal operator
     $$\mathrm{prox}_{tf_i}(v_i) = \operatorname*{argmin}_{x_i} \left( f_i(x_i) + \tfrac{1}{2t} \|x_i - v_i\|_2^2 \right),$$
     where $t > 0$ is a parameter
     Problem Overview 4
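
As a concrete illustration of this definition, the sketch below evaluates $\mathrm{prox}_{tf}(v)$ numerically from the argmin above and checks it against a known closed form. The helper name and the quadratic test function are illustrative only; in practice each $f_i$'s prox is supplied in closed form or by a fast specialized routine.

```python
import numpy as np
from scipy.optimize import minimize

def prox_numeric(f, v, t):
    """Evaluate prox_{t f}(v) directly from its definition,
    argmin_x f(x) + (1/(2t)) ||x - v||_2^2, with a generic smooth solver."""
    obj = lambda x: f(x) + np.sum((x - v) ** 2) / (2 * t)
    return minimize(obj, v).x

# Example: f(x) = ||x||_2^2 has the closed-form prox v / (1 + 2t)
v, t = np.array([1.0, -2.0, 3.0]), 0.5
print(prox_numeric(lambda x: np.sum(x ** 2), v, t))   # ≈ v / (1 + 2t) = [0.5, -1.0, 1.5]
```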

  5. Why This Formulation? ◮ Encompasses many classes of convex problems (conic programs, consensus optimization) ◮ Block separable form ideal for distributed optimization ◮ Proximal operator can be provided as a “black box”, enabling privacy-preserving implementation Problem Overview 5

  6. Previous Work ◮ Alternating direction method of multipliers (ADMM) ◮ Douglas-Rachford splitting (DRS) ◮ Augmented Lagrangian method (ALM) Problem Overview 6

  7. Previous Work ◮ Alternating direction method of multipliers (ADMM) ◮ Douglas-Rachford splitting (DRS) ◮ Augmented Lagrangian method (ALM) These are typically slow to converge, prompting research into acceleration techniques: ◮ Adaptive penalty parameters ◮ Momentum methods ◮ Quasi-Newton method with line search Problem Overview 6

  8. Our Method ◮ A2DR : Anderson acceleration (AA) applied to DRS ◮ DRS is a non-expansive fixed-point (NEFP) method that fits the prox-affine framework ◮ AA is fast, efficient, and can be applied to NEFP iterations – but it is unstable without modification ◮ We introduce a type-II AA variant that converges globally in non-smooth, potentially pathological settings Problem Overview 7

  9. Main Advantages ◮ A2DR produces primal and dual solutions, or a certificate of infeasibility/unboundedness ◮ Consistently converges faster with no parameter tuning ◮ Memory efficient ⇒ little extra cost per iteration ◮ Scales to large problems and is easily parallelized ◮ Python implementation: https://github.com/cvxgrp/a2dr Problem Overview 8

  10. Outline Problem Overview Douglas-Rachford Splitting Anderson Acceleration Numerical Experiments Conclusion Douglas-Rachford Splitting 9

  11. DRS Algorithm
      ◮ Define $A = [A_1 \ \cdots \ A_N]$ and $x = (x_1, \dots, x_N)$
      ◮ Rewrite the problem using the set indicator $I_S$:
      $$\text{minimize} \quad \sum_{i=1}^N f_i(x_i) + I_{\{Ax = b\}}(x)$$
      ◮ DRS iterates for $k = 1, 2, \dots$:
      $$\begin{aligned}
      x_i^{k+1/2} &= \mathrm{prox}_{tf_i}(v_i^k), \quad i = 1, \dots, N \\
      v^{k+1/2} &= 2x^{k+1/2} - v^k \\
      x^{k+1} &= \Pi_{\{Av = b\}}(v^{k+1/2}) \\
      v^{k+1} &= v^k + x^{k+1} - x^{k+1/2}
      \end{aligned}$$
      $\Pi_S(v)$ is the Euclidean projection of $v$ onto $S$
      Douglas-Rachford Splitting 10
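
For concreteness, here is a minimal NumPy sketch of one DRS sweep mirroring the four steps above. The function name, the n_slices block bookkeeping, and the dense least-squares projection are illustrative choices, not the a2dr internals.

```python
import numpy as np

def drs_iteration(v, prox_list, A, b, t, n_slices):
    """One DRS sweep following the four steps on this slide (illustrative sketch).

    v is the concatenated iterate (v_1, ..., v_N), n_slices[i] is the slice of
    indices belonging to block i, and prox_list[i](v_i, t) evaluates prox_{t f_i}.
    """
    # 1. block-wise proximal step: x_i^{k+1/2} = prox_{t f_i}(v_i^k)
    x_half = np.concatenate([prox_list[i](v[sl], t) for i, sl in enumerate(n_slices)])
    # 2. reflection: v^{k+1/2} = 2 x^{k+1/2} - v^k
    v_half = 2 * x_half - v
    # 3. projection onto {v : A v = b}, done here with a dense least-squares solve
    y = np.linalg.lstsq(A @ A.T, A @ v_half - b, rcond=None)[0]
    x_next = v_half - A.T @ y
    # 4. fixed-point update: v^{k+1} = v^k + x^{k+1} - x^{k+1/2}
    v_next = v + x_next - x_half
    return v_next, x_half
```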

  12. Convergence of DRS
      ◮ DRS iterations can be conceived as a fixed-point mapping $v^{k+1} = F(v^k)$, where $F$ is firmly non-expansive
      ◮ $v^k$ converges to a fixed point of $F$ (if it exists)
      ◮ $x^k$ and $x^{k+1/2}$ converge to a solution of our problem
      Douglas-Rachford Splitting 11

  13. Convergence of DRS
      ◮ DRS iterations can be conceived as a fixed-point mapping $v^{k+1} = F(v^k)$, where $F$ is firmly non-expansive
      ◮ $v^k$ converges to a fixed point of $F$ (if it exists)
      ◮ $x^k$ and $x^{k+1/2}$ converge to a solution of our problem
      In practice, this convergence is often slow...
      Douglas-Rachford Splitting 11

  14. Outline Problem Overview Douglas-Rachford Splitting Anderson Acceleration Numerical Experiments Conclusion Anderson Acceleration 12

  15. Type-II AA
      ◮ Quasi-Newton method for accelerating fixed point iterations
      ◮ Extrapolates the next iterate using the $M+1$ most recent iterates:
      $$v^{k+1} = \sum_{j=0}^{M} \alpha_j^k F(v^{k-M+j})$$
      ◮ Let $G(v) = v - F(v)$; then $\alpha^k \in \mathbf{R}^{M+1}$ is the solution to
      $$\begin{array}{ll} \text{minimize} & \left\| \sum_{j=0}^{M} \alpha_j^k G(v^{k-M+j}) \right\|_2^2 \\ \text{subject to} & \sum_{j=0}^{M} \alpha_j^k = 1 \end{array}$$
      ◮ Typically only need $M \approx 10$ for good performance
      Anderson Acceleration 13
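
To illustrate the constrained least-squares step, the sketch below computes the AA-II weights $\alpha^k$ from the matrix of residuals $[G(v^{k-M}) \ \cdots \ G(v^k)]$ by solving the KKT system of the equality-constrained problem. The function name and the small ridge term are illustrative, not part of the a2dr code.

```python
import numpy as np

def aa_weights(G_hist, reg=1e-12):
    """Type-II AA weights (illustrative sketch).

    G_hist is an n x (M+1) matrix whose columns are G(v^{k-M}), ..., G(v^k).
    Solves  minimize ||G_hist @ alpha||_2^2  subject to  sum(alpha) = 1
    via the KKT system of the equality-constrained least-squares problem.
    """
    Mp1 = G_hist.shape[1]
    H = G_hist.T @ G_hist + reg * np.eye(Mp1)   # Gram matrix (+ tiny ridge for stability)
    ones = np.ones(Mp1)
    # KKT system: [2H, 1; 1^T, 0] [alpha; nu] = [0; 1]
    KKT = np.block([[2 * H, ones[:, None]], [ones[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([np.zeros(Mp1), [1.0]])
    return np.linalg.solve(KKT, rhs)[:Mp1]
```

With $M \approx 10$ the Gram matrix is tiny, so this solve costs little compared to the prox evaluations.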

  16. Adaptive Regularization ◮ Type-II AA is unstable (Scieur, d’Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019) ◮ Add adaptive regularization term to unconstrained formulation Anderson Acceleration 14

  17. Adaptive Regularization
      ◮ Type-II AA is unstable (Scieur, d'Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
      ◮ Add adaptive regularization term to unconstrained formulation
      ◮ Change variables to $\gamma^k \in \mathbf{R}^M$:
      $$\alpha_0^k = \gamma_0^k, \qquad \alpha_i^k = \gamma_i^k - \gamma_{i-1}^k \ \ \forall i = 1, \dots, M-1, \qquad \alpha_M^k = 1 - \gamma_{M-1}^k$$
      ◮ Unconstrained AA problem is
      $$\text{minimize} \quad \| g^k - Y^k \gamma^k \|_2^2,$$
      where we define $g^k = G(v^k)$, $y^k = g^{k+1} - g^k$, $Y^k = [y^{k-M} \ \cdots \ y^{k-1}]$
      Anderson Acceleration 14

  18. Adaptive Regularization
      ◮ Type-II AA is unstable (Scieur, d'Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
      ◮ Add adaptive regularization term to unconstrained formulation
      ◮ Change variables to $\gamma^k \in \mathbf{R}^M$:
      $$\alpha_0^k = \gamma_0^k, \qquad \alpha_i^k = \gamma_i^k - \gamma_{i-1}^k \ \ \forall i = 1, \dots, M-1, \qquad \alpha_M^k = 1 - \gamma_{M-1}^k$$
      ◮ Stabilized AA problem is
      $$\text{minimize} \quad \| g^k - Y^k \gamma^k \|_2^2 + \eta \left( \| S^k \|_F^2 + \| Y^k \|_F^2 \right) \| \gamma^k \|_2^2,$$
      where $\eta \geq 0$ is a parameter and
      $$g^k = G(v^k), \quad y^k = g^{k+1} - g^k, \quad Y^k = [y^{k-M} \ \cdots \ y^{k-1}], \qquad s^k = v^{k+1} - v^k, \quad S^k = [s^{k-M} \ \cdots \ s^{k-1}]$$
      Anderson Acceleration 15
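
A possible NumPy realization of this stabilized subproblem: solve the ridge-regularized least squares for $\gamma^k$, then map back to $\alpha^k$ through the change of variables above. The names and the default $\eta$ are illustrative, not the a2dr internals.

```python
import numpy as np

def stabilized_aa_weights(g_k, Y_k, S_k, eta=1e-8):
    """Stabilized AA weights (illustrative sketch).

    Solves  minimize ||g_k - Y_k @ gamma||^2 + eta*(||S_k||_F^2 + ||Y_k||_F^2)*||gamma||^2
    as a ridge-regularized least squares, then recovers alpha from gamma.
    """
    M = Y_k.shape[1]
    lam = eta * (np.linalg.norm(S_k, "fro") ** 2 + np.linalg.norm(Y_k, "fro") ** 2)
    A = np.vstack([Y_k, np.sqrt(lam) * np.eye(M)])      # stack the ridge term below Y_k
    rhs = np.concatenate([g_k, np.zeros(M)])
    gamma = np.linalg.lstsq(A, rhs, rcond=None)[0]
    # Recover alpha (length M+1) via the change of variables on this slide
    alpha = np.empty(M + 1)
    alpha[0] = gamma[0]
    alpha[1:M] = gamma[1:M] - gamma[:M - 1]
    alpha[M] = 1.0 - gamma[M - 1]
    return alpha
```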

  19. A2DR
      ◮ Parameters: $M$ = max-memory, $R$ = safeguarding parameter
      ◮ A2DR iterates for $k = 1, 2, \dots$:
      1. $v_{\mathrm{DRS}}^{k+1} = F(v^k)$, $\quad g^k = v^k - v_{\mathrm{DRS}}^{k+1}$
      2. Compute $\alpha^k$ by solving the stabilized AA problem
      3. $v_{\mathrm{AA}}^{k+1} = \sum_{j=0}^{M} \alpha_j^k \, v_{\mathrm{DRS}}^{k-M+j+1}$
      4. Safeguard check: if $\|G(v^k)\|_2$ is small enough, $v^{k+i} = v_{\mathrm{AA}}^{k+i}$ for $i = 1, \dots, R$; otherwise, $v^{k+1} = v_{\mathrm{DRS}}^{k+1}$
      ◮ Safeguard ensures convergence in the infeasible/unbounded case
      Anderson Acceleration 16
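
Putting the pieces together, the skeleton below shows one way the four steps and the safeguard could be wired around a DRS map F. It reuses stabilized_aa_weights from the previous sketch, and the residual-halving rule used for the safeguard is a simplified stand-in for the condition in the paper, so this is an outline rather than the a2dr source.

```python
import numpy as np

def a2dr_sketch(F, v0, max_iter=1000, M=10, R=10, eps=1e-6, eta=1e-8):
    """Illustrative skeleton of the safeguarded A2DR loop (not the a2dr package source)."""
    v = v0.copy()
    v_hist, drs_hist, g_hist = [v], [], []   # accepted iterates, DRS candidates, residuals
    aa_budget, g_check = 0, None
    for k in range(max_iter):
        v_drs = F(v)                          # step 1: plain DRS candidate
        g = v - v_drs                         # fixed-point residual G(v^k)
        g_norm = np.linalg.norm(g)
        g_check = g_norm if g_check is None else g_check
        if g_norm <= eps:                     # simplified stopping test
            return v_drs
        drs_hist = (drs_hist + [v_drs])[-(M + 1):]
        g_hist = (g_hist + [g])[-(M + 1):]
        if len(g_hist) > 1:
            # steps 2-3: stabilized AA candidate from the recent history
            Y = np.column_stack([g_hist[j + 1] - g_hist[j] for j in range(len(g_hist) - 1)])
            S = np.column_stack([v_hist[j + 1] - v_hist[j] for j in range(len(v_hist) - 1)])
            alpha = stabilized_aa_weights(g, Y, S, eta)
            v_aa = np.column_stack(drs_hist) @ alpha
        else:
            v_aa = v_drs
        # step 4: simplified safeguard -- accept AA candidates for the next R iterations
        # once the fixed-point residual has halved since the last accepted check
        if aa_budget > 0:
            v, aa_budget = v_aa, aa_budget - 1
        elif g_norm <= 0.5 * g_check:
            g_check, v, aa_budget = g_norm, v_aa, R - 1
        else:
            v = v_drs
        v_hist = (v_hist + [v])[-(M + 1):]
    return v
```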

  20. Stopping Criterion of A2DR
      ◮ Stop and output $x^{k+1/2}$ when $\|r^k\|_2 \leq \epsilon_{\mathrm{tol}}$, where
      $$r_{\mathrm{prim}}^k = A x^{k+1/2} - b, \qquad r_{\mathrm{dual}}^k = \tfrac{1}{t}(v^k - x^{k+1/2}) + A^T \lambda^k, \qquad r^k = (r_{\mathrm{prim}}^k, r_{\mathrm{dual}}^k)$$
      ◮ Dual variable is the minimizer of the dual residual norm:
      $$\lambda^k = \operatorname*{argmin}_{\lambda} \left\| \tfrac{1}{t}(v^k - x^{k+1/2}) + A^T \lambda \right\|_2$$
      ◮ Note that this is a simple least-squares problem
      Anderson Acceleration 17
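
A minimal sketch of this residual computation; the helper name and the dense lstsq solve are illustrative, the point being only that $\lambda^k$ comes from an ordinary least-squares problem with coefficient matrix $A^T$.

```python
import numpy as np

def residual_norm(A, b, x_half, v, t):
    """Compute ||r^k||_2 = ||(r_prim, r_dual)||_2 for the stopping test (illustrative sketch)."""
    w = (v - x_half) / t
    lam = np.linalg.lstsq(A.T, -w, rcond=None)[0]   # dual variable: argmin_lambda ||w + A^T lambda||_2
    r_prim = A @ x_half - b                          # primal residual
    r_dual = w + A.T @ lam                           # dual residual
    return np.linalg.norm(np.concatenate([r_prim, r_dual]))
```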

  21. Convergence of A2DR
      Theorem (Solvable Case)
      If the problem is feasible and bounded,
      $$\liminf_{k \to \infty} \|r^k\|_2 = 0$$
      and the AA candidates are adopted infinitely often. Furthermore, if $F$ has a fixed point,
      $$\lim_{k \to \infty} v^k = v^\star \quad \text{and} \quad \lim_{k \to \infty} x^{k+1/2} = x^\star,$$
      where $v^\star$ is a fixed point of $F$ and $x^\star$ is a solution to the problem.
      Anderson Acceleration 18

  22. Convergence of A2DR
      Theorem (Pathological Case)
      If the problem is pathological (infeasible/unbounded),
      $$\lim_{k \to \infty} \left( v^k - v^{k+1} \right) = \delta v \neq 0.$$
      Furthermore, if $\lim_{k \to \infty} A x^{k+1/2} = b$, the problem is unbounded and
      $$\|\delta v\|_2 = t \, \mathrm{dist}(\mathrm{dom}\, f^*, \mathcal{R}(A^T)).$$
      Otherwise, it is infeasible and
      $$\|\delta v\|_2 \geq \mathrm{dist}(\mathrm{dom}\, f, \{x : Ax = b\}).$$
      Here $f(x) = \sum_{i=1}^N f_i(x_i)$.
      Anderson Acceleration 19

  23. Preconditioning
      ◮ Convergence greatly improved by rescaling the problem
      ◮ Replace original $A$, $b$, $f_i$ with $\hat{A} = DAE$, $\hat{b} = Db$, $\hat{f}_i(\hat{x}_i) = f_i(e_i \hat{x}_i)$
      ◮ $D$ and $E$ are diagonal positive; $e_i > 0$ corresponds to the $i$th block diagonal entry of $E$
      ◮ $D$ and $E$ chosen by equilibrating $A$ (see paper for details)
      ◮ Proximal operator of $\hat{f}_i$ can be evaluated using the proximal operator of $f_i$:
      $$\mathrm{prox}_{t \hat{f}_i}(\hat{v}_i) = \tfrac{1}{e_i} \, \mathrm{prox}_{(e_i^2 t) f_i}(e_i \hat{v}_i)$$
      Anderson Acceleration 20
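
The identity on this slide translates directly into a small wrapper around an existing prox handle; the wrapper name is illustrative, and the handle follows the (v, t) calling convention used by the solver interface on the next slides.

```python
def scale_prox(prox_f, e_i):
    """Given prox_f(v, t) for f_i, return a handle for the rescaled f_hat_i(x) = f_i(e_i * x),
    using prox_{t f_hat_i}(v_hat) = (1/e_i) * prox_{(e_i^2 t) f_i}(e_i * v_hat)."""
    return lambda v_hat, t: prox_f(e_i * v_hat, (e_i ** 2) * t) / e_i
```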

  24. Outline Problem Overview Douglas-Rachford Splitting Anderson Acceleration Numerical Experiments Conclusion Numerical Experiments 21

  25. Python Solver Interface
      result = a2dr(prox_list, A_list, b)
      Input arguments:
      ◮ prox_list is a list of proximal function handles, e.g. $f_i(x_i) = x_i \Rightarrow$ prox_list[i] = lambda v,t: v - t
      ◮ A_list is a list of the matrices $A_i$; b is the vector $b$
      Output dictionary keys:
      ◮ num_iters is the total number of iterations $K$
      ◮ x_vals is a list of the final values $x_i^K$
      ◮ primal and dual are vectors containing $r_{\mathrm{prim}}^k$ and $r_{\mathrm{dual}}^k$ for $k = 1, \dots, K$
      Numerical Experiments 22
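
A possible end-to-end call following this interface, on a small made-up instance: minimize $\|x_1\|_1 + \tfrac{1}{2}\|x_2 - c\|_2^2$ subject to $x_1 = x_2$. The problem, the hand-written prox handles, and the use of SciPy sparse identities for A_list are assumptions for illustration, not taken from the talk.

```python
import numpy as np
from scipy import sparse
from a2dr import a2dr   # pip install a2dr

np.random.seed(0)
n = 100
c = np.random.randn(n)

# Proximal handles follow the (v, t) convention on this slide
prox_soft_threshold = lambda v, t: np.maximum(v - t, 0) - np.maximum(-v - t, 0)   # prox of ||x||_1
prox_quadratic = lambda v, t: (v + t * c) / (1 + t)                               # prox of (1/2)||x - c||^2
prox_list = [prox_soft_threshold, prox_quadratic]

A_list = [sparse.eye(n), -sparse.eye(n)]   # encodes x_1 - x_2 = 0
b = np.zeros(n)

result = a2dr(prox_list, A_list, b)
print(result['num_iters'])
x1, x2 = result['x_vals']                  # per-block solutions x_i^K
```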

  26. Proximal Library
      We provide an extensive proximal library in a2dr.proximal
      f(x)                         prox_{tf}(v)                     Function Handle
      x                            v - t                            prox_identity
      ||x||_1                      (v - t)_+ - (-v - t)_+           prox_norm1
      ||x||_2                      (1 - t/||v||_2)_+ v              prox_norm2
      ||x||_inf                    Bisection                        prox_norm_inf
      e^x                          v - W(t e^v)                     prox_exp
      -log(x)                      (v + sqrt(v^2 + 4t)) / 2         prox_neg_log
      sum_i log(1 + e^{x_i})       Newton-CG                        prox_logistic
      ||Fx - g||_2^2               LSQR                             prox_sum_squares_affine
      I_{R^n_+}(x)                 max(v, 0)                        prox_nonneg_constr
      ...And much more! See the documentation for the full list
      Numerical Experiments 23

  27. Nonnegative Least Squares (NNLS)
      $$\begin{array}{ll} \text{minimize} & \|Fz - g\|_2^2 \\ \text{subject to} & z \geq 0 \end{array}$$
      with respect to $z \in \mathbf{R}^q$
      ◮ Problem data: $F \in \mathbf{R}^{p \times q}$ and $g \in \mathbf{R}^p$
      ◮ Can be written in standard form with
      $$f_1(x_1) = \|Fx_1 - g\|_2^2, \quad f_2(x_2) = I_{\mathbf{R}^n_+}(x_2), \qquad A_1 = I, \quad A_2 = -I, \quad b = 0$$
      ◮ We evaluate the proximal operator of $f_1$ using LSQR
      Numerical Experiments 24
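
A sketch of how this NNLS setup might be run through the solver, using the library handles from the previous slide. The sizes and random seed are arbitrary, and the exact way the problem data F, g are passed to prox_sum_squares_affine is an assumption here; check the a2dr documentation for the precise signature.

```python
import numpy as np
from scipy import sparse
from a2dr import a2dr
from a2dr.proximal import prox_sum_squares_affine, prox_nonneg_constr

np.random.seed(1)
p, q = 300, 100
F = np.random.randn(p, q)
g = np.random.randn(p)

prox_list = [lambda v, t: prox_sum_squares_affine(v, t, F, g),   # f_1, evaluated via LSQR
             prox_nonneg_constr]                                  # f_2 = indicator of the nonnegative orthant
A_list = [sparse.eye(q), -sparse.eye(q)]                          # A_1 = I, A_2 = -I
b = np.zeros(q)

result = a2dr(prox_list, A_list, b)
z = result['x_vals'][-1]          # the nonnegative block approximates the NNLS solution
print(result['num_iters'], np.min(z))
```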
