
Recent Developments of Alternating Direction Method of Multipliers



  1. Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables
Shiqian Ma, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
2014 Workshop on Optimization for Modern Computation, BICMR, Beijing, China, September 2, 2014
Shiqian Ma Multi-Block ADMM

  2. Outline
- ADMM for N = 2
- Existing work on ADMM for N ≥ 3
- Convergence rates of ADMM for N ≥ 3
- BSUM-M

  3. Alternating Direction Method of Multipliers (ADMM)
Convex optimization:
$$\min_{x_1,\dots,x_N}\; f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N) \quad \text{s.t.}\quad A_1x_1 + A_2x_2 + \cdots + A_Nx_N = b,\quad x_j \in \mathcal{X}_j,\ j = 1,\dots,N,$$
where each $f_j$ is a closed convex function and each $\mathcal{X}_j$ is a closed convex set.
Augmented Lagrangian function:
$$\mathcal{L}_\gamma(x_1,\dots,x_N;\lambda) := \sum_{j=1}^N f_j(x_j) - \Big\langle \lambda, \sum_{j=1}^N A_jx_j - b \Big\rangle + \frac{\gamma}{2}\Big\| \sum_{j=1}^N A_jx_j - b \Big\|^2.$$
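
The following minimal NumPy sketch evaluates $\mathcal{L}_\gamma$. Its main purpose is to make the sign convention concrete: the multiplier term enters with a minus sign, which matches the dual update $\lambda^{k+1} = \lambda^k - \gamma(\cdots)$ used in multi-block ADMM. The function name `augmented_lagrangian` and the two-block test instance are illustrative assumptions and do not come from the slides.

```python
import numpy as np

def augmented_lagrangian(fs, As, b, xs, lam, gamma):
    """Evaluate L_gamma(x_1, ..., x_N; lam) with the slide's sign convention:
    sum_j f_j(x_j) - <lam, sum_j A_j x_j - b> + (gamma / 2) * ||sum_j A_j x_j - b||^2.
    fs is a list of callables; As and xs are lists of NumPy arrays (one entry per block)."""
    residual = sum(A @ x for A, x in zip(As, xs)) - b   # sum_j A_j x_j - b
    objective = sum(f(x) for f, x in zip(fs, xs))       # sum_j f_j(x_j)
    return objective - lam @ residual + 0.5 * gamma * (residual @ residual)


# Hypothetical two-block instance: f_1(x) = 0.5 ||x||^2, f_2(x) = ||x||_1, A_1 = I, A_2 = -I, b = 0.
fs = [lambda x: 0.5 * (x @ x), lambda x: np.abs(x).sum()]
As = [np.eye(2), -np.eye(2)]
value = augmented_lagrangian(fs, As, np.zeros(2),
                             [np.array([1.0, 2.0]), np.array([0.0, 1.0])],
                             lam=np.array([1.0, 0.0]), gamma=2.0)
print(value)
```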

  4. Multi-Block ADMM
With the augmented Lagrangian $\mathcal{L}_\gamma$ defined above, multi-block ADMM iterates
$$\begin{aligned}
x_1^{k+1} &:= \operatorname*{argmin}_{x_1\in\mathcal{X}_1} \mathcal{L}_\gamma(x_1, x_2^k, \dots, x_N^k; \lambda^k),\\
x_2^{k+1} &:= \operatorname*{argmin}_{x_2\in\mathcal{X}_2} \mathcal{L}_\gamma(x_1^{k+1}, x_2, x_3^k, \dots, x_N^k; \lambda^k),\\
&\;\;\vdots\\
x_N^{k+1} &:= \operatorname*{argmin}_{x_N\in\mathcal{X}_N} \mathcal{L}_\gamma(x_1^{k+1}, x_2^{k+1}, \dots, x_{N-1}^{k+1}, x_N; \lambda^k),\\
\lambda^{k+1} &:= \lambda^k - \gamma\Big(\sum_{j=1}^N A_jx_j^{k+1} - b\Big).
\end{aligned}$$
The primal variables are updated in a Gauss-Seidel manner: each block uses the newest values of the blocks before it.
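
The Gauss-Seidel sweep can be sketched in NumPy for a hypothetical special case. Take quadratic blocks $f_j(x_j) = \tfrac12 x_j^\top H_j x_j + c_j^\top x_j$ with $\mathcal{X}_j = \mathbb{R}^{n_j}$; then each subproblem reduces to a single linear solve. The names `multiblock_admm`, `Hs` and `cs` are illustrative. This is a sketch under those assumptions, not the author's code.

```python
import numpy as np

def multiblock_admm(Hs, cs, As, b, gamma, iters=500):
    """Direct multi-block ADMM (Gauss-Seidel sweep, then one dual step) for the
    hypothetical quadratic blocks f_j(x_j) = 0.5 x_j^T H_j x_j + c_j^T x_j, X_j = R^{n_j}.
    Assumes each H_j + gamma A_j^T A_j is invertible."""
    N = len(As)
    xs = [np.zeros(A.shape[1]) for A in As]
    lam = np.zeros(b.shape[0])
    for _ in range(iters):
        for j in range(N):
            H, c, A = Hs[j], cs[j], As[j]
            # Blocks i < j already hold x_i^{k+1}; blocks i > j still hold x_i^k.
            r_other = sum(As[i] @ xs[i] for i in range(N) if i != j) - b
            # Stationarity of L_gamma in x_j:
            # H x_j + c - A^T lam + gamma A^T (A x_j + r_other) = 0
            xs[j] = np.linalg.solve(H + gamma * A.T @ A, -c + A.T @ lam - gamma * A.T @ r_other)
        # lam^{k+1} = lam^k - gamma (sum_j A_j x_j^{k+1} - b)
        lam = lam - gamma * (sum(A @ x for A, x in zip(As, xs)) - b)
    return xs, lam


# Two-block consensus check: min 0.5||x1 - a||^2 + 0.5||x2 - c||^2 s.t. x1 - x2 = 0.
a, c = np.array([1.0, -2.0]), np.array([3.0, 4.0])
I = np.eye(2)
xs, lam = multiblock_admm([I, I], [-a, -c], [I, -I], np.zeros(2), gamma=1.0)
print(xs[0], xs[1], lam)   # both blocks approach (a + c) / 2
```

The check uses N = 2, where convergence is guaranteed for any $\gamma > 0$. Run with N ≥ 3 blocks, the same loop is exactly the direct extension that, by the Chen-He-Ye-Yuan-2013 counterexample, need not converge.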

  5. ADMM for N = 2
$$\begin{aligned}
x_1^{k+1} &:= \operatorname*{argmin}_{x_1\in\mathcal{X}_1} \mathcal{L}_\gamma(x_1, x_2^k; \lambda^k),\\
x_2^{k+1} &:= \operatorname*{argmin}_{x_2\in\mathcal{X}_2} \mathcal{L}_\gamma(x_1^{k+1}, x_2; \lambda^k),\\
\lambda^{k+1} &:= \lambda^k - \gamma\big(A_1x_1^{k+1} + A_2x_2^{k+1} - b\big).
\end{aligned}$$
- Long history, going back to variational methods for PDEs in the 1950s.
- Related to the Douglas-Rachford and Peaceman-Rachford operator splitting methods for finding a zero of the sum of two monotone operators: find $x$ such that $0 \in \mathcal{A}(x) + \mathcal{B}(x)$.
- Revisited recently for sparse optimization [Wang-Yang-Yin-Zhang-2008] [Goldstein-Osher-2009] [Boyd-etal-2011].
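
The next sketch shows the two-block scheme on a small sparse problem, $\ell_1$ denoising: $\min \tfrac12\|x - a\|^2 + \mu\|z\|_1$ s.t. $x - z = 0$. Here $A_1 = I$, $A_2 = -I$ and $b = 0$. The problem, the name `admm_l1_denoise` and the parameter values are our own illustrative choices. Both subproblems have closed forms, and the $z$-step is soft-thresholding. The exact solution is $z^* = \operatorname{soft}(a, \mu)$, which gives a simple check.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1_denoise(a, mu, gamma=1.0, iters=300):
    """Two-block ADMM for min 0.5||x - a||^2 + mu||z||_1 s.t. x - z = 0
    (A_1 = I, A_2 = -I, b = 0), using the slide's update order and dual sign."""
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    lam = np.zeros_like(a)
    for _ in range(iters):
        # x-step: (x - a) - lam + gamma (x - z) = 0
        x = (a + lam + gamma * z) / (1.0 + gamma)
        # z-step: minimize mu||z||_1 + <lam, z> + (gamma/2)||z - x||^2, i.e. a shrinkage of x - lam/gamma
        z = soft_threshold(x - lam / gamma, mu / gamma)
        # dual step: lam^{k+1} = lam^k - gamma (A_1 x + A_2 z - b) = lam^k - gamma (x - z)
        lam = lam - gamma * (x - z)
    return x, z, lam


a = np.array([3.0, -0.5, 1.2, -2.0])
x, z, lam = admm_l1_denoise(a, mu=1.0)
print(z)   # compare with soft_threshold(a, 1.0)
```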

  6. Global Convergence of ADMM for N = 2
- ADMM for N = 2, as on the previous slide, converges globally for any $\gamma > 0$ (Fortin-Glowinski-1983; Gabay-1983; Glowinski-Le Tallec-1989; Eckstein-Bertsekas-1992).
- ADMM for N = 2 with a fixed dual step size $\alpha > 0$:
$$\begin{aligned}
x_1^{k+1} &:= \operatorname*{argmin}_{x_1\in\mathcal{X}_1} \mathcal{L}_\gamma(x_1, x_2^k; \lambda^k),\\
x_2^{k+1} &:= \operatorname*{argmin}_{x_2\in\mathcal{X}_2} \mathcal{L}_\gamma(x_1^{k+1}, x_2; \lambda^k),\\
\lambda^{k+1} &:= \lambda^k - \alpha\gamma\big(A_1x_1^{k+1} + A_2x_2^{k+1} - b\big).
\end{aligned}$$
  This variant converges globally for any $\gamma > 0$ and $\alpha \in \big(0, \tfrac{1+\sqrt{5}}{2}\big)$.
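
A scalar toy problem (our own choice) makes the role of $\alpha$ concrete: $\min \tfrac12(x-1)^2 + \tfrac12(z-3)^2$ s.t. $x - z = 0$. Under the sign convention of $\mathcal{L}_\gamma$, its solution is $x = z = 2$ with multiplier $\lambda = 1$. The sketch runs the $\alpha$-scaled iteration for a few values inside $(0, \tfrac{1+\sqrt5}{2})$. The function name is illustrative.

```python
import math

def admm_dual_step(alpha, gamma=1.0, iters=200):
    """Two-block ADMM with dual step size alpha on the toy problem
    min 0.5 (x - 1)^2 + 0.5 (z - 3)^2  s.t.  x - z = 0   (A_1 = 1, A_2 = -1, b = 0).
    Its solution is x = z = 2 with multiplier lam = 1."""
    x = z = lam = 0.0
    for _ in range(iters):
        # x-step: (x - 1) - lam + gamma (x - z) = 0
        x = (1.0 + lam + gamma * z) / (1.0 + gamma)
        # z-step: (z - 3) + lam - gamma (x - z) = 0, using the new x
        z = (3.0 - lam + gamma * x) / (1.0 + gamma)
        # dual step scaled by alpha
        lam = lam - alpha * gamma * (x - z)
    return x, z, lam


GOLDEN = (1.0 + math.sqrt(5.0)) / 2.0   # right end of the admissible interval for alpha

for alpha in (0.5, 1.0, 1.6):           # all inside (0, GOLDEN)
    print(alpha, admm_dual_step(alpha))
```

The interval is a sufficient condition from the theorem. This sketch does not probe what happens for $\alpha$ outside it.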

  7. Sublinear Convergence of ADMM for N = 2
- Ergodic $O(1/k)$ convergence (He-Yuan-2012)
- Non-ergodic $O(1/k)$ convergence (He-Yuan-2012)
- Ergodic $O(1/k)$ convergence (Monteiro-Svaiter-2013)

  8. Linear Convergence Rate of ADMM for N = 2
- The Douglas-Rachford splitting method converges linearly if $\mathcal{B}$ is coercive and Lipschitz (Lions-Mercier-1979)
- Linear convergence for linear programs (Eckstein-Bertsekas-1990)
- Linear convergence for quadratic programs (Han-Yuan-2013; Boley-2013)

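  9. Generalized ADMM
Generalized ADMM for N = 2 (Deng-Yin-2012):
$$\begin{aligned}
x_1^{k+1} &:= \operatorname*{argmin}_{x_1\in\mathcal{X}_1} \mathcal{L}_\gamma(x_1, x_2^k; \lambda^k) + \tfrac12\|x_1 - x_1^k\|_P^2,\\
x_2^{k+1} &:= \operatorname*{argmin}_{x_2\in\mathcal{X}_2} \mathcal{L}_\gamma(x_1^{k+1}, x_2; \lambda^k) + \tfrac12\|x_2 - x_2^k\|_Q^2,\\
\lambda^{k+1} &:= \lambda^k - \alpha\gamma\big(A_1x_1^{k+1} + A_2x_2^{k+1} - b\big),
\end{aligned}$$
where $\|v\|_P^2 := v^\top P v$ for a positive semidefinite matrix $P$ (likewise for $Q$).
One sufficient condition for global linear convergence: $P = Q = 0$, $\alpha = 1$, $f_2$ strongly convex, $\nabla f_2$ Lipschitz continuous, and $A_2$ of full row rank.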
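
Here is a sketch of the generalized scheme with the assumed choices $P = pI$ and $Q = qI$. It runs on the hypothetical problem $\min \tfrac12\|x_1 - a\|^2 + \tfrac12\|x_2 - c\|^2$ s.t. $x_1 - x_2 = 0$, whose solution is $x_1 = x_2 = (a+c)/2$. With $p = q = 0$ and $\alpha = 1$, this instance meets the sufficient condition above: $f_2$ is strongly convex with Lipschitz gradient, and $A_2 = -I$ has full row rank. The error should therefore shrink geometrically, and the recorded `errors` list lets one inspect that. Names and parameters are illustrative.

```python
import numpy as np

def generalized_admm(a, c, p=0.0, q=0.0, alpha=1.0, gamma=1.0, iters=300):
    """Generalized ADMM with proximal terms P = p*I, Q = q*I (assumed scalar multiples of I)
    on min 0.5||x1 - a||^2 + 0.5||x2 - c||^2 s.t. x1 - x2 = 0 (A1 = I, A2 = -I, b = 0)."""
    x1 = np.zeros_like(a)
    x2 = np.zeros_like(a)
    lam = np.zeros_like(a)
    target = (a + c) / 2.0
    errors = []
    for _ in range(iters):
        # x1-step: (x1 - a) - lam + gamma (x1 - x2) + p (x1 - x1^k) = 0
        x1 = (a + lam + gamma * x2 + p * x1) / (1.0 + gamma + p)
        # x2-step: (x2 - c) + lam - gamma (x1 - x2) + q (x2 - x2^k) = 0
        x2 = (c - lam + gamma * x1 + q * x2) / (1.0 + gamma + q)
        lam = lam - alpha * gamma * (x1 - x2)
        errors.append(np.linalg.norm(x2 - target))
    return x1, x2, lam, errors


a, c = np.array([1.0, -2.0]), np.array([3.0, 4.0])
x1, x2, lam, errors = generalized_admm(a, c)                 # P = Q = 0, alpha = 1
print(x2, errors[:5])
```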

  10. ADMM for N ≥ 3: a counterexample
A negative result (Chen-He-Ye-Yuan-2013): the direct extension of ADMM to multiple blocks is not necessarily convergent.
A counterexample: solve the linear system (that is, $f_1 = f_2 = f_3 \equiv 0$)
$$A_1x_1 + A_2x_2 + A_3x_3 = 0, \qquad A = (A_1, A_2, A_3) = \begin{pmatrix}1&1&1\\1&1&2\\1&2&2\end{pmatrix}.$$
With $\gamma = 1$, the update of multi-block ADMM is
$$\begin{pmatrix} 3&0&0&0&0&0\\ 4&6&0&0&0&0\\ 5&7&9&0&0&0\\ 1&1&1&1&0&0\\ 1&1&2&0&1&0\\ 1&2&2&0&0&1 \end{pmatrix}
\begin{pmatrix} x_1^{k+1}\\ x_2^{k+1}\\ x_3^{k+1}\\ \lambda^{k+1} \end{pmatrix}
= \begin{pmatrix} 0&-4&-5&1&1&1\\ 0&0&-7&1&1&2\\ 0&0&0&1&2&2\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1 \end{pmatrix}
\begin{pmatrix} x_1^k\\ x_2^k\\ x_3^k\\ \lambda^k \end{pmatrix}.$$

  11. ADMM for N ≥ 3: a counterexample
Since the first column of the right-hand matrix is zero, $x_1^k$ does not enter the update. Equivalently,
$$\begin{pmatrix} x_2^{k+1}\\ x_3^{k+1}\\ \lambda^{k+1}\end{pmatrix} = M \begin{pmatrix} x_2^{k}\\ x_3^{k}\\ \lambda^{k}\end{pmatrix}, \qquad M = \frac{1}{162}\begin{pmatrix} 144&-9&-9&-9&18\\ 8&157&-5&13&-8\\ 64&122&122&-58&-64\\ 56&-35&-35&91&56\\ -88&-26&-26&-62&88 \end{pmatrix}.$$
Note that the spectral radius satisfies $\rho(M) > 1$.
Theorem (Chen-He-Ye-Yuan-2013). There exists an example for which the direct extension of ADMM to three blocks, started from a real-number initial point, is not necessarily convergent for any choice of $\gamma > 0$.

  12. ADMM for N ≥ 3: Strong convexity?
$$\min\; 0.05x_1^2 + 0.05x_2^2 + 0.05x_3^2 \quad \text{s.t.}\quad \begin{pmatrix}1&1&1\\1&1&2\\1&2&2\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = 0.$$
- For $\gamma = 1$, $\rho(M) = 1.0087 > 1$.
- One can find a proper initial point from which ADMM diverges.
- Even for strongly convex programs, the extended ADMM is not necessarily convergent for a certain $\gamma > 0$.
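
The same construction covers the strongly convex example once a $\mu x_j$ term is added to each $x_j$-step, since $0.05x_j^2 = \tfrac{\mu}{2}x_j^2$ with $\mu = 0.1$. The sketch below is an illustration under these assumptions, not the original computation. It checks $\rho(M) > 1$ for $\gamma = 1$ and then iterates the linear map from a random start to show the growth. The seed and iteration count are arbitrary choices.

```python
import numpy as np

def admm_iteration_matrix(A, gamma, mu):
    """Iteration matrix of direct multi-block ADMM (scalar blocks x_j, A_j = A[:, j])
    for min sum_j (mu / 2) x_j^2 s.t. A x = 0; mu = 0.1 matches 0.05 x_j^2."""
    m, n = A.shape
    G = A.T @ A
    L = np.zeros((n + m, n + m))
    R = np.zeros((n + m, n + m))
    # x_j-step: mu x_j + gamma A_j^T (sum_{i<=j} A_i x_i^{k+1} + sum_{i>j} A_i x_i^k) = A_j^T lam^k
    L[:n, :n] = gamma * np.tril(G) + mu * np.eye(n)
    R[:n, :n] = -gamma * np.triu(G, k=1)
    R[:n, n:] = A.T
    L[n:, :n] = gamma * A                  # lam^{k+1} + gamma A x^{k+1} = lam^k
    L[n:, n:] = np.eye(m)
    R[n:, n:] = np.eye(m)
    return np.linalg.solve(L, R)[1:, 1:]   # x_1^k does not enter the update


A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 2.0]])
M = admm_iteration_matrix(A, gamma=1.0, mu=0.1)
rho = np.max(np.abs(np.linalg.eigvals(M)))

# Iterate the linear map from a random starting point (x_2, x_3, lam).
rng = np.random.default_rng(0)
v0 = rng.standard_normal(M.shape[0])
v = v0.copy()
for _ in range(3000):
    v = M @ v
growth = np.linalg.norm(v) / np.linalg.norm(v0)
print(rho, growth)
```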

  13. ADMM for N ≥ 3: Strong convexity works!
Theorem (Han-Yuan-2012, global convergence). If $f_i$, $i = 1,\dots,N$, are strongly convex with parameters $\sigma_i$, and
$$0 < \gamma < \min_{i=1,\dots,N} \frac{2\sigma_i}{3(N-1)\,\lambda_{\max}(A_i^\top A_i)},$$
then multi-block ADMM converges globally.
Convergence rate?
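
Evaluating the right-hand side of this condition on the strongly convex example of slide 12 shows how restrictive it is there. Assume the convention $f(y) \ge f(x) + \langle g, y - x\rangle + \tfrac{\sigma}{2}\|y - x\|^2$, under which $0.05x_i^2$ has modulus $\sigma_i = 0.1$. Also $\lambda_{\max}(A_i^\top A_i) = \|A_i\|^2 = 3, 6, 9$. The bound is then $0.2/54 \approx 0.0037$, far below the $\gamma = 1$ that diverged. The function name is illustrative.

```python
import numpy as np

def han_yuan_gamma_bound(sigmas, As):
    """Right-hand side of the Han-Yuan condition as written on the slide:
    min_i 2 sigma_i / (3 (N - 1) lambda_max(A_i^T A_i)); gamma must lie strictly below it."""
    N = len(As)
    return min(2.0 * s / (3.0 * (N - 1) * np.linalg.eigvalsh(A.T @ A).max())
               for s, A in zip(sigmas, As))


# Slide-12 data: sigma_i = 0.1 and A_i is the i-th column of A (kept as a 3 x 1 matrix).
A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 2.0]])
As = [A[:, [i]] for i in range(3)]
bound = han_yuan_gamma_bound([0.1, 0.1, 0.1], As)
print(bound)   # about 0.0037, far below gamma = 1
```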

  14. ADMM for N ≥ 3: weaker condition and convergence rate
Let $u := (x_1, \dots, x_N)$ and define the ergodic averages
$$\bar{x}_i^t = \frac{1}{t+1}\sum_{k=0}^{t} x_i^{k+1},\quad 1 \le i \le N, \qquad \bar{\lambda}^t = \frac{1}{t+1}\sum_{k=0}^{t}\lambda^{k+1},$$
with $\bar{u}^t := (\bar{x}_1^t, \dots, \bar{x}_N^t)$ and $f(u) := \sum_{i=1}^N f_i(x_i)$.
Theorem (Lin-Ma-Zhang-2014a). If $f_2,\dots,f_N$ are strongly convex, $f_1$ is convex, and
$$\gamma \le \min\left\{ \min_{2\le i\le N-1} \frac{2\sigma_i}{(2N-i)(i-1)\,\lambda_{\max}(A_i^\top A_i)},\ \frac{2\sigma_N}{(N-2)(N+1)\,\lambda_{\max}(A_N^\top A_N)} \right\},$$
then $|f(\bar{u}^t) - f(u^*)| = O(1/t)$ and $\big\|\sum_{i=1}^N A_i\bar{x}_i^t - b\big\| = O(1/t)$, where $u^*$ is an optimal solution.
- Weaker condition: $f_1$ need not be strongly convex.
- Ergodic $O(1/t)$ convergence rate in terms of objective value and primal feasibility.
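
The sketch below covers the two ingredients of this theorem: the step-size bound and the ergodic averages. Function names are illustrative. For N = 3 the bound should collapse to $\min\{\sigma_2/(2\lambda_{\max}(A_2^\top A_2)), \sigma_3/(2\lambda_{\max}(A_3^\top A_3))\}$. The usage lines check this on the slide-12 data, with $\sigma_1$ set to 0 because only convexity of $f_1$ is needed.

```python
import numpy as np

def lin_ma_zhang_gamma_bound(sigmas, As):
    """Step-size bound of the Lin-Ma-Zhang-2014a theorem as written on the slide (N >= 3).
    Blocks are 1-indexed in the formula; sigmas[0] is ignored since f_1 only needs convexity."""
    N = len(As)
    if N < 3:
        raise ValueError("the bound is stated for N >= 3")
    lmax = [np.linalg.eigvalsh(A.T @ A).max() for A in As]
    terms = [2.0 * sigmas[i - 1] / ((2 * N - i) * (i - 1) * lmax[i - 1]) for i in range(2, N)]
    terms.append(2.0 * sigmas[N - 1] / ((N - 2) * (N + 1) * lmax[N - 1]))
    return min(terms)

def ergodic_averages(iterates):
    """Row t is (1 / (t + 1)) * sum_{k=0}^{t} iterates[k], where iterates[k] stores x^{k+1}."""
    arr = np.asarray(iterates, dtype=float)
    counts = np.arange(1, arr.shape[0] + 1).reshape(-1, *([1] * (arr.ndim - 1)))
    return np.cumsum(arr, axis=0) / counts


A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 2.0]])
As = [A[:, [i]] for i in range(3)]
gamma_max = lin_ma_zhang_gamma_bound([0.0, 0.1, 0.1], As)   # min(0.1 / 12, 0.1 / 18)
print(gamma_max, ergodic_averages([[1.0], [3.0], [5.0]]))
```

For N = 3 the per-block bound $\sigma_i/(2\lambda_{\max})$ is larger than the Han-Yuan value $\sigma_i/(3\lambda_{\max})$. For larger N the weights $(2N-i)(i-1)$ grow, so the comparison can go the other way. The main relaxation is that $f_1$ need not be strongly convex.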

  15. ADMM for N ≥ 3: non-ergodic convergence rate
Optimality measure (N = 3): if
$$A_2x_2^{k+1} - A_2x_2^k = 0,\qquad A_3x_3^{k+1} - A_3x_3^k = 0,\qquad A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} - b = 0,$$
then $(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, \lambda^{k+1})$ is optimal. Define
$$R^{k+1} := \big\|A_1x_1^{k+1} + A_2x_2^{k+1} + A_3x_3^{k+1} - b\big\|^2 + 2\big\|A_2x_2^{k+1} - A_2x_2^k\big\|^2 + 3\big\|A_3x_3^{k+1} - A_3x_3^k\big\|^2.$$
Then $R^{k+1} = 0$ exactly when all three conditions hold. Under the conditions of the following theorems, we can prove $R^k = o(1/k)$.
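
Every term of $R^{k+1}$ is a nonnegative squared norm with a positive weight, so $R^{k+1} = 0$ exactly when the three optimality conditions hold. Below is a direct NumPy transcription; the function name and argument layout are assumed.

```python
import numpy as np

def optimality_residual_n3(As, b, x_new, x_old):
    """R^{k+1} for N = 3: ||A1 x1^{k+1} + A2 x2^{k+1} + A3 x3^{k+1} - b||^2
    + 2 ||A2 x2^{k+1} - A2 x2^k||^2 + 3 ||A3 x3^{k+1} - A3 x3^k||^2.
    x_new = (x1^{k+1}, x2^{k+1}, x3^{k+1}); x_old = (x1^k, x2^k, x3^k), where x1^k is unused."""
    A1, A2, A3 = As
    feas = A1 @ x_new[0] + A2 @ x_new[1] + A3 @ x_new[2] - b
    d2 = A2 @ (x_new[1] - x_old[1])
    d3 = A3 @ (x_new[2] - x_old[2])
    return feas @ feas + 2.0 * (d2 @ d2) + 3.0 * (d3 @ d3)


A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 2.0]])
As = [A[:, [i]] for i in range(3)]
b = np.zeros(3)
zero = np.zeros(1)
r1 = optimality_residual_n3(As, b, [np.array([1.0]), zero, zero], [zero, zero, zero])  # infeasible only
r2 = optimality_residual_n3(As, b, [zero, zero, zero], [zero, np.array([1.0]), zero])  # x2 moved only
print(r1, r2)
```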

  16. ADMM for N ≥ 3: non-ergodic convergence rate
Theorem (Lin-Ma-Zhang-2014a). If $f_2$ and $f_3$ are strongly convex and
$$\gamma \le \min\left\{ \frac{\sigma_2}{2\lambda_{\max}(A_2^\top A_2)},\ \frac{\sigma_3}{2\lambda_{\max}(A_3^\top A_3)} \right\},$$
then $\sum_{k=1}^\infty R^k < +\infty$ and $R^k = o(1/k)$.

  17. ADMM for N ≥ 3: non-ergodic convergence rate
Theorem (Lin-Ma-Zhang-2014a). If $f_2,\dots,f_N$ are strongly convex and
$$\gamma \le \min\left\{ \min_{2\le i\le N-1} \frac{2\sigma_i}{(2N-i)(i-1)\,\lambda_{\max}(A_i^\top A_i)},\ \frac{2\sigma_N}{(N-2)(N+1)\,\lambda_{\max}(A_N^\top A_N)} \right\},$$
then $\sum_{k=1}^\infty R^k < +\infty$ and $R^k = o(1/k)$, where
$$R^{k+1} := \Big\|\sum_{i=1}^N A_ix_i^{k+1} - b\Big\|^2 + \sum_{i=2}^N \frac{(2N-i)(i-1)}{2}\,\big\|A_ix_i^k - A_ix_i^{k+1}\big\|^2.$$
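
For $N = 3$ the general weights $(2N-i)(i-1)/2$ reduce to $2$ and $3$, which matches the N = 3 definition of $R^{k+1}$. The following sketch implements the general measure. Names are illustrative, and $x_1^k$ is passed but unused, as in the formula.

```python
import numpy as np

def residual_weights(N):
    """Weights (2N - i)(i - 1) / 2 for blocks i = 2, ..., N."""
    return [(2 * N - i) * (i - 1) / 2 for i in range(2, N + 1)]

def optimality_residual(As, b, x_new, x_old):
    """General R^{k+1}: ||sum_i A_i x_i^{k+1} - b||^2
    + sum_{i=2}^{N} (2N - i)(i - 1)/2 * ||A_i x_i^k - A_i x_i^{k+1}||^2."""
    N = len(As)
    feas = sum(A @ x for A, x in zip(As, x_new)) - b
    total = feas @ feas
    # Skip block 1: the sum over differences starts at i = 2.
    for w, A, xn, xo in zip(residual_weights(N), As[1:], x_new[1:], x_old[1:]):
        d = A @ (xo - xn)
        total += w * (d @ d)
    return total


A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 2.0]])
As = [A[:, [i]] for i in range(3)]
zero = np.zeros(1)
r = optimality_residual(As, np.zeros(3), [zero, zero, zero], [zero, np.array([1.0]), zero])
print(residual_weights(3), r)
```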

  18. ADMM for N ≥ 3: global linear convergence
Global linear convergence of ADMM for N ≥ 3 (Lin-Ma-Zhang-2014b):

| Scenario | Strongly convex | Lipschitz continuous gradient | Full row rank | Full column rank |
|---|---|---|---|---|
| 1 | $f_2, \dots, f_N$ | $\nabla f_N$ | $A_N$ | not required |
| 2 | $f_1, \dots, f_N$ | $\nabla f_1, \dots, \nabla f_N$ | not required | not required |
| 3 | $f_2, \dots, f_N$ | $\nabla f_1, \dots, \nabla f_N$ | not required | $A_1$ |

Table: Three scenarios leading to global linear convergence.
These conditions reduce to those in (Deng-Yin-2012) when N = 2.
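
The table can be read as a checklist, and the hypothetical helper below encodes it. Strong convexity and Lipschitz continuity cannot be verified numerically in general, so they are passed in as user-supplied flags. The rank conditions are checked with `np.linalg.matrix_rank`. For N = 2, the first usage case is the Deng-Yin condition from slide 9.

```python
import numpy as np

def full_row_rank(M):
    return np.linalg.matrix_rank(M) == M.shape[0]

def full_column_rank(M):
    return np.linalg.matrix_rank(M) == M.shape[1]

def linear_rate_scenarios(strongly_convex, lipschitz_grad, As):
    """Hypothetical helper: list the table scenarios (1, 2, 3) whose conditions hold.
    strongly_convex[i] and lipschitz_grad[i] are user-supplied flags for f_{i+1}."""
    N = len(As)
    sc_2_to_N = all(strongly_convex[1:])
    found = []
    if sc_2_to_N and lipschitz_grad[N - 1] and full_row_rank(As[N - 1]):
        found.append(1)   # f_2..f_N s.c., grad f_N Lipschitz, A_N full row rank
    if all(strongly_convex) and all(lipschitz_grad):
        found.append(2)   # all f_i s.c. with Lipschitz gradients, no rank condition
    if sc_2_to_N and all(lipschitz_grad) and full_column_rank(As[0]):
        found.append(3)   # f_2..f_N s.c., all gradients Lipschitz, A_1 full column rank
    return found


# N = 2: f_2 strongly convex with Lipschitz gradient, A_2 = -I (full row rank).
print(linear_rate_scenarios([False, True], [False, True], [np.eye(2), -np.eye(2)]))
```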
