Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables

Shiqian Ma
Department of Systems Engineering and Engineering Management
The Chinese University of Hong Kong

2014 Workshop on Optimization for Modern Computation
BICMR, Beijing, China
September 2, 2014

Shiqian Ma Multi-Block ADMM

Outline

ADMM for N = 2
Existing work on ADMM for N ≥ 3
Convergence Rates of ADMM for N ≥ 3
BSUM-M

Alternating Direction Method of Multipliers (ADMM)

Convex optimization
\[
\begin{aligned}
\min \quad & f_1(x_1) + f_2(x_2) + \cdots + f_N(x_N) \\
\text{s.t.} \quad & A_1 x_1 + A_2 x_2 + \cdots + A_N x_N = b, \\
& x_j \in \mathcal{X}_j, \quad j = 1, 2, \ldots, N.
\end{aligned}
\]
$f_j$: closed convex function
$\mathcal{X}_j$: closed convex set

Augmented Lagrangian function
\[
L_\gamma(x_1, \ldots, x_N; \lambda) := \sum_{j=1}^{N} f_j(x_j) - \Big\langle \lambda, \sum_{j=1}^{N} A_j x_j - b \Big\rangle + \frac{\gamma}{2} \Big\| \sum_{j=1}^{N} A_j x_j - b \Big\|_2^2
\]

Multi-Block ADMM

Augmented Lagrangian function
\[
L_\gamma(x_1, \ldots, x_N; \lambda) := \sum_{j=1}^{N} f_j(x_j) - \Big\langle \lambda, \sum_{j=1}^{N} A_j x_j - b \Big\rangle + \frac{\gamma}{2} \Big\| \sum_{j=1}^{N} A_j x_j - b \Big\|_2^2
\]
Multi-Block ADMM
\[
\begin{aligned}
x_1^{k+1} &:= \operatorname{argmin}_{x_1 \in \mathcal{X}_1} L_\gamma(x_1, x_2^k, \ldots, x_N^k; \lambda^k) \\
x_2^{k+1} &:= \operatorname{argmin}_{x_2 \in \mathcal{X}_2} L_\gamma(x_1^{k+1}, x_2, x_3^k, \ldots, x_N^k; \lambda^k) \\
&\ \ \vdots \\
x_N^{k+1} &:= \operatorname{argmin}_{x_N \in \mathcal{X}_N} L_\gamma(x_1^{k+1}, x_2^{k+1}, \ldots, x_{N-1}^{k+1}, x_N; \lambda^k) \\
\lambda^{k+1} &:= \lambda^k - \gamma \Big( \sum_{j=1}^{N} A_j x_j^{k+1} - b \Big).
\end{aligned}
\]
Update the primal variables in a Gauss-Seidel manner.
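
The Gauss-Seidel sweep on this slide can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the speaker's implementation: it assumes $f_j(x_j) = (c_j/2)\|x_j\|_2^2$ with $c_j > 0$ and $\mathcal{X}_j = \mathbb{R}^{n_j}$, so that each block subproblem reduces to a linear system. The names `multiblock_admm`, `c`, and `iters` are hypothetical.

```python
import numpy as np

def multiblock_admm(A, b, c, gamma=1.0, iters=200):
    """Gauss-Seidel multi-block ADMM for
        min sum_j (c[j]/2)*||x_j||^2   s.t.   sum_j A[j] @ x_j = b.

    Assumption of this sketch: quadratic f_j and X_j = R^{n_j}, so every
    block update has a closed form. A is a list of (m, n_j) arrays.
    """
    N = len(A)
    x = [np.zeros(Aj.shape[1]) for Aj in A]
    lam = np.zeros(b.shape[0])
    for _ in range(iters):
        for j in range(N):
            # Other blocks' contribution minus b: newest values for
            # blocks before j, previous iterate for blocks after j.
            r = sum(A[i] @ x[i] for i in range(N) if i != j) - b
            # Stationarity of L_gamma in x_j:
            #   c_j x_j - A_j^T lam + gamma A_j^T (A_j x_j + r) = 0
            H = c[j] * np.eye(A[j].shape[1]) + gamma * A[j].T @ A[j]
            x[j] = np.linalg.solve(H, A[j].T @ (lam - gamma * r))
        # Dual update with step size gamma.
        lam = lam - gamma * (sum(A[j] @ x[j] for j in range(N)) - b)
    return x, lam

# Toy usage with N = 2 and A_1 = A_2 = I (the minimizer is x_1 = x_2 = b/2).
b_demo = np.array([1.0, 2.0])
x_demo, lam_demo = multiblock_admm([np.eye(2), np.eye(2)], b_demo, c=[1.0, 1.0])
```

The list `x` is updated in place inside the inner loop, which is what makes the sweep Gauss-Seidel rather than Jacobi.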

ADMM for N = 2

\[
\begin{aligned}
x_1^{k+1} &:= \operatorname{argmin}_{x_1 \in \mathcal{X}_1} L_\gamma(x_1, x_2^k; \lambda^k) \\
x_2^{k+1} &:= \operatorname{argmin}_{x_2 \in \mathcal{X}_2} L_\gamma(x_1^{k+1}, x_2; \lambda^k) \\
\lambda^{k+1} &:= \lambda^k - \gamma \big( A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \big).
\end{aligned}
\]
Long history: goes back to variational methods for PDEs in the 1950s.
Related to the Douglas-Rachford and Peaceman-Rachford operator splitting methods for finding a zero of a sum of monotone operators: find $x$ s.t. $0 \in A(x) + B(x)$.
Revisited recently for sparse optimization [Wang-Yang-Yin-Zhang-2008] [Goldstein-Osher-2009] [Boyd-etal-2011]

Global Convergence of ADMM for N = 2

ADMM for N = 2
\[
\begin{aligned}
x_1^{k+1} &:= \operatorname{argmin}_{x_1 \in \mathcal{X}_1} L_\gamma(x_1, x_2^k; \lambda^k) \\
x_2^{k+1} &:= \operatorname{argmin}_{x_2 \in \mathcal{X}_2} L_\gamma(x_1^{k+1}, x_2; \lambda^k) \\
\lambda^{k+1} &:= \lambda^k - \gamma \big( A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \big).
\end{aligned}
\]
Global convergence for any γ > 0. (Fortin-Glowinski-1983; Gabay-1983; Glowinski-Le Tallec-1989; Eckstein-Bertsekas-1992)

ADMM for N = 2 with fixed dual step size
\[
\begin{aligned}
x_1^{k+1} &:= \operatorname{argmin}_{x_1 \in \mathcal{X}_1} L_\gamma(x_1, x_2^k; \lambda^k) \\
x_2^{k+1} &:= \operatorname{argmin}_{x_2 \in \mathcal{X}_2} L_\gamma(x_1^{k+1}, x_2; \lambda^k) \\
\lambda^{k+1} &:= \lambda^k - \alpha\gamma \big( A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \big).
\end{aligned}
\]
α > 0 is a fixed dual step size.
Global convergence for any γ > 0 and $\alpha \in \big(0, \tfrac{1+\sqrt{5}}{2}\big)$.
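
To make the role of the dual step size α concrete, here is a small sketch on a toy problem that is not from the talk: $\min \tfrac12 x_1^2 + \tfrac12 x_2^2$ s.t. $x_1 + x_2 = b$, whose solution is $x_1 = x_2 = \lambda = b/2$. The function name `admm_two_block_scalar` and the choice α = 1.6 (inside the admissible interval) are assumptions made for illustration.

```python
import math

# Upper end of the admissible interval for alpha, (1 + sqrt(5)) / 2.
ALPHA_MAX = (1.0 + math.sqrt(5.0)) / 2.0

def admm_two_block_scalar(b, gamma=1.0, alpha=1.0, iters=200):
    """Two-block ADMM with dual step alpha*gamma on the assumed toy problem
        min 0.5*x1^2 + 0.5*x2^2   s.t.   x1 + x2 = b.
    """
    x1 = x2 = lam = 0.0
    for _ in range(iters):
        # argmin over x1: x1 - lam + gamma*(x1 + x2 - b) = 0
        x1 = (lam - gamma * (x2 - b)) / (1.0 + gamma)
        # argmin over x2, using the freshly updated x1
        x2 = (lam - gamma * (x1 - b)) / (1.0 + gamma)
        # dual update scaled by alpha
        lam = lam - alpha * gamma * (x1 + x2 - b)
    return x1, x2, lam

x1, x2, lam = admm_two_block_scalar(b=3.0, alpha=1.6)
```

Setting `alpha=1.0` recovers the plain N = 2 scheme from the top of the slide.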

Sublinear Convergence of ADMM for N = 2

Ergodic O(1/k) convergence (He-Yuan-2012)
Non-ergodic O(1/k) convergence (He-Yuan-2012)
Ergodic O(1/k) convergence (Monteiro-Svaiter-2013)

Linear Convergence Rate of ADMM for N = 2

Douglas-Rachford splitting method converges linearly if B is coercive and Lipschitz (Lions-Mercier-1979)
Linear convergence for solving linear programs (Eckstein-Bertsekas-1990)
Linear convergence for quadratic programs (Han-Yuan-2013; Boley-2013)

Generalized ADMM

Generalized ADMM for N = 2 (Deng-Yin-2012)
\[
\begin{aligned}
x_1^{k+1} &:= \operatorname{argmin}_{x_1 \in \mathcal{X}_1} L_\gamma(x_1, x_2^k; \lambda^k) + \tfrac{1}{2} \| x_1 - x_1^k \|_P^2 \\
x_2^{k+1} &:= \operatorname{argmin}_{x_2 \in \mathcal{X}_2} L_\gamma(x_1^{k+1}, x_2; \lambda^k) + \tfrac{1}{2} \| x_2 - x_2^k \|_Q^2 \\
\lambda^{k+1} &:= \lambda^k - \alpha\gamma \big( A_1 x_1^{k+1} + A_2 x_2^{k+1} - b \big).
\end{aligned}
\]
One sufficient condition for guaranteeing global linear convergence: $P = Q = 0$, $\alpha = 1$, $f_2$ strongly convex, $\nabla f_2$ Lipschitz continuous, $A_2$ full row rank.
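
ADMM for N ≥ 3: a counter example

A negative result (Chen-He-Ye-Yuan-2013): the direct extension of multi-block ADMM is not necessarily convergent.

A counter example:
\[
A_1 x_1 + A_2 x_2 + A_3 x_3 = 0, \quad \text{where } A = (A_1, A_2, A_3) = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 2 \end{pmatrix}.
\]
The update of multi-block ADMM with γ = 1 is
\[
\begin{pmatrix}
3 & 0 & 0 & 0 & 0 & 0 \\
4 & 6 & 0 & 0 & 0 & 0 \\
5 & 7 & 9 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 0 & 0 \\
1 & 1 & 2 & 0 & 1 & 0 \\
1 & 2 & 2 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_1^{k+1} \\ x_2^{k+1} \\ x_3^{k+1} \\ \lambda^{k+1} \end{pmatrix}
=
\begin{pmatrix}
0 & -4 & -5 & 1 & 1 & 1 \\
0 & 0 & -7 & 1 & 1 & 2 \\
0 & 0 & 0 & 1 & 2 & 2 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_1^{k} \\ x_2^{k} \\ x_3^{k} \\ \lambda^{k} \end{pmatrix}.
\]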

ADMM for N ≥ 3: a counter example

Equivalently,
\[
\begin{pmatrix} x_2^{k+1} \\ x_3^{k+1} \\ \lambda^{k+1} \end{pmatrix}
= M
\begin{pmatrix} x_2^{k} \\ x_3^{k} \\ \lambda^{k} \end{pmatrix},
\quad \text{where} \quad
M = \frac{1}{162}
\begin{pmatrix}
144 & -9 & -9 & -9 & 18 \\
8 & 157 & -5 & 13 & -8 \\
64 & 122 & 122 & -58 & -64 \\
56 & -35 & -35 & 91 & -56 \\
-88 & -26 & -26 & -62 & 88
\end{pmatrix}.
\]
Note that $\rho(M) > 1$.

Theorem (Chen-He-Ye-Yuan-2013)
There exists an example for which the direct extension of ADMM to three blocks, started from a real initial point, is not necessarily convergent for any choice of γ > 0.
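
The matrix $M$ can be recomputed from the γ = 1 update system of the counter example, which gives a quick numerical check of $\rho(M) > 1$. The sketch below uses NumPy; the variable names (`Lmat`, `Rmat`, `T`, `rho`) are illustrative choices, not notation from the talk.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])
G = A.T @ A  # Gram matrix with entries A_i^T A_j

# Left-hand matrix: lower triangle of G acts on the new x, [A, I] on (x, lambda).
Lmat = np.block([[np.tril(G), np.zeros((3, 3))],
                 [A,          np.eye(3)]])
# Right-hand matrix: minus the strict upper triangle of G acts on the old x,
# A^T on the old lambda.
Rmat = np.block([[-np.triu(G, k=1), A.T],
                 [np.zeros((3, 3)), np.eye(3)]])

# z^{k+1} = T z^k with z = (x1, x2, x3, lambda).
T = np.linalg.solve(Lmat, Rmat)
# x1^k never enters the update (first column of Rmat is zero), so the
# dynamics are governed by the trailing 5x5 block.
M = T[1:, 1:]
rho = max(abs(np.linalg.eigvals(M)))
print(np.round(162 * M), rho)
```

Multiplying `M` by 162 should reproduce the integer entries of the matrix on the slide, and `rho` is the spectral radius referred to there.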

ADMM for N ≥ 3: Strong convexity?

\[
\begin{aligned}
\min \quad & 0.05 x_1^2 + 0.05 x_2^2 + 0.05 x_3^2 \\
\text{s.t.} \quad & \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0.
\end{aligned}
\]
For γ = 1, $\rho(M) = 1.0087 > 1$.
Able to find a proper initial point such that ADMM diverges.
Even for strongly convex programming, the extended ADMM is not necessarily convergent for a certain γ > 0.
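
The divergence claimed on this slide can be observed by simply running the three-block iteration on the strongly convex problem with γ = 1. The following pure-Python sketch does that; the starting point and the iteration count are arbitrary choices made for illustration, and `growth` is only a rough empirical estimate of the per-iteration growth factor, to be compared with the $\rho(M)$ quoted above.

```python
import math

# Columns A_1, A_2, A_3 of the constraint matrix from the slide.
A_COLS = [(1.0, 1.0, 1.0), (1.0, 1.0, 2.0), (1.0, 2.0, 2.0)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def admm_three_block(x, lam, c=0.05, gamma=1.0, iters=10000):
    """Direct 3-block ADMM for min c*(x1^2 + x2^2 + x3^2)
    s.t. A1*x1 + A2*x2 + A3*x3 = 0.

    Returns the Euclidean norm of (x, lam) after every iteration.
    """
    x, lam = list(x), list(lam)
    norms = []
    for _ in range(iters):
        for i, a in enumerate(A_COLS):
            # r = sum_{j != i} A_j x_j, using the newest available x_j.
            r = [sum(A_COLS[j][m] * x[j] for j in range(3) if j != i)
                 for m in range(3)]
            # Stationarity: 2c*x_i - a^T lam + gamma * a^T (a*x_i + r) = 0.
            x[i] = (dot(a, lam) - gamma * dot(a, r)) / (2.0 * c + gamma * dot(a, a))
        res = [sum(A_COLS[j][m] * x[j] for j in range(3)) for m in range(3)]
        lam = [lam[m] - gamma * res[m] for m in range(3)]
        norms.append(math.sqrt(dot(x, x) + dot(lam, lam)))
    return norms

# Arbitrary (assumed) starting point.
norms = admm_three_block(x=(1.0, -0.5, 0.3), lam=(0.2, -0.7, 1.1))
growth = (norms[-1] / norms[0]) ** (1.0 / (len(norms) - 1))
```

Setting `c=0.0` gives the iteration for the original counter example with zero objective.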

ADMM for N ≥ 3: Strong convexity works!

Global convergence

Theorem (Han-Yuan-2012)
If $f_i$, $i = 1, \ldots, N$, are strongly convex with parameters $\sigma_i$, and
\[
0 < \gamma < \min_{i=1,\ldots,N} \left\{ \frac{2\sigma_i}{3(N-1)\lambda_{\max}(A_i^\top A_i)} \right\},
\]
then multi-block ADMM globally converges.

Convergence Rate?
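
The step-size bound in this theorem is easy to evaluate numerically. The sketch below is illustrative: the function name `han_yuan_gamma_bound` is hypothetical, and it assumes the convention that $f(x) = 0.05x^2$ has strong convexity parameter $\sigma = 0.1$ (its second derivative).

```python
import numpy as np

def han_yuan_gamma_bound(A_blocks, sigmas):
    """Upper bound min_i 2*sigma_i / (3*(N-1)*lambda_max(A_i^T A_i)) on gamma."""
    N = len(A_blocks)
    bounds = []
    for A_i, sigma_i in zip(A_blocks, sigmas):
        # lambda_max(A_i^T A_i) is the squared largest singular value of A_i.
        lam_max = np.linalg.norm(A_i, 2) ** 2
        bounds.append(2.0 * sigma_i / (3.0 * (N - 1) * lam_max))
    return min(bounds)

# Strongly convex three-block example: columns of A as (3, 1) blocks,
# sigma_i = 0.1 under the convention stated in the lead-in.
A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 2.0]])
blocks = [A[:, [j]] for j in range(3)]
bound = han_yuan_gamma_bound(blocks, sigmas=[0.1, 0.1, 0.1])
```

For that example the bound is far below 1, so the choice γ = 1 used in the strongly convex counter example lies outside the range covered by the theorem.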

ADMM for N ≥ 3: weaker condition and convergence rate

\[
u := (x_1, \ldots, x_N), \quad
\bar{x}_i^t = \frac{1}{t+1} \sum_{k=0}^{t} x_i^{k+1}, \ 1 \le i \le N, \quad
\bar{\lambda}^t = \frac{1}{t+1} \sum_{k=0}^{t} \lambda^{k+1}.
\]
Theorem (Lin-Ma-Zhang-2014a)
If $f_2, \ldots, f_N$ are strongly convex, $f_1$ is convex, and
\[
\gamma \le \min_{2 \le i \le N-1} \left\{ \frac{2\sigma_i}{(2N-i)(i-1)\lambda_{\max}(A_i^\top A_i)}, \ \frac{2\sigma_N}{(N-2)(N+1)\lambda_{\max}(A_N^\top A_N)} \right\},
\]
then $|f(\bar{u}^t) - f(u^*)| = O(1/t)$ and $\Big\| \sum_{i=1}^{N} A_i \bar{x}_i^t - b \Big\| = O(1/t)$.

Weaker condition
Ergodic O(1/t) convergence rate in terms of objective value and primal feasibility

ADMM for N ≥ 3: non-ergodic convergence rate

Optimality measure: if
\[
\begin{aligned}
A_2 x_2^{k+1} - A_2 x_2^k &= 0, \\
A_3 x_3^{k+1} - A_3 x_3^k &= 0, \\
A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3^{k+1} - b &= 0,
\end{aligned}
\]
then $(x_1^{k+1}, x_2^{k+1}, x_3^{k+1}, \lambda^{k+1})$ is optimal.

Define
\[
R_{k+1} := \big\| A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3^{k+1} - b \big\|^2 + 2 \big\| A_2 x_2^{k+1} - A_2 x_2^k \big\|^2 + 3 \big\| A_3 x_3^{k+1} - A_3 x_3^k \big\|^2.
\]
We can prove: $R_k = o(1/k)$.

ADMM for N ≥ 3: non-ergodic convergence rate

Theorem (Lin-Ma-Zhang-2014a)
If $f_2$ and $f_3$ are strongly convex, and
\[
\gamma \le \min \left\{ \frac{\sigma_2}{2\lambda_{\max}(A_2^\top A_2)}, \ \frac{\sigma_3}{2\lambda_{\max}(A_3^\top A_3)} \right\},
\]
then
\[
\sum_{k=1}^{\infty} R_k < +\infty \quad \text{and} \quad R_k = o(1/k).
\]

ADMM for N ≥ 3: non-ergodic convergence rate

Theorem (Lin-Ma-Zhang-2014a)
If $f_2, \ldots, f_N$ are strongly convex, and
\[
\gamma \le \min_{2 \le i \le N-1} \left\{ \frac{2\sigma_i}{(2N-i)(i-1)\lambda_{\max}(A_i^\top A_i)}, \ \frac{2\sigma_N}{(N-2)(N+1)\lambda_{\max}(A_N^\top A_N)} \right\},
\]
then
\[
\sum_{k=1}^{\infty} R_k < +\infty \quad \text{and} \quad R_k = o(1/k),
\]
where
\[
R_{k+1} := \Big\| \sum_{i=1}^{N} A_i x_i^{k+1} - b \Big\|^2 + \sum_{i=2}^{N} \frac{(2N-i)(i-1)}{2} \big\| A_i x_i^k - A_i x_i^{k+1} \big\|^2.
\]
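
As a sanity check on the general formulas, the short sketch below evaluates the step-size bound and the weights in $R_{k+1}$ for a given $N$. For $N = 3$ the weights should come out as 2 and 3 and the bound as $\min\{\sigma_2 / (2\lambda_{\max}(A_2^\top A_2)), \sigma_3 / (2\lambda_{\max}(A_3^\top A_3))\}$, matching the three-block statements. The function names and the convention of passing $\lambda_{\max}(A_i^\top A_i)$ directly as numbers are assumptions of this example.

```python
def lmz_gamma_bound(sigmas, lam_max):
    """Step-size bound from the theorem, for N >= 3 blocks.

    sigmas[i-1] and lam_max[i-1] hold sigma_i and lambda_max(A_i^T A_i)
    for block i = 1..N; the entries for block 1 are not used because
    f_1 is not required to be strongly convex.
    """
    N = len(sigmas)
    # Middle blocks i = 2, ..., N-1.
    terms = [2.0 * sigmas[i - 1] / ((2 * N - i) * (i - 1) * lam_max[i - 1])
             for i in range(2, N)]
    # Last block N.
    terms.append(2.0 * sigmas[N - 1] / ((N - 2) * (N + 1) * lam_max[N - 1]))
    return min(terms)

def residual_weights(N):
    """Weights (2N - i)(i - 1)/2 on ||A_i x_i^k - A_i x_i^{k+1}||^2, i = 2..N."""
    return [(2 * N - i) * (i - 1) / 2.0 for i in range(2, N + 1)]

# N = 3 with made-up values sigma_2 = sigma_3 = 1, lambda_max = 4 and 2.
bound3 = lmz_gamma_bound([0.0, 1.0, 1.0], [1.0, 4.0, 2.0])
weights3 = residual_weights(3)
```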

ADMM for N ≥ 3: global linear convergence

Global linear convergence of ADMM for N ≥ 3 (Lin-Ma-Zhang-2014b)

     s.c.               Lipschitz              full row rank    full column rank
1    f_2, ..., f_N      ∇f_N                   A_N              -
2    f_1, ..., f_N      ∇f_1, ..., ∇f_N        -                -
3    f_2, ..., f_N      ∇f_1, ..., ∇f_N        -                A_1

Table: Three scenarios leading to global linear convergence

Reduces to the conditions in (Deng-Yin-2012) when N = 2