A Canonical Convex Optimization Model

A canonical convex minimization model with linear constraints:
$$\min \{\, \theta(x) \mid Ax = b,\ x \in \mathcal{X} \,\},$$
with $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $\mathcal{X} \subseteq \mathbb{R}^n$ a closed convex set, and $\theta: \mathbb{R}^n \to \mathbb{R}$ a convex but not necessarily smooth function.

Solving the original model — thus with 100% accuracy. But how?
— In general, not possible.
— Not implementable.

The penalty method:
$$x^{k+1} = \arg\min \Big\{\, \theta(x) + \frac{\beta}{2} \|Ax - b\|^2 \ \Big|\ x \in \mathcal{X} \,\Big\},$$
which solves an easier problem without linear constraints — with much more implementability. Of course, with much less accuracy; indeed, it is not necessarily convergent unless $\beta \to +\infty$.

Sufficient implementability, but too little accuracy.
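To make the trade-off concrete, here is a minimal NumPy sketch of the penalty method, assuming a toy instance with $\theta(x) = \frac{1}{2}\|x - c\|^2$ and $\mathcal{X} = \mathbb{R}^n$, so that each penalized subproblem has a closed-form solution. The instance and parameter values are illustrative only, not from the talk.

```python
import numpy as np

# A minimal sketch of the quadratic-penalty idea, assuming the toy problem
# min { 0.5*||x - c||^2 | A x = b } with X = R^n, so the penalized
# subproblem has the closed form (I + beta*A^T A) x = c + beta*A^T b.
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

for beta in [1.0, 10.0, 100.0, 1000.0]:
    x = np.linalg.solve(np.eye(n) + beta * A.T @ A, c + beta * A.T @ b)
    # With beta fixed, the constraint is only satisfied approximately;
    # the residual shrinks only as beta grows.
    print(f"beta={beta:7.1f}   ||Ax - b|| = {np.linalg.norm(A @ x - b):.2e}")
```

With any fixed $\beta$ the constraint residual stalls at a positive level; it vanishes only in the limit $\beta \to +\infty$, which is exactly the accuracy loss described above.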
The Augmented Lagrangian Method

How can we keep both the implementability (of the penalty method) and the accuracy (with convergence)?

Answer: the augmented Lagrangian method (ALM), proposed independently by M. Hestenes and M. Powell in 1969:
$$x^{k+1} = \arg\min \Big\{\, \theta(x) - (\lambda^k)^T (Ax - b) + \frac{\beta}{2} \|Ax - b\|^2 \ \Big|\ x \in \mathcal{X} \,\Big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (Ax^{k+1} - b),$$
where $\lambda \in \mathbb{R}^m$ is the Lagrange multiplier and $\beta > 0$ is a penalty parameter.

— The subproblem is as difficult as that of the penalty method (the same level of implementability).
— It is convergent for any fixed $\beta > 0$ (higher accuracy).
Some Comments on ALM

The ALM:
$$x^{k+1} = \arg\min \Big\{\, \theta(x) - (\lambda^k)^T (Ax - b) + \frac{\beta}{2} \|Ax - b\|^2 \ \Big|\ x \in \mathcal{X} \,\Big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (Ax^{k+1} - b).$$

— ALM has an augmented term, and it updates the dual variable iteratively.
— In 1976, R. T. Rockafellar showed that ALM is the proximal point algorithm (B. Martinet, 1970, or even earlier, J. Moreau, 1965) applied to the dual problem of the model above.
— It can be regarded as a dual ascent method on the dual variable $\lambda$.
— A significant difference from the penalty method: the penalty parameter of ALM can theoretically be fixed at any positive scalar.
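The same toy instance as in the penalty sketch above illustrates the accuracy gain: once the multiplier update is added, the constraint residual is driven to zero while $\beta$ stays fixed. Again, the instance is an illustrative assumption.

```python
import numpy as np

# A minimal ALM sketch on the same toy problem as above
# (min 0.5*||x - c||^2 s.t. A x = b, with X = R^n), where the x-subproblem
# has the closed form (I + beta*A^T A) x = c + A^T lam + beta*A^T b.
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

beta = 10.0                      # fixed penalty parameter
lam = np.zeros(m)                # Lagrange multiplier
M = np.eye(n) + beta * A.T @ A   # subproblem matrix (constant here)
for k in range(30):
    x = np.linalg.solve(M, c + A.T @ lam + beta * A.T @ b)
    lam = lam - beta * (A @ x - b)
# Unlike the pure penalty method, the residual is driven to zero
# with beta held fixed, thanks to the multiplier updates.
print("||Ax - b|| =", np.linalg.norm(A @ x - b))
```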
Outline

1. Backgrounds
2. Accuracy vs. Implementability – An Easier Case
3. Accuracy vs. Implementability – A More Complicated Case
4. Conclusions
A Separable Model

For many applications, the last model can be specified in a separable form:
$$\min \{\, \theta_1(x_1) + \theta_2(x_2) \mid A_1 x_1 + A_2 x_2 = b,\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2 \,\},$$
where $A_1 \in \mathbb{R}^{m \times n_1}$, $A_2 \in \mathbb{R}^{m \times n_2}$, $b \in \mathbb{R}^m$, $\mathcal{X}_i \subseteq \mathbb{R}^{n_i}$ ($i = 1, 2$) and $\theta_i: \mathbb{R}^{n_i} \to \mathbb{R}$ ($i = 1, 2$).

This model corresponds to the last one with $\theta(x) = \theta_1(x_1) + \theta_2(x_2)$, $x = (x_1, x_2)$, $A = (A_1, A_2)$, $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$ and $n = n_1 + n_2$.

A typical application is the widely used $\ell_1$-$\ell_2$ model
$$\min \Big\{\, \mu \|x\|_1 + \frac{1}{2} \|Ax - b\|^2 \,\Big\},$$
where the least-squares term $\frac{1}{2}\|Ax - b\|^2$ is a data-fidelity term, the $\ell_1$-norm term $\|x\|_1$ is a regularization term for inducing sparse solutions, and $\mu > 0$ is a trade-off parameter.
Using ALM Directly with 100% Accuracy

Applying ALM directly:
$$(x_1^{k+1}, x_2^{k+1}) = \arg\min \Big\{\, \theta_1(x_1) + \theta_2(x_2) - (\lambda^k)^T (A_1 x_1 + A_2 x_2 - b) + \frac{\beta}{2} \|A_1 x_1 + A_2 x_2 - b\|^2 \ \Big|\ (x_1, x_2) \in \mathcal{X}_1 \times \mathcal{X}_2 \,\Big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b).$$

How about its implementability? Is it easy to solve the ALM subproblem exactly?
Splitting the ALM with Less Accuracy?

Parallel (Jacobian) splitting:
$$x_1^{k+1} = \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$x_2^{k+1} = \arg\min \big\{\, \theta_2(x_2) - (\lambda^k)^T A_2 x_2 + \tfrac{\beta}{2} \|A_1 x_1^k + A_2 x_2 - b\|^2 \mid x_2 \in \mathcal{X}_2 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b).$$

Sequential (Gauss–Seidel) splitting:
$$x_1^{k+1} = \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$x_2^{k+1} = \arg\min \big\{\, \theta_2(x_2) - (\lambda^k)^T A_2 x_2 + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2 - b\|^2 \mid x_2 \in \mathcal{X}_2 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b).$$

— Both lose accuracy but gain implementability — less accurate but more implementable than the original ALM.
— They are equally implementable, and sequential splitting is more accurate.
— Parallel splitting is not convergent (He/Hou/Y., 2013).
— Sequential splitting is convergent — it is the alternating direction method of multipliers (ADMM), originally proposed by R. Glowinski and A. Marrocco in 1975.
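The two splittings differ by a single line, as the following minimal sketch shows, assuming a toy two-block instance with $\theta_i(x_i) = \frac{1}{2}\|x_i\|^2$ and $\mathcal{X}_i = \mathbb{R}^{n_i}$ so that both subproblems reduce to small linear systems. The data and parameters are illustrative placeholders.

```python
import numpy as np

# Jacobian vs. Gauss-Seidel splitting on a toy two-block problem
# min 0.5*||x1||^2 + 0.5*||x2||^2  s.t.  A1 x1 + A2 x2 = b  (X_i = R^{n_i}),
# where each subproblem reduces to a small linear system.
rng = np.random.default_rng(1)
m, n1, n2 = 4, 5, 5
A1, A2 = rng.standard_normal((m, n1)), rng.standard_normal((m, n2))
b = rng.standard_normal(m)
beta = 1.0

def x_update(Ai, other, lam, ni):
    # argmin 0.5*||xi||^2 - lam^T Ai xi + beta/2*||Ai xi + other - b||^2
    return np.linalg.solve(np.eye(ni) + beta * Ai.T @ Ai,
                           Ai.T @ (lam + beta * (b - other)))

for mode in ["jacobi", "gauss-seidel"]:
    x1, x2, lam = np.zeros(n1), np.zeros(n2), np.zeros(m)
    for k in range(200):
        x1_new = x_update(A1, A2 @ x2, lam, n1)
        # The only difference: Gauss-Seidel uses the fresh x1_new.
        x2 = x_update(A2, A1 @ (x1_new if mode == "gauss-seidel" else x1), lam, n2)
        x1 = x1_new
        lam = lam - beta * (A1 @ x1 + A2 @ x2 - b)
    # Jacobi carries no convergence guarantee (He/Hou/Y., 2013); the print
    # simply reports the final residual on this particular instance.
    print(mode, "residual:", np.linalg.norm(A1 @ x1 + A2 @ x2 - b))
```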
Comments on ADMM

The ADMM scheme:
$$x_1^{k+1} = \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$x_2^{k+1} = \arg\min \big\{\, \theta_2(x_2) - (\lambda^k)^T A_2 x_2 + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2 - b\|^2 \mid x_2 \in \mathcal{X}_2 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b).$$

— ADMM represents an inexact version of ALM, because the $(x_1, x_2)$-subproblem of ALM is decomposed into two smaller ones.
— It becomes possible to take advantage of the properties of $\theta_1$ and $\theta_2$ individually — the decomposed subproblems are potentially much easier than the aggregated subproblem of the original ALM.
— For the $\ell_1$-$\ell_2$ model above, all subproblems are even easy enough to have closed-form solutions (to be delineated).
Cont'd

— A "renaissance" of ADMM in many application domains such as image processing, statistical learning, computer vision, and so on.
— In 2011, we proved ADMM's convergence rate.
— Review papers: Boyd et al. 2010, Glowinski 2012, Eckstein and Yao 2012.
Accuracy of ADMM

Certainly, acquiring implementability does not mean neglecting accuracy. The accuracy of ADMM's subproblems should be taken seriously:
$$x_1^{k+1} \approx \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$x_2^{k+1} \approx \arg\min \big\{\, \theta_2(x_2) - (\lambda^k)^T A_2 x_2 + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2 - b\|^2 \mid x_2 \in \mathcal{X}_2 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} - b).$$

How do we define "$\approx$" rigorously above? For a general case, the inexactness criterion for solving these subproblems needs to be analyzed rigorously.¹

¹ Ng/Wang/Y., Inexact alternating direction methods for image recovery, SIAM Journal on Scientific Computing, 33(4), 1643–1668, 2011.
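To make "$\approx$" concrete, here is a minimal sketch of one natural inexactness criterion: stop the inner solver once the subproblem's gradient norm falls below a tolerance $\varepsilon_k$ that tightens (summably) with the outer iteration. This is only an illustration of the idea; the precise criterion analyzed in the cited paper is more refined.

```python
import numpy as np

# Sketch of an inexact inner solve: run gradient descent on a smooth
# subproblem min_x f(x) = 0.5*x^T Q x - q^T x until ||grad f|| <= eps_k,
# with a tolerance eps_k tightening as the outer iteration k grows.
def inexact_solve(Q, q, x, eps_k, step):
    while True:
        g = Q @ x - q
        if np.linalg.norm(g) <= eps_k:   # the inexactness criterion
            return x
        x = x - step * g

rng = np.random.default_rng(2)
n = 8
B = rng.standard_normal((n, n))
Q = B.T @ B + np.eye(n)                  # positive definite
q = rng.standard_normal(n)
x = np.zeros(n)
for k in range(1, 6):
    eps_k = 1.0 / k**2                   # summable tolerances: sum eps_k < inf
    x = inexact_solve(Q, q, x, eps_k, step=1.0 / np.linalg.norm(Q, 2))
    print(f"outer k={k}: inner solve stopped at ||grad|| <= {eps_k:.3f}")
```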
Two ADMM Applications

(1) Compressive sensing (Donoho, Candès, Tao, ...)

— It allows us to go beyond the Shannon limit by exploiting the sparsity of a signal.
— It acquires the important information of a signal efficiently (e.g., saving storage and improving speed).

[Figure: original signal → compressive equipment → observation]

Ideal model: $Ax = b$, where $x$ is the original signal, $A$ is the sensing matrix (a fat matrix), and $b$ is the observation (with noise).
The Sparsity of a Signal

Some signals are large-scale but sparse (perhaps under some transform domain).

[Figure: four example signals, sparse either directly or under a transform]
Mathematical Model

Find a sparse solution of a system of linear equations:
$$\min \big\{\, \|x\|_0 \mid Ax = b,\ x \in \mathbb{R}^n \,\big\},$$
where $\|x\|_0$ is the number of nonzeros of $x$ and $A \in \mathbb{R}^{m \times n}$ with $m \ll n$.

— The solution is in general not unique.
— The problem is NP-hard!
Basic Models for Compressive Sensing

Basis pursuit (BP):
$$\min \{\, \|x\|_1 \mid Ax = b \,\}.$$

The $\ell_1$-regularized least-squares model:
$$\min\ \tau \|x\|_1 + \frac{1}{2} \|Ax - b\|^2.$$
A Reformulation of the $\ell_1$-$\ell_2$ Model

$$\min_x\ \tau \|x\|_1 + \frac{1}{2} \|Ax - b\|^2.$$

By introducing $y$:
$$\min\ \tau \|x\|_1 + \frac{1}{2} \|Ay - b\|^2 \quad \text{s.t.}\quad x = y.$$
Solutions of ADMM's Subproblems

$$\min\ \tau \|x\|_1 + \frac{1}{2} \|Ay - b\|^2 \quad \text{s.t.}\quad x = y.$$

Applying ADMM:
1. $x^{k+1} = \arg\min_{x \in \mathbb{R}^n}\ \tau \|x\|_1 + \frac{\beta}{2} \big\| x - y^k - \frac{\lambda^k}{\beta} \big\|^2$;
2. $y^{k+1}$: solve $(\beta I + A^T A)\, y = A^T b + \beta x^{k+1} - \lambda^k$;
3. $\lambda^{k+1} = \lambda^k - \beta (x^{k+1} - y^{k+1})$.

— Step 1 is a soft-shrinkage operator.
— Step 2 is a system of linear equations; efficient solvers (e.g., PCG or BB) are available.
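A minimal NumPy sketch of these three steps follows; the problem instance, $\tau$, $\beta$, and the iteration count are illustrative, and a direct solve stands in for PCG on this small example.

```python
import numpy as np

# ADMM sketch for min tau*||x||_1 + 0.5*||Ay - b||^2 s.t. x = y, following
# the three steps above: soft-shrinkage, a linear solve, multiplier update.
def soft_shrink(v, t):
    # componentwise soft-thresholding: argmin_x t*||x||_1 + 0.5*||x - v||^2
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(3)
m, n = 30, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = 1.0   # a 5-sparse ground truth
b = A @ x_true
tau, beta = 0.1, 1.0

x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
M = beta * np.eye(n) + A.T @ A                  # y-subproblem matrix
for k in range(300):
    x = soft_shrink(y + lam / beta, tau / beta)           # Step 1
    y = np.linalg.solve(M, A.T @ b + beta * x - lam)      # Step 2
    lam = lam - beta * (x - y)                            # Step 3
print("||x - y|| =", np.linalg.norm(x - y))
```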
Another ADMM Application

(2) Image deblurring

A clean image can be degraded by blur — defocus of the camera lens, a moving object, turbulence in the air, ...
$$\min\ \||\nabla x|\|_1 + \frac{\mu}{2} \|Kx - x_0\|^2,$$
where $x$ is the clean image, $x_0$ is the observed image corrupted by blur and Gaussian noise, $K$ is the point spread function (blur), $\nabla$ is a gradient operator (by Rudin/Osher/Fatemi, '92) used to preserve the sharp edges of an image, and $\mu$ is a trade-off parameter.

[Figure: original image, blurred image, restored image]
Applying ADMM

Reformulate the model as
$$\min\ \||y|\|_1 + \frac{\mu}{2} \|Kx - x_0\|^2 \quad \text{s.t.}\quad \nabla x = y,$$
to which ADMM is applicable. The resulting subproblems are easy.

The $x$-subproblem (solved via a DFT):
$$x^{k+1} = \arg\min_x \Big\{\, \frac{\mu}{2} \|Kx - x_0\|^2 - (\lambda^k)^T (\nabla x - y^k) + \frac{\beta}{2} \|\nabla x - y^k\|^2 \,\Big\}.$$

The $y$-subproblem (solved via a shrinkage):
$$y^{k+1} = \arg\min_y \Big\{\, \||y|\|_1 - (\lambda^k)^T (\nabla x^{k+1} - y) + \frac{\beta}{2} \|\nabla x^{k+1} - y\|^2 \,\Big\}.$$
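A sketch of the FFT-based $x$-subproblem solve is given below, assuming periodic boundary conditions so that the blur $K$ and the finite-difference operators are diagonalized by the 2-D FFT. The kernels, image, and parameter values are illustrative placeholders, not from the talk.

```python
import numpy as np

# FFT solve of (mu*K^T K + beta*grad^T grad) x = mu*K^T x0 + grad^T(lam + beta*y),
# assuming periodic boundary conditions.
def otf(kernel, shape):
    # embed a small kernel in a zero image with wraparound so that its FFT
    # is the operator's eigenvalue map (its "optical transfer function")
    pad = np.zeros(shape)
    kh, kw = kernel.shape
    pad[:kh, :kw] = kernel
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

H, W = 64, 64
rng = np.random.default_rng(4)
x0 = rng.random((H, W))                          # placeholder observed image
K_hat = otf(np.full((3, 3), 1.0 / 9), (H, W))    # 3x3 mean blur, illustrative
D1_hat = otf(np.array([[1.0, -1.0]]), (H, W))    # horizontal difference
D2_hat = otf(np.array([[1.0], [-1.0]]), (H, W))  # vertical difference
mu, beta = 10.0, 1.0
y1 = y2 = lam1 = lam2 = np.zeros((H, W))         # current y^k and lambda^k

rhs_hat = (mu * np.conj(K_hat) * np.fft.fft2(x0)
           + np.conj(D1_hat) * np.fft.fft2(lam1 + beta * y1)
           + np.conj(D2_hat) * np.fft.fft2(lam2 + beta * y2))
denom = mu * np.abs(K_hat) ** 2 + beta * (np.abs(D1_hat) ** 2 + np.abs(D2_hat) ** 2)
x = np.real(np.fft.ifft2(rhs_hat / denom))       # the new x^{k+1}
```

The $y$-subproblem is then a componentwise shrinkage applied to the gradient field, analogous to the soft-shrinkage used in the $\ell_1$-$\ell_2$ example above.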
Image Inpainting

Problem: some pixels of the image are missing; only partial information of the image is available:
$$g = Sf, \quad S \text{ — mask}.$$

Model:
$$\min \{\, \|\nabla f\|_1 \mid Sf = g \,\}.$$

[Figure: original image, image with missing pixels, restored image]
Image Decomposition

Problem: separate the sketch (cartoon) and the oscillating component (texture) of an image:
$$f = u + v, \quad u \text{ — cartoon part}, \quad v \text{ — texture part}.$$

Model:
$$\min \{\, \tau \|\nabla u\|_1 + \|v\|_{-1,\infty} \mid u + v = f \,\}.$$

[Figure: original image, cartoon part, texture part]
Magnetic Resonance Imaging (MRI)

Problem: reconstruct a medical image by partially sampling its Fourier coefficients:
$$Fg = PFf, \quad P \text{ — sampling mask}, \quad F \text{ — Fourier transform}.$$

Model:
$$\min \{\, \|\nabla f\|_1 \mid Fg = PFf \,\}.$$

[Figure: medical image, sampling mask, reconstruction]
Outline

1. Backgrounds
2. Accuracy vs. Implementability – An Easier Case
3. Accuracy vs. Implementability – A More Complicated Case
4. Conclusions
A More Complicated Model with a Higher Degree of Separability

A multi-block separable convex optimization model:
$$\min \Big\{\, \sum_{i=1}^m \theta_i(x_i) \ \Big|\ \sum_{i=1}^m A_i x_i = b,\ x_i \in \mathcal{X}_i,\ i = 1, 2, \cdots, m \,\Big\},$$
with $m \ge 3$.

Applications include:
— the image alignment problem;
— the robust principal component analysis model with noisy and incomplete data;
— latent variable Gaussian graphical model selection;
— the quadratic discriminant analysis model.
Splitting Versions with Less Accuracy but More Implementability

Obviously, the parallel (Jacobian) splitting
$$x_i^{k+1} = \arg\min \Big\{\, \theta_i(x_i) - (\lambda^k)^T A_i x_i + \frac{\beta}{2} \Big\| A_i x_i + \sum_{j \ne i} A_j x_j^k - b \Big\|^2 \ \Big|\ x_i \in \mathcal{X}_i \,\Big\}, \quad i = 1, \cdots, m,$$
$$\lambda^{k+1} = \lambda^k - \beta \Big( \sum_{i=1}^m A_i x_i^{k+1} - b \Big),$$
does not work (more details are coming).
Cont'd

Can we extend ADMM straightforwardly, by splitting the ALM subproblem into $m$ subproblems sequentially?
$$x_i^{k+1} = \arg\min \Big\{\, \theta_i(x_i) - (\lambda^k)^T A_i x_i + \frac{\beta}{2} \Big\| \sum_{j=1}^{i-1} A_j x_j^{k+1} + A_i x_i + \sum_{j=i+1}^{m} A_j x_j^k - b \Big\|^2 \ \Big|\ x_i \in \mathcal{X}_i \,\Big\}, \quad i = 1, \cdots, m,$$
$$\lambda^{k+1} = \lambda^k - \beta \Big( \sum_{i=1}^m A_i x_i^{k+1} - b \Big).$$

— This direct extension of ADMM has been widely used in the literature, and it does work very well for many applications!
— But for a very long time, neither an affirmative convergence proof nor a counterexample showing its divergence was available.
Recently we² found some examples showing the divergence of the direct extension of ADMM even when $m = 3$. So the direct extension of ADMM for the multi-block separable convex optimization model is not necessarily convergent!

That is, even to solve
$$\min \{\, \theta_1(x_1) + \theta_2(x_2) + \theta_3(x_3) \mid A_1 x_1 + A_2 x_2 + A_3 x_3 = b,\ x_i \in \mathcal{X}_i,\ i = 1, 2, 3 \,\},$$
the following scheme is not necessarily convergent:
$$x_1^{k+1} = \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k + A_3 x_3^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$x_2^{k+1} = \arg\min \big\{\, \theta_2(x_2) - (\lambda^k)^T A_2 x_2 + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2 + A_3 x_3^k - b\|^2 \mid x_2 \in \mathcal{X}_2 \,\big\},$$
$$x_3^{k+1} = \arg\min \big\{\, \theta_3(x_3) - (\lambda^k)^T A_3 x_3 + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3 - b\|^2 \mid x_3 \in \mathcal{X}_3 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3^{k+1} - b).$$

Both the Jacobian and the Gauss–Seidel decompositions fail — too much loss of accuracy for $m \ge 3$!

² Chen/He/Ye/Y., The direct extension of ADMM for multi-block separable convex minimization models is not necessarily convergent, September 2013.
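A numerical sketch of such a divergence check is given below. It assumes the counterexample family of Chen/He/Ye/Y.: $\theta_i \equiv 0$, $b = 0$, $\mathcal{X}_i = \mathbb{R}$, and a specific nonsingular $3 \times 3$ matrix $[A_1\,|\,A_2\,|\,A_3]$; the particular matrix entered here is quoted from memory and should be treated as an assumption. With these choices every subproblem is scalar, so one sweep of the direct extension is a linear map whose spectral radius can be inspected directly.

```python
import numpy as np

# Divergence check for the direct 3-block extension of ADMM with theta_i = 0,
# b = 0, X_i = R. One sweep is linear in the state (x2, x3, lambda); we
# assemble that map column by column and inspect its spectral radius
# (> 1 means the scheme diverges from some starting points).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, 2.0]])   # assumed counterexample data
A1, A2, A3 = A[:, 0], A[:, 1], A[:, 2]
beta = 1.0

def sweep(state):
    x2, x3, lam = state[0], state[1], state[2:]
    # each scalar update solves Ai^T(A1 x1 + A2 x2 + A3 x3) = Ai^T lam / beta
    x1 = A1 @ (lam / beta - A2 * x2 - A3 * x3) / (A1 @ A1)
    x2 = A2 @ (lam / beta - A1 * x1 - A3 * x3) / (A2 @ A2)
    x3 = A3 @ (lam / beta - A1 * x1 - A2 * x2) / (A3 @ A3)
    lam = lam - beta * (A1 * x1 + A2 * x2 + A3 * x3)
    return np.concatenate(([x2, x3], lam))

# assemble the linear iteration matrix by applying the sweep to basis vectors
T = np.column_stack([sweep(e) for e in np.eye(5)])
rho = max(abs(np.linalg.eigvals(T)))
print(f"spectral radius = {rho:.4f}  (> 1 implies divergence)")
```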
One Way of Applying the ADMM

Conceptually, we can treat the multi-block model as a two-block model,
$$\min \{\, \theta_1(x_1) + \theta_2(x_2) + \theta_3(x_3) \mid A_1 x_1 + A_2 x_2 + A_3 x_3 = b,\ x_i \in \mathcal{X}_i,\ i = 1, 2, 3 \,\},$$
and then apply the original (two-block) ADMM:
$$x_1^{k+1} = \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k + A_3 x_3^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$(x_2^{k+1}, x_3^{k+1}) = \arg\min \big\{\, \theta_2(x_2) + \theta_3(x_3) - (\lambda^k)^T (A_2 x_2 + A_3 x_3 - b) + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2 + A_3 x_3 - b\|^2 \mid x_2 \in \mathcal{X}_2,\ x_3 \in \mathcal{X}_3 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \alpha\beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3^{k+1} - b).$$

— It is accurate (recall ADMM's convergence).
— But it is not implementable (the $(x_2, x_3)$-subproblem is hard to solve).
ADMM with Further Splitting

Split the $(x_2, x_3)$-subproblem in parallel:
$$x_1^{k+1} = \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k + A_3 x_3^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$x_2^{k+1} = \arg\min \big\{\, \theta_2(x_2) - (\lambda^k)^T (A_2 x_2 + A_3 x_3^k - b) + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2 + A_3 x_3^k - b\|^2 \mid x_2 \in \mathcal{X}_2 \,\big\},$$
$$x_3^{k+1} = \arg\min \big\{\, \theta_3(x_3) - (\lambda^k)^T (A_2 x_2^k + A_3 x_3 - b) + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2^k + A_3 x_3 - b\|^2 \mid x_3 \in \mathcal{X}_3 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \alpha\beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3^{k+1} - b).$$

Split the $(x_2, x_3)$-subproblem sequentially:
$$x_1^{k+1} = \arg\min \big\{\, \theta_1(x_1) - (\lambda^k)^T A_1 x_1 + \tfrac{\beta}{2} \|A_1 x_1 + A_2 x_2^k + A_3 x_3^k - b\|^2 \mid x_1 \in \mathcal{X}_1 \,\big\},$$
$$x_2^{k+1} = \arg\min \big\{\, \theta_2(x_2) - (\lambda^k)^T (A_2 x_2 + A_3 x_3^k - b) + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2 + A_3 x_3^k - b\|^2 \mid x_2 \in \mathcal{X}_2 \,\big\},$$
$$x_3^{k+1} = \arg\min \big\{\, \theta_3(x_3) - (\lambda^k)^T (A_2 x_2^{k+1} + A_3 x_3 - b) + \tfrac{\beta}{2} \|A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3 - b\|^2 \mid x_3 \in \mathcal{X}_3 \,\big\},$$
$$\lambda^{k+1} = \lambda^k - \alpha\beta (A_1 x_1^{k+1} + A_2 x_2^{k+1} + A_3 x_3^{k+1} - b).$$

— Both are implementable, but how about the accuracy?
— Both are not necessarily convergent (Liu/Lu/Y., in preparation).
— Implementable but not accurate!