On the Equivalence of Inexact Proximal ALM and ADMM for a Class of Convex Composite Programming

Defeng Sun
Department of Applied Mathematics

DIMACS Workshop on ADMM and Proximal Splitting Methods in Optimization
June 13, 2018

Joint work with: Liang Chen (PolyU), Xudong Li (Princeton), and Kim-Chuan Toh (NUS)
The multi-block convex composite optimization problem

    $\min_{y \in Y,\, z \in Z} \big\{\, \Phi(w) := p(y_1) + f(y) - \langle b, z\rangle \;\big|\; A^* w := F^* y + G^* z = c \,\big\}, \qquad w = (y, z) \in W$

◮ X, Z and Y_i (i = 1, ..., s): finite-dimensional real Hilbert spaces (each endowed with an inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$); Y := Y_1 × ... × Y_s
◮ p : Y_1 → (−∞, +∞]: (possibly nonsmooth) closed proper convex; f : Y → (−∞, +∞): continuously differentiable and convex with a Lipschitz continuous gradient
◮ F^* and G^*: the adjoints of the given linear mappings F : X → Y and G : X → Z; b ∈ Z and c ∈ X: the given data

Too simple? It covers many important classes of convex optimization problems that are best solved in this (dual) form!
A quintessential example

The convex composite quadratic programming (CCQP):

    $\min_{x} \big\{\, \psi(x) + \tfrac{1}{2}\langle x, Q x\rangle - \langle c, x\rangle \;\big|\; A x = b \,\big\}$    (1)

◮ ψ : X → (−∞, +∞]: closed proper convex
◮ Q : X → X: self-adjoint positive semidefinite linear operator

The dual (in minimization form):

    $\min_{y_1, y_2, z} \big\{\, \psi^*(y_1) + \tfrac{1}{2}\langle y_2, Q y_2\rangle - \langle b, z\rangle \;\big|\; y_1 + Q y_2 - A^* z = c \,\big\}$    (2)

where ψ^* is the conjugate of ψ, y_1 ∈ X, y_2 ∈ X, z ∈ Z

◮ Many problems are subsumed under the convex composite quadratic programming model (1)
◮ E.g., the important classes of convex quadratic programming (QP) and convex quadratic semidefinite programming (QSDP)...
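To see where (2) comes from, here is a sketch of the standard Lagrangian-duality derivation (not taken from the slides; attainment and closure issues in the conjugate/inf-convolution step are glossed over):

```latex
% Lagrangian of (1) with multiplier z for Ax = b:
%   L(x, z) = psi(x) + (1/2)<x, Qx> - <c, x> - <z, Ax - b>,
% so the dual problem is
\max_{z}\ \Big\{ \langle b, z\rangle
  + \inf_{x} \big\{ \psi(x) + \tfrac{1}{2}\langle x, Qx\rangle - \langle c + A^* z, x\rangle \big\} \Big\}.
% Since Q is only positive semidefinite, the conjugate of (1/2)<x, Qx> at v equals
% (1/2)<y_2, Q y_2> if v = Q y_2 for some y_2, and +infinity otherwise.  Hence, with u := c + A^* z,
\inf_{x} \big\{ \psi(x) + \tfrac{1}{2}\langle x, Qx\rangle - \langle u, x\rangle \big\}
  = -\min_{y_1, y_2} \big\{ \psi^*(y_1) + \tfrac{1}{2}\langle y_2, Q y_2\rangle
      \;\big|\; y_1 + Q y_2 = u \big\},
% and writing the max as a min over (y_1, y_2, z) gives
\min_{y_1, y_2, z} \big\{ \psi^*(y_1) + \tfrac{1}{2}\langle y_2, Q y_2\rangle - \langle b, z\rangle
  \;\big|\; y_1 + Q y_2 - A^* z = c \big\},
% which is exactly (2).
```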
Convex QSDP

    $\min_{X \in S^n} \big\{\, \tfrac{1}{2}\langle X, Q X\rangle - \langle C, X\rangle \;\big|\; A_E X = b_E,\; A_I X \ge b_I,\; X \in S^n_+ \,\big\}$

◮ S^n: the space of n × n real symmetric matrices
◮ S^n_+: the closed convex cone of positive semidefinite matrices in S^n
◮ Q : S^n → S^n: a positive semidefinite linear operator; C ∈ S^n: the given data
◮ A_E and A_I: linear maps from S^n to certain finite-dimensional Euclidean spaces containing b_E and b_I, respectively

QSDPNAL^1: the first phase is an inexact block sGS decomposition based multi-block proximal ADMM, in which the generated solution is used as the initial point to warm-start the second-phase algorithm

1 Li, Sun, Toh: QSDPNAL: a two-phase augmented Lagrangian method for convex quadratic semidefinite programming. MPC online (2018)
Penalized and constrained regression models

The penalized and constrained regression model often arises in high-dimensional generalized linear models with linear equality and inequality constraints, e.g.,

    $\min_{x \in R^n} \big\{\, p(x) + \tfrac{\lambda}{2}\|\Phi x - \eta\|^2 \;\big|\; A_E x = b_E,\; A_I x \ge b_I \,\big\}$    (3)

◮ Φ ∈ R^{m×n}, A_E ∈ R^{r_E×n}, A_I ∈ R^{r_I×n}, η ∈ R^m, b_E ∈ R^{r_E} and b_I ∈ R^{r_I} are the given data
◮ p is a proper closed convex regularizer such as p(x) = ‖x‖_1
◮ λ > 0 is a parameter
◮ Obviously, the dual of problem (3) is a particular case of CCQP, as sketched below
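One way to see this (a sketch added here, not taken from the slides): rewrite (3) in the form (1) by introducing a slack variable for the inequality constraints; its dual is then of the form (2), i.e., a CCQP.

```latex
% Introduce a slack s >= 0 for A_I x >= b_I and expand the squared norm
% (dropping the constant term in eta).  Problem (3) becomes an instance of (1)
% in the variable (x, s):
\min_{x, s}\ \Big\{\ \underbrace{p(x) + \delta_{R^{r_I}_{+}}(s)}_{\psi(x,s)}
   + \tfrac{1}{2}\big\langle (x,s),\, Q (x,s) \big\rangle - \big\langle c,\, (x,s) \big\rangle
   \;\Big|\; A_E x = b_E,\ A_I x - s = b_I \Big\},
\qquad
Q := \begin{pmatrix} \lambda \Phi^{\top}\Phi & 0 \\ 0 & 0 \end{pmatrix},
\quad
c := \begin{pmatrix} \lambda \Phi^{\top}\eta \\ 0 \end{pmatrix}.
% Dualizing as in (1)->(2) then yields a problem of the form (2).
```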
The augmented Lagrangian function^2

Consider

    $\min_{w=(y,z) \in W} \big\{\, \Phi(w) := p(y_1) + f(y) - \langle b, z\rangle \;\big|\; A^* w := F^* y + G^* z = c \,\big\}$

Let σ > 0 be the penalty parameter. The augmented Lagrangian function is

    $L_\sigma(y, z; x) := \underbrace{p(y_1) + f(y) - \langle b, z\rangle}_{\Phi(w)} + \underbrace{\langle x,\, F^* y + G^* z - c\rangle}_{\langle x,\, A^* w - c\rangle} + \tfrac{\sigma}{2}\, \underbrace{\|F^* y + G^* z - c\|^2}_{\|A^* w - c\|^2}$

for all w = (y, z) ∈ W := Y × Z and x ∈ X.

2 Arrow, K.J., Solow, R.M.: Gradient methods for constrained maxima with weakened assumptions. In: Arrow, K.J., Hurwicz, L., Uzawa, H. (eds.) Studies in Linear and Nonlinear Programming, pp. 165–176. Stanford University Press, Stanford (1958)
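As a concrete reading of this definition, here is a minimal numerical sketch (not from the slides) that evaluates L_σ when F^* and G^* are given as matrices and p, f as callables; the block structure of y is handled by letting p act on its first n1 coordinates. All names are assumptions of the sketch.

```python
import numpy as np

def aug_lagrangian(p, f, Fs, Gs, b, c, y, z, x, sigma, n1):
    """Evaluate L_sigma(y, z; x) for the two-block problem above.

    Assumptions (not specified on the slide): Fs, Gs are matrix
    representations of F^* and G^*; p and f are Python callables; the
    first block y_1 consists of the first n1 entries of y."""
    r = Fs @ y + Gs @ z - c                     # residual A^* w - c
    return (p(y[:n1]) + f(y) - b @ z            # Phi(w)
            + x @ r                             # <x, A^* w - c>
            + 0.5 * sigma * np.dot(r, r))       # (sigma/2) ||A^* w - c||^2
```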
K. Arrow and R. Solow

Kenneth Joseph "Ken" Arrow (23 August 1921 – 21 February 2017)
John Bates Clark Medal (1957); Nobel Prize in Economics (1972); von Neumann Theory Prize (1986); National Medal of Science (2004); ForMemRS (2006)

Robert Merton Solow (August 23, 1924 – )
John Bates Clark Medal (1961); Nobel Memorial Prize in Economic Sciences (1987); National Medal of Science (1999); Presidential Medal of Freedom (2014); ForMemRS (2006)
The augmented Lagrangian method^3 (ALM)

Starting from x^0 ∈ X, perform for k = 0, 1, ...:

    (1)  $w^{k+1} = (y^{k+1}, z^{k+1}) \approx \arg\min_{w=(y,z)} L_\sigma(y, z; x^k)$   (solved approximately)
    (2)  $x^{k+1} := x^k + \tau\sigma\,(F^* y^{k+1} + G^* z^{k+1} - c)$   with τ ∈ (0, 2)

Magnus Rudolph Hestenes (February 13, 1906 – May 31, 1991)
Michael James David Powell (29 July 1936 – 19 April 2015)

3 Also known as the method of multipliers
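A minimal sketch of this iteration (not from the slides): the subproblem solver is left abstract, since the slides only require step (1) to be solved approximately; `solve_subproblem`, `Fs`, `Gs` are hypothetical names.

```python
import numpy as np

def alm(solve_subproblem, Fs, Gs, c, x0, sigma=1.0, tau=1.618, iters=100):
    """ALM loop for min { p(y1) + f(y) - <b, z> : F^* y + G^* z = c }.

    `solve_subproblem(x, sigma)` is any routine returning an approximate
    minimizer (y, z) of L_sigma(y, z; x); tau must lie in (0, 2)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        y, z = solve_subproblem(x, sigma)            # step (1): inexact minimization
        x = x + tau * sigma * (Fs @ y + Gs @ z - c)  # step (2): multiplier update
    return y, z, x
```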
ALM and variants

◮ For τ = 1, the ALM enjoys the desirable asymptotically superlinear convergence (i.e., linear convergence whose rate can be made arbitrarily small)
◮ While one would ideally minimize L_σ(y, z; x^k) jointly over (y, z) without modifying the augmented Lagrangian, this can be expensive due to the quadratic term coupling y and z
◮ In practice, unless the ALM subproblems can be solved efficiently, one generally replaces the augmented Lagrangian subproblem with an easier-to-solve surrogate, obtained by modifying the augmented Lagrangian function so as to decouple the minimization with respect to y and z
◮ This is especially desirable during the initial phase of the ALM, before its local superlinear convergence kicks in
ALM to proximal ALM^4 (PALM)

Minimize the augmented Lagrangian function plus a quadratic proximal term:

    $w^{k+1} \approx \arg\min_{w} \big\{\, L_\sigma(w; x^k) + \tfrac{1}{2}\|w - w^k\|_D^2 \,\big\}$

◮ D = σ^{-1} I in the seminal work of Rockafellar (in which inequality constraints are considered). Note that D → 0 as σ → ∞, which is critical for superlinear convergence
◮ It is a primal-dual type proximal point algorithm (PPA)

4 Also known as the proximal method of multipliers
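For concreteness, a minimal sketch of one PALM iteration (not from the slides; `solve_prox_subproblem` and `As` are hypothetical names, with the constraint operator A^* given as the matrix As and D as a symmetric matrix):

```python
def palm_step(solve_prox_subproblem, As, c, D, w_k, x_k, sigma, tau=1.0):
    """One PALM iteration: (approximately) minimize
    L_sigma(w; x_k) + 0.5 * (w - w_k)^T D (w - w_k) over w,
    then apply the same multiplier update as in the ALM."""
    w_new = solve_prox_subproblem(w_k, x_k, sigma, D)   # proximal ALM subproblem
    x_new = x_k + tau * sigma * (As @ w_new - c)        # multiplier update, tau in (0, 2)
    return w_new, x_new
```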
Modification and decomposition

◮ D could be positive semidefinite (giving a kind of PPA), e.g., the obvious choice $D = \sigma(\lambda^2 I - A A^*) = \sigma\big(\lambda^2 I - (F; G)(F; G)^*\big)$, with λ the largest singular value of (F; G); this choice cancels the coupled quadratic term in the subproblem (see the sketch after this list)
◮ This obvious choice is generally too drastic and has the undesirable effect of significantly slowing down the convergence of the PALM
◮ D can be indefinite (typically used together with the majorization technique)

? What is an appropriate proximal term to add so that
  ◮ the PALM subproblem is easier to solve, and
  ◮ it is less drastic than the obvious choice?
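A short calculation (a sketch added here, not from the slides) showing why the obvious choice decouples the subproblem: with $D = \sigma(\lambda^2 I - AA^*)$, the coupled penalty term is absorbed into a separable quadratic.

```latex
% PALM subproblem terms involving the constraint operator, with D = sigma(lambda^2 I - A A^*):
\tfrac{\sigma}{2}\,\|A^* w - c\|^2 + \tfrac{1}{2}\,\|w - w^k\|_{D}^{2}
  = \tfrac{\sigma\lambda^2}{2}\,\|w - w^k\|^2
    + \sigma\,\big\langle A\,(A^* w^k - c),\, w - w^k \big\rangle + \text{const},
% so, adding the multiplier term <x^k, A^* w - c>, the PALM subproblem reduces to
w^{k+1} \approx \arg\min_{w}\ \Big\{\ \Phi(w)
    + \big\langle A\,\big(x^k + \sigma (A^* w^k - c)\big),\, w \big\rangle
    + \tfrac{\sigma\lambda^2}{2}\,\|w - w^k\|^2 \Big\},
% which separates into independent minimizations over y and z.
```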
Decomposition based ADMM

On the other hand, a decomposition based approach is available, i.e.,

    $y^{k+1} \approx \arg\min_{y} L_\sigma(y, z^k; x^k), \qquad z^{k+1} \approx \arg\min_{z} L_\sigma(y^{k+1}, z; x^k)$

◮ This is the two-block ADMM
◮ It allows $\tau \in (0, (1+\sqrt{5})/2)$ if convergence of the full (primal & dual) sequence is required (Glowinski)
◮ The case with τ = 1 is a kind of PPA (Gabay + Bertsekas–Eckstein)
◮ Many variants (proximal/inexact/generalized/parallel, etc.)
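A minimal sketch of the two-block ADMM iteration (not from the slides; the y- and z-subproblem solvers are left abstract as hypothetical callables, and the multiplier update is the same as in the ALM):

```python
import numpy as np

def admm(solve_y, solve_z, Fs, Gs, c, y0, z0, x0, sigma=1.0, tau=1.6, iters=100):
    """Two-block ADMM for min { p(y1) + f(y) - <b, z> : F^* y + G^* z = c }.

    solve_y(z, x, sigma) ~ argmin_y L_sigma(y, z; x)
    solve_z(y, x, sigma) ~ argmin_z L_sigma(y, z; x)
    tau may be taken in (0, (1 + sqrt(5))/2)."""
    y, z, x = y0.copy(), z0.copy(), np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        y = solve_y(z, x, sigma)                     # y-subproblem with z fixed
        z = solve_z(y, x, sigma)                     # z-subproblem with updated y
        x = x + tau * sigma * (Fs @ y + Gs @ z - c)  # multiplier update
    return y, z, x
```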
A part of the result

An equivalence property: by adding an appropriately designed proximal term to L_σ(y, z; x^k), the computation of the modified ALM subproblem reduces to updating y and z sequentially, without any extra proximal term, which is exactly the two-block ADMM

◮ A difference: one can prove convergence for step-lengths τ in the range (0, 2), whereas the classic two-block ADMM only admits $(0, (1+\sqrt{5})/2)$
For multi-block problems

Turning back to the multi-block problem, the subproblem in y can still be difficult due to the coupling of y_1, ..., y_s

◮ A successful multi-block ADMM-type algorithm must not only possess a convergence guarantee but should also perform numerically at least as fast as the directly extended ADMM (updating the blocks in a Gauss–Seidel fashion) whenever the latter converges
Algorithmic design

◮ Majorize the function f(y) at y^k with a quadratic function
◮ Add an extra proximal term, derived from the symmetric Gauss–Seidel (sGS) decomposition theorem [K.C. Toh's talk on Monday], to update the sub-blocks of y individually and successively in an sGS fashion (a sketch of the sGS sweep is given after this list)
◮ The resulting algorithm: a block sGS decomposition based (inexact) majorized multi-block indefinite proximal ADMM with τ ∈ (0, 2), which is equivalent to an inexact majorized proximal ALM
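To make the sGS decomposition concrete, here is a minimal numerical sketch (not from the slides, and only for the fully quadratic case with no nonsmooth block): for a block-partitioned positive definite $Q = U^\top + D + U$ (D block diagonal, U strictly block upper triangular), one backward and one forward block sweep solve the proximal problem with the sGS proximal operator $T = U D^{-1} U^\top$. All names are assumptions of the sketch.

```python
import numpy as np

def sgs_step(Q_blocks, r, y_bar, sizes):
    """One sGS sweep for  min_y 0.5*<y, Q y> - <r, y> + 0.5*||y - y_bar||_T^2,
    where Q = U^T + D + U and T = U D^{-1} U^T.  Q_blocks[i][j] is block (i, j) of Q."""
    s = len(sizes)
    off = np.cumsum([0] + list(sizes))
    blk = lambda v, i: v[off[i]:off[i + 1]]
    y_til = [blk(y_bar, i).copy() for i in range(s)]
    # Backward sweep: i = s-1, ..., 1, using y_bar for blocks before i.
    for i in range(s - 1, 0, -1):
        rhs = blk(r, i).copy()
        for j in range(i):
            rhs -= Q_blocks[i][j] @ blk(y_bar, j)
        for j in range(i + 1, s):
            rhs -= Q_blocks[i][j] @ y_til[j]
        y_til[i] = np.linalg.solve(Q_blocks[i][i], rhs)
    # Forward sweep: i = 0, ..., s-1, using the freshly updated blocks before i.
    y_new = [None] * s
    for i in range(s):
        rhs = blk(r, i).copy()
        for j in range(i):
            rhs -= Q_blocks[i][j] @ y_new[j]
        for j in range(i + 1, s):
            rhs -= Q_blocks[i][j] @ y_til[j]
        y_new[i] = np.linalg.solve(Q_blocks[i][i], rhs)
    return np.concatenate(y_new)

# Check against the direct solution of (Q + T) y = r + T y_bar.
rng = np.random.default_rng(0)
sizes = [3, 2, 4]
n, s = sum(sizes), len(sizes)
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                       # block-partitioned SPD matrix
off = np.cumsum([0] + sizes)
Q_blocks = [[Q[off[i]:off[i+1], off[j]:off[j+1]] for j in range(s)] for i in range(s)]
U = np.triu(Q, 1).copy()
for i in range(s):                                # keep only the strictly *block* upper part
    U[off[i]:off[i+1], off[i]:off[i+1]] = 0.0
D = Q - U - U.T                                   # block diagonal part
T = U @ np.linalg.solve(D, U.T)                   # sGS proximal operator U D^{-1} U^T
r, y_bar = rng.standard_normal(n), rng.standard_normal(n)
assert np.allclose(sgs_step(Q_blocks, r, y_bar, sizes),
                   np.linalg.solve(Q + T, r + T @ y_bar))
```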
An inexact majorized indefinite proximal ALM

Consider

    $\min_{w \in W}\ \Phi(w) := \varphi(w) + h(w) \quad \text{s.t.} \quad A^* w = c$

◮ The Karush–Kuhn–Tucker (KKT) system:

    $0 \in \partial\varphi(w) + \nabla h(w) + A x, \qquad A^* w - c = 0$

◮ The gradient of h is Lipschitz continuous, which implies the existence of a self-adjoint positive semidefinite linear operator $\widehat{\Sigma}_h : W \to W$ such that for any w, w' ∈ W

    $h(w) \le \hat{h}(w, w') := h(w') + \langle \nabla h(w'), w - w'\rangle + \tfrac{1}{2}\|w - w'\|^2_{\widehat{\Sigma}_h}$

  which is called a majorization of h at w' (e.g., the logistic loss function)
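As an illustration of the last bullet, here is a minimal numerical sketch (not from the slides) using the logistic loss: since its Hessian is bounded by $(1/4)\Phi^\top\Phi$, that operator is a valid choice of $\widehat{\Sigma}_h$, and the quadratic majorization can be checked pointwise. The data and names are assumptions of the sketch.

```python
import numpy as np

def logistic_loss(Phi, labels, w):
    """h(w) = sum_i log(1 + exp(-labels_i * <phi_i, w>)), labels in {-1, +1}."""
    t = labels * (Phi @ w)
    return np.logaddexp(0.0, -t).sum()

def logistic_grad(Phi, labels, w):
    t = labels * (Phi @ w)
    return Phi.T @ (labels * (1.0 / (1.0 + np.exp(-t)) - 1.0))

rng = np.random.default_rng(1)
m, n = 50, 10
Phi = rng.standard_normal((m, n))
labels = rng.choice([-1.0, 1.0], size=m)
Sigma_hat = 0.25 * Phi.T @ Phi        # Hessian bound: nabla^2 h(w) <= (1/4) Phi^T Phi

# Check h(w) <= h(w') + <grad h(w'), w - w'> + 0.5 ||w - w'||_{Sigma_hat}^2 at random points.
for _ in range(100):
    w, wp = rng.standard_normal(n), rng.standard_normal(n)
    d = w - wp
    majorant = (logistic_loss(Phi, labels, wp)
                + logistic_grad(Phi, labels, wp) @ d
                + 0.5 * d @ (Sigma_hat @ d))
    assert logistic_loss(Phi, labels, w) <= majorant + 1e-9
```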