Distributed nonsmooth composite optimization via the proximal augmented Lagrangian
Neil K. Dhingra (neilkdh.com)
joint work with Sei Zhen Khong and Mihailo Jovanović
LCCC Focus Period on Large-Scale and Distributed Optimization
June 9, 2017
1 / 35
Applications

satellite formations · combination drug therapy · power networks · control of buildings
2 / 35
Structure via composite optimization

minimize f(x) + g(Tx)
(f – performance; g(Tx) – structure)

◮ f – possibly nonconvex; continuously differentiable
◮ g – convex; often non-differentiable
◮ Tx – promotes structure in alternate coordinates
◮ g(x) admits an easily computable proximal operator; g(Tx) does not
3 / 35
Outline

I. Proximal augmented Lagrangian
   - centralized approach – method of multipliers

II. Primal-dual method
   - distributable
   - convergence for convex problems
   - linear convergence for strongly convex problems
4 / 35
Proximal gradient method

minimize f(x) + g(x)

Generalizes gradient descent:

x^{k+1} = prox_{α_k g}( x^k − α_k ∇f(x^k) )

- cannot be used for g(Tx) in general

Nesterov '07; Beck & Teboulle '09
5 / 35
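For concreteness, a minimal numpy sketch of this iteration on a small LASSO instance, where g = γ‖·‖₁ so the prox is soft-thresholding; the matrix A, vector b, and parameter values are made up for the example:

```python
import numpy as np

def prox_l1(v, t):
    # soft-thresholding: prox of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(grad_f, prox_g, x0, alpha, iters=500):
    # x^{k+1} = prox_{alpha g}( x^k - alpha * grad f(x^k) )
    x = x0
    for _ in range(iters):
        x = prox_g(x - alpha * grad_f(x), alpha)
    return x

# hypothetical LASSO instance: f(x) = 0.5*||Ax - b||^2, g = gamma*||.||_1
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([1.0, 1.0])
gamma = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
alpha = 1.0 / np.linalg.norm(A.T @ A, 2)      # step size <= 1/L, L = ||A^T A||
x_hat = proximal_gradient(grad_f, lambda v, a: prox_l1(v, a * gamma),
                          np.zeros(2), alpha)
```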
Proximal operator and Moreau envelope

◮ Proximal operator

  prox_{µg}(v) := argmin_z g(z) + (1/2µ)‖z − v‖²

◮ Moreau envelope

  M_{µg}(v) := inf_z g(z) + (1/2µ)‖z − v‖²

  - continuously differentiable even when g is not

  ∇M_{µg}(v) = (1/µ)( v − prox_{µg}(v) )

Parikh & Boyd, FnT in Optimization '14
6 / 35
Example

◮ Soft-thresholding – proximal operator for the ℓ₁ norm

  minimize_z Σ_i ( γ|z_i| + (1/2µ)(z_i − v_i)² )

  separability ⇒ element-wise analytical solution

[figures: prox operator – soft-thresholding; Moreau envelope – Huber function; ∇M – saturation; threshold a = µγ]
7 / 35
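To make the correspondence concrete, a small sketch (parameter values assumed for illustration) evaluating the prox, the Moreau envelope, and its gradient for g = γ|·|; the envelope is the Huber function and ∇M saturates at ±γ:

```python
import numpy as np

def prox_l1(v, a):
    # soft-thresholding with threshold a = mu*gamma
    return np.sign(v) * np.maximum(np.abs(v) - a, 0.0)

def moreau_l1(v, mu, gamma):
    # M_{mu g}(v) = g(z*) + (1/(2 mu)) (z* - v)^2 with z* = prox_{mu g}(v);
    # for g = gamma|.| this is the Huber function
    z = prox_l1(v, mu * gamma)
    return gamma * np.abs(z) + (z - v) ** 2 / (2 * mu)

def grad_moreau_l1(v, mu, gamma):
    # grad M_{mu g}(v) = (v - prox_{mu g}(v)) / mu: saturation at +/- gamma
    return (v - prox_l1(v, mu * gamma)) / mu

v = np.linspace(-3, 3, 7)
print(grad_moreau_l1(v, mu=1.0, gamma=1.0))   # values clipped to [-1, 1]
```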
Auxiliary variable

minimize_{x,z} f(x) + g(z)
subject to Tx − z = 0

◮ Decouples f and g
◮ Can use methods for constrained optimization

Augmented Lagrangian:

L_µ(x, z; y) = f(x) + g(z) + ⟨y, Tx − z⟩ + (1/2µ)‖Tx − z‖²
8 / 35
Method of multipliers

(x^{k+1}, z^{k+1}) = argmin_{x,z} L_µ(x, z; y^k)
y^{k+1} = y^k + (1/µ)( Tx^{k+1} − z^{k+1} )

◮ Gradient ascent on a strengthened dual problem
◮ Requires joint minimization over x and z
◮ Well-studied: convergence to local minimum, adaptive µ update, inexact subproblems, etc.
9 / 35
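A generic sketch of this loop, with the joint (x, z) subproblem left as a user-supplied oracle `argmin_xz` (a hypothetical name), since that minimization is exactly the difficulty addressed in the rest of the talk; all arguments are assumed to be numpy arrays:

```python
def method_of_multipliers(argmin_xz, T, mu, y0, iters=50):
    # argmin_xz(y) must return a joint minimizer (x, z) of L_mu(x, z; y)
    y = y0
    for _ in range(iters):
        x, z = argmin_xz(y)              # joint primal minimization
        y = y + (T @ x - z) / mu         # dual gradient ascent step
    return x, z, y
```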
MM cartoon
[animation frames: L_µ(x, z; y⁰), L_µ(x, z; y¹), L_µ(x, z; y⋆)]
10 / 35
Alternating direction method of multipliers

x^{k+1} = argmin_x L_µ(x, z^k; y^k)        (differentiable problem)
z^{k+1} = argmin_z L_µ(x^{k+1}, z; y^k)    (via prox_{µg}(·))
y^{k+1} = y^k + (1/µ)( Tx^{k+1} − z^{k+1} )

◮ Convenient for distributed implementation
◮ Convergence speed influenced by µ
◮ Challenge: convergence for nonconvex f

Hong, Luo, Razaviyayn, SIAM J. Optimiz. '16
11 / 35
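For concreteness, a sketch of these updates for an assumed instance (not the slide's application) with convex quadratic f(x) = ½xᵀQx − qᵀx and g = γ‖·‖₁; the x-step then reduces to a linear solve and the z-step to soft-thresholding:

```python
import numpy as np

def admm(Q, q, T, gamma, mu, iters=300):
    n, m = Q.shape[0], T.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    H = Q + T.T @ T / mu                  # Hessian of the x-subproblem
    for _ in range(iters):
        # x-step: stationarity of L_mu in x
        x = np.linalg.solve(H, q - T.T @ y + T.T @ z / mu)
        # z-step: prox_{mu g} evaluated at Tx + mu*y
        v = T @ x + mu * y
        z = np.sign(v) * np.maximum(np.abs(v) - mu * gamma, 0.0)
        # dual update
        y = y + (T @ x - z) / mu
    return x, z, y
```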
ADMM cartoon
[animation frames: L_µ(x, z; y⁰), L_µ(x, z; y¹), L_µ(x, z; y²)]
12 / 35
Proximal augmented Lagrangian

L_µ(x, z; y) = f(x) + g(z) + (1/2µ)‖z − (Tx + µy)‖² − (µ/2)‖y‖²

Minimize over z:

z⋆_µ(x, y) = prox_{µg}(Tx + µy)

Evaluate L_µ(x, z; y) at z⋆:

L_µ(x; y) := L_µ(x, z⋆_µ(x, y); y) = f(x) + M_{µg}(Tx + µy) − (µ/2)‖y‖²

continuously differentiable in x and y

Dhingra, Khong, Jovanović, arXiv:1610.04514
14 / 35
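A sketch evaluating L_µ(x; y) for the assumed choice g = γ‖·‖₁, using z⋆ = prox_{µg}(Tx + µy) and the Moreau-envelope form above:

```python
import numpy as np

def prox_aug_lagrangian(f, T, mu, gamma, x, y):
    # L_mu(x; y) = f(x) + M_{mu g}(Tx + mu*y) - (mu/2)*||y||^2, g = gamma*||.||_1
    v = T @ x + mu * y
    z = np.sign(v) * np.maximum(np.abs(v) - mu * gamma, 0.0)   # z*(x, y)
    M = gamma * np.abs(z).sum() + ((z - v) ** 2).sum() / (2 * mu)
    return f(x) + M - 0.5 * mu * (y ** 2).sum()
```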
Proximal augmented Lagrangian MM

x^{k+1} = argmin_x L_µ(x; y^k)
y^{k+1} = y^k + (1/µ)( Tx^{k+1} − prox_{µg}(Tx^{k+1} + µy^k) )

◮ Nonconvex f: convergence to local minimum
◮ x-minimization step: differentiable problem

Dhingra, Khong, Jovanović, arXiv:1610.04514
15 / 35
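A sketch of this scheme for g = γ‖·‖₁, with the smooth x-minimization done inexactly by a fixed number of gradient steps (a simplification; any smooth solver would do, and the step sizes are illustrative):

```python
import numpy as np

def prox_al_mm(grad_f, T, gamma, mu, x0, y0, outer=30, inner=200, alpha=1e-2):
    prox = lambda v: np.sign(v) * np.maximum(np.abs(v) - mu * gamma, 0.0)
    x, y = x0.copy(), y0.copy()
    for _ in range(outer):
        # x-step: gradient descent on the smooth L_mu(.; y),
        # using grad_x L_mu = grad f(x) + T^T (v - prox(v)) / mu
        for _ in range(inner):
            v = T @ x + mu * y
            x = x - alpha * (grad_f(x) + T.T @ (v - prox(v)) / mu)
        # y-step: multiplier update from the slide
        Tx = T @ x
        y = y + (Tx - prox(Tx + mu * y)) / mu
    return x, y
```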
Proximal augmented Lagrangian MM cartoon
[animation frames comparing L_µ(x, z; y) and L_µ(x; y) for y⁰, y¹, y⋆]
16 / 35
Edge addition in directed consensus networks

[figure: directed network with nodes x₁, …, x₇]

z are edges; columns of T are a basis for the space of balanced graphs

Identify edges:

x(γ) = argmin_x f₂(x) + γ‖Tx‖₁

Design edge weights:

x⋆(γ) = argmin_x f₂(x) subject to sp(Tx) ⊆ sp(Tx(γ))
17 / 35
Edge addition in directed consensus networks
[figure: percent performance loss vs. number of added edges]
18 / 35
Comparison with ADMM

[figures: computation time (s) vs. m; outer iterations (k) vs. m; panels: outer iterations, computation time per outer iteration]

- guaranteed convergence to a local minimum
- computational savings from reduced outer iterations

Dhingra, Khong, Jovanović, arXiv:1610.04514
19 / 35
Outline

I. Proximal augmented Lagrangian
   - centralized approach – method of multipliers

II. Primal-dual method
   - distributable
   - convergence for convex problems
   - linear convergence for strongly convex problems
20 / 35
Primal-descent dual-ascent

Arrow–Hurwicz–Uzawa type gradient flow:

ẋ = −∇_x L
ẏ = ∇_y L

◮ Existing methods use subgradients or projection
◮ Convenient for distributed implementation

Arrow, Hurwicz, Uzawa '59; Nedic & Ozdaglar, TAC '09; Wang & Elia, CDC '11; Feijer & Paganini, AUT '10; Cherukuri, Gharesifard, Cortés, SCL '15
21 / 35
First-order primal-dual method

ẋ = −∇_x L_µ(x; y)
ẏ = ∇_y L_µ(x; y)

◮ Continuous right-hand side – even for non-differentiable g(Tx)
  - algorithmic implementation via forward Euler discretization
◮ Convex f – asymptotic convergence
  - Lyapunov function & LaSalle's invariance principle
◮ Strongly convex f, Lipschitz continuous gradient – linear convergence
  - integral quadratic constraints
  - extends to discrete time

Dhingra, Khong, Jovanović, arXiv:1610.04514
22 / 35
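A forward-Euler sketch of this flow for the assumed choice g = γ‖·‖₁ (step size α and iteration count are illustrative):

```python
import numpy as np

def primal_dual(grad_f, T, gamma, mu, x0, y0, alpha=1e-2, iters=20000):
    prox = lambda v: np.sign(v) * np.maximum(np.abs(v) - mu * gamma, 0.0)
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        v = T @ x + mu * y
        gM = (v - prox(v)) / mu                 # grad M_{mu g}(Tx + mu*y)
        # simultaneous Euler step on (x, y)
        x, y = (x - alpha * (grad_f(x) + T.T @ gM),
                y + alpha * (mu * gM - mu * y))
    return x, y
```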
Method of multipliers cartoon II
[animation frames over L_µ(x; y) and min_x L_µ(x; y):
 x¹ = argmin_x L_µ(x; y⁰);  y¹ = y⁰ + (1/µ)∇_y L_µ(x¹; y⁰);
 x² = argmin_x L_µ(x; y¹);  y⋆ = y¹ + (1/µ)∇_y L_µ(x²; y¹);
 x⋆ = argmin_x L_µ(x; y⋆)]
23 / 35
Primal-dual cartoon
[animation frames over min_x L_µ(x; y):
 (x^{k+1}, y^{k+1}) = (x^k, y^k) − α( ∇_x L_µ(x^k; y^k), −∇_y L_µ(x^k; y^k) ) for k = 0, 1, 2]
24 / 35
Distributed updates

ẋ = −∇f(x) − Tᵀ∇M_{µg}(Tx + µy)
ẏ = µ∇M_{µg}(Tx + µy) − µy

◮ Recall ∇M_{µg}(v) = (1/µ)( v − prox_{µg}(v) )
◮ Distributed implementation if g separable and
  - ∇f: Rⁿ → Rⁿ is a sparse mapping
  - TᵀT is sparse
◮ Each node x_i
  - communicates according to ∇f and TᵀT
  - stores y_i according to Tᵀ
25 / 35
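A sketch of the locality this implies, for a made-up chain of n nodes with separable f and g = γ‖·‖₁; T is bidiagonal so TᵀT is tridiagonal, and every matvec in the update only couples neighboring nodes:

```python
import numpy as np
import scipy.sparse as sp

n = 10
# chain graph: T is the (n-1) x n difference operator,
# so T^T T is tridiagonal and each node only talks to its neighbors
T = sp.diags([np.ones(n - 1), -np.ones(n - 1)], [0, 1], shape=(n - 1, n))

a = np.arange(n, dtype=float)
grad_f = lambda x: x - a          # separable f_i(x_i) = 0.5*(x_i - a_i)^2

gamma, mu, alpha = 0.1, 1.0, 0.05
prox = lambda v: np.sign(v) * np.maximum(np.abs(v) - mu * gamma, 0.0)
x, y = np.zeros(n), np.zeros(n - 1)
for _ in range(20000):
    v = T @ x + mu * y
    gM = (v - prox(v)) / mu
    # each multiplier y_i lives on edge i and is updated from local data only
    x, y = x - alpha * (grad_f(x) + T.T @ gM), y + alpha * (mu * gM - mu * y)
```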
Overlapping group LASSO example

minimize (1/2)‖Ax − b‖² + Σ_i ‖(Tx)_i‖₂

Gradient mapping: ∇f(x) = Aᵀ(Ax − b)
- communicate states x_i according to ∇f and TᵀT
- store y_i corresponding to red edges

[figure: sparsity patterns of A and T over states x₁, …, x₄]
26 / 35
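The new ingredient relative to the ℓ₁ case is the group prox. A sketch, with block indices assumed for illustration; note T replicates shared states, so the blocks of z = Tx do not overlap and g(z) = Σᵢ‖zᵢ‖₂ is separable:

```python
import numpy as np

def prox_group_l2(v, t):
    # block soft-thresholding: prox of t*||.||_2 on one block
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

# hypothetical non-overlapping blocks of z = Tx
blocks = [np.array([0, 1]), np.array([2, 3])]

def prox_g(z, t):
    # apply the block prox to each group of z independently
    out = z.copy()
    for b in blocks:
        out[b] = prox_group_l2(z[b], t)
    return out
```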
Reformulation of distributed optimization

minimize_x Σ f_i(x)   ≡   minimize_{x₁,x₂,…} Σ f_i(x_i) subject to Tx = 0

◮ Tᵀ is the Laplacian or incidence matrix of a connected network

≡ minimize_{x₁,x₂,…} Σ f_i(x_i) + I₀(Tx)

Indicator function: I₀(z) := { 0 if z = 0; ∞ if z ≠ 0 }
27 / 35
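A sketch of the primal-dual flow on this reformulation for three nodes on a path (made-up data); since g = I₀ has prox_{µg} ≡ 0, the Moreau-envelope gradient reduces to (Tx + µy)/µ:

```python
import numpy as np

# hypothetical 3-node path; rows of T are signed edge indicators,
# so Tx = 0 exactly when all nodes agree
T = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

a = np.array([1.0, 2.0, 6.0])
grad_f = lambda x: x - a          # node i owns f_i(x_i) = 0.5*(x_i - a_i)^2

mu, alpha = 1.0, 0.05
x, y = np.zeros(3), np.zeros(2)
for _ in range(5000):
    v = T @ x + mu * y            # prox_{mu I_0}(v) = 0, so grad M = v / mu
    x, y = x - alpha * (grad_f(x) + T.T @ v / mu), y + alpha * (v - mu * y)

print(np.round(x, 3))             # -> approx [3, 3, 3] = mean(a)
```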