Alternating Direction Method of Multipliers
Prof. S. Boyd
HYCON 2, Trento, 23/6/11
source: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers (Boyd, Parikh, Chu, Peleato, Eckstein)
Goals
robust methods for
• arbitrary-scale optimization
  – machine learning/statistics with huge data sets
  – dynamic optimization on a large-scale network
• decentralized optimization
  – devices/processors/agents coordinate to solve a large problem by passing relatively small messages
Outline
• Dual decomposition
• Method of multipliers
• Alternating direction method of multipliers
• Common patterns
• Examples
• Consensus and exchange
• Conclusions

Dual decomposition
Dual problem
• convex equality constrained optimization problem
    minimize   f(x)
    subject to Ax = b
• Lagrangian: L(x, y) = f(x) + y^T (Ax − b)
• dual function: g(y) = inf_x L(x, y)
• dual problem: maximize g(y)
• recover x⋆ = argmin_x L(x, y⋆)
Dual ascent
• gradient method for dual problem: y^{k+1} = y^k + α^k ∇g(y^k)
• ∇g(y^k) = A x̃ − b, where x̃ = argmin_x L(x, y^k)
• dual ascent method is
    x^{k+1} := argmin_x L(x, y^k)          // x-minimization
    y^{k+1} := y^k + α^k (Ax^{k+1} − b)    // dual update
• works, with lots of strong assumptions
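As a concrete illustration, here is a minimal NumPy sketch of the dual ascent iteration above; the function name dual_ascent, the black-box x-minimizer argmin_L, and the fixed step size alpha are illustrative assumptions, not part of the slides.

```python
# Minimal sketch of dual ascent for: minimize f(x) subject to Ax = b.
# argmin_L(y) is assumed to return argmin_x L(x, y); a fixed step size alpha
# is used for simplicity (the slides allow an iteration-dependent alpha^k).
import numpy as np

def dual_ascent(argmin_L, A, b, alpha, iters=100):
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        x = argmin_L(y)                # x-minimization
        y = y + alpha * (A @ x - b)    # dual update (gradient ascent on g)
    return x, y

# Example with f(x) = (1/2)||x||_2^2, for which argmin_x L(x, y) = -A^T y
A = np.array([[1.0, 2.0]])
b = np.array([1.0])
x, y = dual_ascent(lambda y: -A.T @ y, A, b, alpha=0.1)
```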
Dual decomposition
• suppose f is separable:
    f(x) = f_1(x_1) + · · · + f_N(x_N),   x = (x_1, . . . , x_N)
• then L is separable in x:
    L(x, y) = L_1(x_1, y) + · · · + L_N(x_N, y) − y^T b,   L_i(x_i, y) = f_i(x_i) + y^T A_i x_i
• x-minimization in dual ascent splits into N separate minimizations
    x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k)
  which can be carried out in parallel
Dual decomposition
• dual decomposition (Everett, Dantzig, Wolfe, Benders 1960–65)
    x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k),   i = 1, . . . , N
    y^{k+1} := y^k + α^k (Σ_{i=1}^N A_i x_i^{k+1} − b)
• scatter y^k; update x_i in parallel; gather A_i x_i^{k+1}
• solve a large problem
  – by iteratively solving subproblems (in parallel)
  – dual variable update provides coordination
• works, with lots of assumptions; often slow
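A hedged sketch of the dual decomposition iteration, with the per-block minimizers supplied as callables; the serial list comprehension stands in for the parallel x_i updates, and all names here are illustrative.

```python
# Sketch of dual decomposition: scatter y, solve the N subproblems
# (in parallel in principle; serially here), gather A_i x_i, update y.
import numpy as np

def dual_decomposition(argmin_Li, A_blocks, b, alpha, iters=100):
    # argmin_Li[i](y) is assumed to return argmin_{x_i} f_i(x_i) + y^T A_i x_i
    y = np.zeros(len(b))
    for _ in range(iters):
        xs = [argmin_Li[i](y) for i in range(len(A_blocks))]      # x_i-updates
        resid = sum(A @ x for A, x in zip(A_blocks, xs)) - b      # gather A_i x_i
        y = y + alpha * resid                                     # dual update
    return xs, y
```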
Method of multipliers
Method of multipliers
• a method to robustify dual ascent
• use augmented Lagrangian (Hestenes, Powell 1969), ρ > 0
    L_ρ(x, y) = f(x) + y^T (Ax − b) + (ρ/2)‖Ax − b‖_2^2
• method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982)
    x^{k+1} := argmin_x L_ρ(x, y^k)
    y^{k+1} := y^k + ρ(Ax^{k+1} − b)
  (note specific dual update step length ρ)
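The same template as dual ascent, but minimizing the augmented Lagrangian and using step length ρ; a minimal sketch assuming the augmented-Lagrangian minimizer argmin_Lrho is supplied by the user.

```python
# Sketch of the method of multipliers; argmin_Lrho(y) is assumed to return
# argmin_x L_rho(x, y) for the augmented Lagrangian with penalty rho.
import numpy as np

def method_of_multipliers(argmin_Lrho, A, b, rho, iters=100):
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        x = argmin_Lrho(y)            # minimize augmented Lagrangian over x
        y = y + rho * (A @ x - b)     # dual update with step length rho
    return x, y
```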
Method of multipliers dual update step
• optimality conditions (for differentiable f):
    Ax⋆ − b = 0,   ∇f(x⋆) + A^T y⋆ = 0
  (primal and dual feasibility)
• since x^{k+1} minimizes L_ρ(x, y^k),
    0 = ∇_x L_ρ(x^{k+1}, y^k)
      = ∇_x f(x^{k+1}) + A^T (y^k + ρ(Ax^{k+1} − b))
      = ∇_x f(x^{k+1}) + A^T y^{k+1}
• dual update y^{k+1} = y^k + ρ(Ax^{k+1} − b) makes (x^{k+1}, y^{k+1}) dual feasible
• primal feasibility achieved in limit: Ax^{k+1} − b → 0
Method of multipliers (compared to dual decomposition)
• good news: converges under much more relaxed conditions (f can be nondifferentiable, take on value +∞, . . . )
• bad news: quadratic penalty destroys splitting of the x-update, so can't do decomposition
Alternating direction method of multipliers
Alternating direction method of multipliers
• a method
  – with good robustness of method of multipliers
  – which can support decomposition
• “robust dual decomposition” or “decomposable method of multipliers”
• proposed by Gabay, Mercier, Glowinski, Marrocco in 1976
Alternating direction method of multipliers
• ADMM problem form (with f, g convex)
    minimize   f(x) + g(z)
    subject to Ax + Bz = c
  – two sets of variables, with separable objective
• L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2
• ADMM:
    x^{k+1} := argmin_x L_ρ(x, z^k, y^k)           // x-minimization
    z^{k+1} := argmin_z L_ρ(x^{k+1}, z, y^k)       // z-minimization
    y^{k+1} := y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)    // dual update
Alternating direction method of multipliers
• if we minimized over x and z jointly, reduces to method of multipliers
• instead, we do one pass of a Gauss-Seidel method
• we get splitting since we minimize over x with z fixed, and vice versa
ADMM and optimality conditions
• optimality conditions (for differentiable case):
  – primal feasibility: Ax + Bz − c = 0
  – dual feasibility: ∇f(x) + A^T y = 0,   ∇g(z) + B^T y = 0
• since z^{k+1} minimizes L_ρ(x^{k+1}, z, y^k) we have
    0 = ∇g(z^{k+1}) + B^T y^k + ρ B^T (Ax^{k+1} + Bz^{k+1} − c)
      = ∇g(z^{k+1}) + B^T y^{k+1}
• so with ADMM dual variable update, (x^{k+1}, z^{k+1}, y^{k+1}) satisfies the second dual feasibility condition
• primal and first dual feasibility are achieved as k → ∞
ADMM with scaled dual variables
• combine linear and quadratic terms in augmented Lagrangian
    L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2
                 = f(x) + g(z) + (ρ/2)‖Ax + Bz − c + u‖_2^2 + const.
  with u^k = (1/ρ) y^k
• ADMM (scaled dual form):
    x^{k+1} := argmin_x ( f(x) + (ρ/2)‖Ax + Bz^k − c + u^k‖_2^2 )
    z^{k+1} := argmin_z ( g(z) + (ρ/2)‖Ax^{k+1} + Bz − c + u^k‖_2^2 )
    u^{k+1} := u^k + Ax^{k+1} + Bz^{k+1} − c
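A generic sketch of the scaled-dual ADMM loop above; the two subproblem solvers x_update and z_update are assumed to be supplied by the user (with ρ baked into them), and the names are illustrative rather than from the slides.

```python
# Sketch of scaled-dual ADMM; x_update(z, u) and z_update(x, u) are assumed to
# solve the two argmin subproblems in the scaled form above.
import numpy as np

def admm_scaled(x_update, z_update, A, B, c, iters=200):
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    u = np.zeros(len(c))
    for _ in range(iters):
        x = x_update(z, u)             # argmin_x f(x) + (rho/2)||Ax + Bz - c + u||^2
        z = z_update(x, u)             # argmin_z g(z) + (rho/2)||Ax + Bz - c + u||^2
        u = u + A @ x + B @ z - c      # scaled dual update
    return x, z, u
```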
Convergence
• assume (very little!)
  – f, g convex, closed, proper
  – L_0 has a saddle point
• then ADMM converges:
  – iterates approach feasibility: Ax^k + Bz^k − c → 0
  – objective approaches optimal value: f(x^k) + g(z^k) → p⋆
Related algorithms
• operator splitting methods (Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)
• proximal point algorithm (Rockafellar 1976)
• Dykstra's alternating projections algorithm (1983)
• Spingarn's method of partial inverses (1985)
• Rockafellar–Wets progressive hedging (1991)
• proximal methods (Rockafellar, many others, 1976–present)
• Bregman iterative methods (2008–present)
• most of these are special cases of the proximal point algorithm
Common patterns
Common patterns
• x-update step requires minimizing f(x) + (ρ/2)‖Ax − v‖_2^2 (with v = −Bz^k + c − u^k, which is constant during the x-update)
• similar for z-update
• several special cases come up often
• can simplify update by exploiting structure in these cases
Decomposition
• suppose f is block-separable,
    f(x) = f_1(x_1) + · · · + f_N(x_N),   x = (x_1, . . . , x_N)
• A is conformably block separable: A^T A is block diagonal
• then x-update splits into N parallel updates of x_i
Proximal operator
• consider x-update when A = I:
    x^+ = argmin_x ( f(x) + (ρ/2)‖x − v‖_2^2 ) = prox_{f,ρ}(v)
• some special cases:
  – f = I_C (indicator fct. of set C):   x^+ := Π_C(v)   (projection onto C)
  – f = λ‖·‖_1 (ℓ_1 norm):   x_i^+ := S_{λ/ρ}(v_i)   (soft thresholding)
    (S_a(v) = (v − a)_+ − (−v − a)_+)
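A minimal NumPy version of the soft-thresholding operator S_a from the slide, together with a small usage example; purely illustrative.

```python
# Soft thresholding S_a(v) = (v - a)_+ - (-v - a)_+, the proximal operator
# of a*||.||_1, applied elementwise.
import numpy as np

def soft_threshold(v, a):
    return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

# prox of f = lam*||.||_1 with penalty rho is S_{lam/rho}(v)
v = np.array([-2.0, -0.3, 0.1, 1.5])
print(soft_threshold(v, 0.5))   # [-1.5  0.   0.   1. ]
```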
Quadratic objective
• f(x) = (1/2) x^T P x + q^T x + r
• x^+ := (P + ρ A^T A)^{−1} (ρ A^T v − q)
• use matrix inversion lemma when computationally advantageous:
    (P + ρ A^T A)^{−1} = P^{−1} − ρ P^{−1} A^T (I + ρ A P^{−1} A^T)^{−1} A P^{−1}
• (direct method) cache factorization of P + ρ A^T A (or I + ρ A P^{−1} A^T)
• (iterative method) warm start, early stopping, reducing tolerances
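A sketch of the quadratic x-update with the factorization of P + ρAᵀA cached across iterations, as suggested in the direct-method bullet; the helper name and the use of SciPy's Cholesky routines are assumptions for illustration.

```python
# Quadratic x-update x^+ = (P + rho A^T A)^{-1} (rho A^T v - q); the Cholesky
# factor is computed once and reused, so each iteration costs two triangular solves.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_quadratic_x_update(P, q, A, rho):
    factor = cho_factor(P + rho * A.T @ A)      # cache the factorization (assumes PD)
    def x_update(v):                            # v = -Bz^k + c - u^k in the ADMM x-update
        return cho_solve(factor, rho * A.T @ v - q)
    return x_update
```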
Smooth objective
• f smooth
• can use standard methods for smooth minimization
  – gradient, Newton, or quasi-Newton
  – preconditioned CG, limited-memory BFGS (scale to very large problems)
• can exploit
  – warm start
  – early stopping, with tolerances decreasing as ADMM proceeds
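Illustrative only: one way to realize the warm-start and loose-tolerance ideas with a generic smooth solver (SciPy's L-BFGS-B here); the wrapper name and tolerance handling are assumptions, not part of the slides.

```python
# Smooth x-update via L-BFGS-B, warm-started at the previous iterate and solved
# only to a gradient tolerance gtol, which can be tightened as ADMM proceeds.
import numpy as np
from scipy.optimize import minimize

def smooth_x_update(obj_and_grad, x_prev, gtol):
    # obj_and_grad(x) returns (objective value, gradient) of the x-subproblem
    res = minimize(obj_and_grad, x_prev, jac=True, method="L-BFGS-B",
                   options={"gtol": gtol})
    return res.x
```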
Examples
Constrained convex optimization
• consider ADMM for generic problem
    minimize   f(x)
    subject to x ∈ C
• ADMM form: take g to be the indicator of C
    minimize   f(x) + g(z)
    subject to x − z = 0
• algorithm:
    x^{k+1} := argmin_x ( f(x) + (ρ/2)‖x − z^k + u^k‖_2^2 )
    z^{k+1} := Π_C(x^{k+1} + u^k)
    u^{k+1} := u^k + x^{k+1} − z^{k+1}
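Putting the pieces together, a small end-to-end sketch of this algorithm for one hypothetical instance: f a least-squares objective and C the nonnegative orthant, so the z-update is a simple projection. All names and the toy data are illustrative.

```python
# ADMM for: minimize (1/2)||F x - g||_2^2 subject to x >= 0
# (here g is the data vector of the least-squares term, not the indicator function).
import numpy as np

def admm_nonneg_ls(F, g, rho=1.0, iters=200):
    n = F.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    FtF_rhoI = F.T @ F + rho * np.eye(n)     # could cache a factorization instead
    Ftg = F.T @ g
    for _ in range(iters):
        x = np.linalg.solve(FtF_rhoI, Ftg + rho * (z - u))   # x-update (quadratic)
        z = np.maximum(x + u, 0.0)                           # z-update: projection onto C
        u = u + x - z                                        # scaled dual update
    return z

F = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
g = np.array([1.0, 2.0, 3.0])
print(admm_nonneg_ls(F, g))
```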