ALADIN—An Algorithm for Distributed Non-Convex Optimization and Control

Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl

ShanghaiTech University, University of Magdeburg, University of Freiburg
Motivation: sensor network localization

Decoupled case: each sensor takes a measurement $\eta_i$ of its position $\chi_i$ and solves

$$\min_{\chi_i} \; \|\chi_i - \eta_i\|_2^2 \qquad \forall i \in \{1, \dots, 7\}.$$
Motivation: sensor network localization

Coupled case: the sensors additionally measure the distances $\bar\eta_i$ to their neighbors:

$$\min_{\chi} \; \sum_{i=1}^{7} \left\{ \|\chi_i - \eta_i\|_2^2 + \big( \|\chi_i - \chi_{i+1}\|_2 - \bar\eta_i \big)^2 \right\} \qquad \text{with } \chi_8 = \chi_1.$$
Motivation: sensor network localization

Equivalent formulation: set $x_1 = (\chi_1, \zeta_1)$ with $\zeta_1 = \chi_2$, set $x_2 = (\chi_2, \zeta_2)$ with $\zeta_2 = \chi_3$, and so on.
Motivation: sensor network localization

Equivalent formulation (cont.):
• new variables $x_i = (\chi_i, \zeta_i)$
• separable non-convex objectives
$$f_i(x_i) = \tfrac{1}{2} \|\chi_i - \eta_i\|_2^2 + \tfrac{1}{2} \|\zeta_i - \eta_{i+1}\|_2^2 + \tfrac{1}{2} \big( \|\chi_i - \zeta_i\|_2 - \bar\eta_i \big)^2$$
• affine coupling $\zeta_i = \chi_{i+1}$, which can be written as $\sum_{i=1}^{7} A_i x_i = 0$.
Motivation: sensor network localization

Optimization problem:

$$\min_{x} \; \sum_{i=1}^{7} f_i(x_i) \qquad \text{s.t.} \qquad \sum_{i=1}^{7} A_i x_i = 0.$$
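For concreteness, this problem is small enough to set up directly. Below is a minimal sketch assuming synthetic measurement data; the names `eta` and `eta_bar` and the block layout of the `A_i` are illustrative choices, not fixed by the talk:

```python
import numpy as np

N, d = 7, 2        # 7 sensors with planar positions
n = 2 * d          # x_i = (chi_i, zeta_i) lives in R^4
m = N * d          # one R^2 coupling block per sensor

# Synthetic measurements (illustrative): eta[i] is sensor i's position
# measurement, eta_bar[i] its measured distance to sensor i+1.
rng = np.random.default_rng(0)
eta = rng.standard_normal((N, d))
eta_bar = 1.0 + rng.random(N)

def f(i, x):
    """Separable non-convex objective f_i(x_i) with x_i = (chi_i, zeta_i)."""
    chi, zeta = x[:d], x[d:]
    return (0.5 * np.sum((chi - eta[i]) ** 2)
            + 0.5 * np.sum((zeta - eta[(i + 1) % N]) ** 2)
            + 0.5 * (np.linalg.norm(chi - zeta) - eta_bar[i]) ** 2)

# Coupling zeta_i = chi_{i+1} (with chi_8 = chi_1) as sum_i A_i x_i = 0;
# constraint block i encodes zeta_i - chi_{i+1} = 0.
A = [np.zeros((m, n)) for _ in range(N)]
for i in range(N):
    A[i][i * d:(i + 1) * d, d:] = np.eye(d)       # +zeta_i in block i
    j = (i - 1) % N                               # chi_i appears in block i-1
    A[i][j * d:(j + 1) * d, :d] = -np.eye(d)      # -chi_i in block i-1

# Sanity check: any point with zeta_i = chi_{i+1} satisfies the coupling.
chi = rng.standard_normal((N, d))
x_blocks = [np.concatenate([chi[i], chi[(i + 1) % N]]) for i in range(N)]
print(np.allclose(sum(Ai @ xi for Ai, xi in zip(A, x_blocks)), 0))  # True
```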
Aim of distributed optimization algorithms

Find local minimizers of

$$\min_{x} \; \sum_{i=1}^{N} f_i(x_i) \qquad \text{s.t.} \qquad \sum_{i=1}^{N} A_i x_i = b.$$

• Functions $f_i : \mathbb{R}^n \to \mathbb{R}$ potentially non-convex.
• Matrices $A_i \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$ given.
• Problem: $N$ is large.
Overview

• Theory
  - Distributed optimization algorithms
  - ALADIN
• Applications
  - Sensor network localization
  - MPC with long horizons
Distributed optimization problem

Find local minimizers of

$$\min_{x} \; \sum_{i=1}^{N} f_i(x_i) \qquad \text{s.t.} \qquad \sum_{i=1}^{N} A_i x_i = b.$$

• Functions $f_i : \mathbb{R}^n \to \mathbb{R}$ potentially non-convex.
• Matrices $A_i \in \mathbb{R}^{m \times n}$ and vector $b \in \mathbb{R}^m$ given.
• Problem: $N$ is large.
Dual decomposition

Main idea: solve the dual problem

$$\max_{\lambda} \; d(\lambda) \qquad \text{with} \qquad d(\lambda) = \sum_{i=1}^{N} \min_{x_i} \left\{ f_i(x_i) + \lambda^T A_i x_i \right\} - \lambda^T b.$$

• Evaluation of $d$ can be parallelized.
• Applicable if the $f_i$ are (strictly) convex.
• For non-convex $f_i$: a duality gap is possible.

H. Everett. Generalized Lagrange multiplier method for solving problems of optimum allocation of resources, 1963.
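As an illustration, a minimal dual-decomposition sketch follows. Using SciPy's general-purpose `minimize` for the inner problems, the fixed step size `alpha`, and the iteration budget are assumptions of this sketch; as noted above, for non-convex $f_i$ the inner minimizers need not be global and the method may not converge:

```python
import numpy as np
from scipy.optimize import minimize

def dual_decomposition(f_list, A_list, b, lam0, alpha=0.1, iters=200):
    """Gradient ascent on the dual function d(lambda) defined above."""
    lam = lam0.copy()
    n = A_list[0].shape[1]
    xs = None
    for _ in range(iters):
        # Decoupled inner problems (one per agent, parallelisable):
        #   min_{x_i}  f_i(x_i) + lam^T A_i x_i
        xs = [minimize(lambda x, i=i: f_list[i](x) + lam @ (A_list[i] @ x),
                       np.zeros(n)).x
              for i in range(len(f_list))]
        # A (sub)gradient of d at lam is the coupling residual.
        residual = sum(Ai @ xi for Ai, xi in zip(A_list, xs)) - b
        lam = lam + alpha * residual
    return lam, xs
```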
ADMM (consensus variant)

Alternating Direction Method of Multipliers

Input: initial guesses $x_i \in \mathbb{R}^n$ and $\lambda_i \in \mathbb{R}^m$; $\rho > 0$, $\epsilon > 0$.

Repeat:
1. Solve the decoupled NLPs
$$\min_{y_i} \; f_i(y_i) + \lambda_i^T A_i y_i + \frac{\rho}{2} \| A_i (y_i - x_i) \|_2^2.$$
2. Implement the dual gradient steps
$$\lambda_i^+ = \lambda_i + \rho A_i (y_i - x_i).$$
3. Solve the coupled QP
$$\min_{x^+} \; \sum_{i=1}^{N} \left\{ \frac{\rho}{2} \| A_i (y_i - x_i^+) \|_2^2 - (\lambda_i^+)^T A_i x_i^+ \right\} \qquad \text{s.t.} \qquad \sum_{i=1}^{N} A_i x_i^+ = b.$$
4. Update the iterates $x \leftarrow x^+$ and $\lambda \leftarrow \lambda^+$.

D. Gabay, B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximations, 1976.
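The four steps translate almost line by line into code. The sketch below uses SciPy's `minimize` for the decoupled NLPs in step 1 and solves the coupled QP of step 3 through its KKT system; the small Tikhonov term `eps` (guarding against rank-deficient $A_i^T A_i$ blocks) and the assumption that the stacked coupling matrix $[A_1, \dots, A_N]$ has full row rank are choices of this sketch, not part of the algorithm statement:

```python
import numpy as np
from scipy.optimize import minimize

def consensus_admm(f_list, A_list, b, x_init, lam_init, rho=1.0, iters=50):
    """Sketch of steps 1-4 above; convergence is not guaranteed."""
    N = len(f_list)
    n = A_list[0].shape[1]
    m = b.size
    xs = [x.copy() for x in x_init]
    lams = [l.copy() for l in lam_init]
    for _ in range(iters):
        # 1. Decoupled NLPs (parallelisable across agents).
        ys = [minimize(lambda y, i=i: f_list[i](y)
                       + lams[i] @ (A_list[i] @ y)
                       + 0.5 * rho * np.sum((A_list[i] @ (y - xs[i])) ** 2),
                       xs[i]).x
              for i in range(N)]
        # 2. Dual gradient steps.
        lams = [lams[i] + rho * A_list[i] @ (ys[i] - xs[i]) for i in range(N)]
        # 3. Coupled QP: stationarity plus primal feasibility assembled as
        #    one KKT system in (x_1^+, ..., x_N^+, mu).
        eps = 1e-8
        K = np.zeros((N * n + m, N * n + m))
        rhs = np.zeros(N * n + m)
        for i in range(N):
            AtA = A_list[i].T @ A_list[i]
            K[i*n:(i+1)*n, i*n:(i+1)*n] = rho * AtA + eps * np.eye(n)
            K[i*n:(i+1)*n, N*n:] = A_list[i].T
            K[N*n:, i*n:(i+1)*n] = A_list[i]
            rhs[i*n:(i+1)*n] = rho * AtA @ ys[i] + A_list[i].T @ lams[i]
        rhs[N*n:] = b
        sol = np.linalg.solve(K, rhs)
        # 4. Update the iterates.
        xs = [sol[i*n:(i+1)*n] for i in range(N)]
    return xs, lams
```

On the sensor problem sketched earlier, `f_list` would collect the functions `f(i, ·)` and `A_list` the matrices `A`, with `b = np.zeros(m)`.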
Limitations of ADMM

1) The convergence rate of ADMM is very scaling dependent.
2) ADMM may diverge if the $f_i$ are non-convex.

Example: the problem

$$\min_{x} \; x_1 x_2 \qquad \text{s.t.} \qquad x_1 - x_2 = 0$$

has a unique and regular minimizer at $x_1^* = x_2^* = \lambda^* = 0$. For $\rho = \tfrac{3}{4}$ all sub-problems are strictly convex, yet ADMM diverges: $\lambda^+ = -2\lambda$.

This talk addresses Problem 2) and mitigates Problem 1).
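The recursion $\lambda^+ = -2\lambda$ is easy to check numerically. In the sketch below, the decoupled NLP of step 1 is solved exactly (it is a strictly convex QP for $\rho > 1/2$), so no NLP solver is needed:

```python
import numpy as np

# Divergence example: min x1*x2  s.t.  x1 - x2 = 0,
# i.e. f(y) = y1*y2 with A = [1, -1] and b = 0.
rho = 0.75                      # sub-problems strictly convex for rho > 1/2
A = np.array([[1.0, -1.0]])
lam = np.array([0.1])           # any nonzero multiplier guess
x = np.zeros(2)                 # feasible iterate (x1 = x2), kept fixed below

for k in range(6):
    # Step 1: min_y y1*y2 + lam*(y1 - y2) + rho/2*(y1 - y2 - (x1 - x2))^2
    # has Hessian [[rho, 1-rho], [1-rho, rho]]; solve its optimality system.
    H = np.array([[rho, 1.0 - rho], [1.0 - rho, rho]])
    g = A.T @ lam - rho * A.T @ (A @ x)
    y = np.linalg.solve(H, -g)
    # Step 2: dual gradient step.
    lam = lam + rho * A @ (y - x)
    # Step 3: the coupled QP only enforces x1^+ = x2^+, and its objective is
    # constant on that set, so keeping x = 0 is a valid update.
    print(k, lam)               # magnitude doubles, sign flips: lam^+ = -2*lam
```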