  1. Multi-agent constrained optimization of a strongly convex function
Necdet Serhat Aybat, Industrial & Manufacturing Engineering, Penn State University
Joint work with Erfan Yazdandoost Hamedani
Research supported by NSF CMMI-1635106 and ARO W911NF-17-1-0298

  3. Motivation
Many real-life networks are large-scale:
- composed of agents with local information
- agents willing to collaborate without sharing their private data
This has motivated strong interest in designing decentralized methods for the optimization of multi-agent systems.
Examples:
- routing and congestion control in wired and wireless networks
- parameter estimation in sensor networks
- multi-agent cooperative control and coordination
- processing distributed big data in (online) machine learning

  10. Decentralized Consensus Optimization
Compute an optimal solution for
$(P):\ \min_x\ \bar\varphi(x) \triangleq \sum_{i\in\mathcal N} \varphi_i(x)\quad \text{s.t.}\ x \in \bigcap_{i\in\mathcal N} X_i$
- $\mathcal N = \{1,\dots,N\}$ processing nodes on a time-varying graph $\mathcal G_t = (\mathcal N, \mathcal E_t)$
- Node $i$ can transmit data to $j$ at time $t$ only if $(i,j)\in\mathcal E_t$
- $\bar\varphi(x)$: strongly convex ($\bar\mu > 0$); $\varphi_i(x) \triangleq \rho_i(x) + f_i(x)$ locally known ($\mu \triangleq \min_{i\in\mathcal N}\mu_i \ge 0$)
- $f_i$: convex with Lipschitz continuous gradient (constant $L_i$)
- $\rho_i$: convex (possibly non-smooth) with an efficient prox-map $\mathrm{prox}_{\rho_i}(x) \triangleq \operatorname{argmin}_{\xi\in\mathbb R^n}\ \rho_i(\xi) + \tfrac12\|\xi - x\|_2^2$
- $X_i \triangleq \{x : G_i(x) \in -\mathcal K_i\}$ locally known, $\mathcal K_i$ a closed convex cone
- $G_i$: $\mathcal K_i$-convex, Lipschitz continuous (constant $C_G$), with Lipschitz continuous Jacobian $\nabla G_i$ (constant $L_G$)
(Figure: five nodes with local objectives $\varphi_1(x),\dots,\varphi_5(x)$ on the communication graph.)
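The efficient-prox-map assumption holds, for example, for $\rho_i(x) = \lambda\|x\|_1$, whose prox is the closed-form soft-thresholding operator. A minimal numpy sketch (the brute-force grid check is only for illustration, not part of the talk):

```python
import numpy as np

def prox_l1(x, lam):
    # Soft-thresholding: closed form of argmin_xi lam*||xi||_1 + 0.5*||xi - x||^2
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_numeric(x, lam, grid):
    # Brute-force the scalar prox definition over a fine grid, for comparison
    vals = lam * np.abs(grid) + 0.5 * (grid - x) ** 2
    return grid[np.argmin(vals)]

x, lam = np.array([2.0, -0.3, 0.7]), 0.5
p = prox_l1(x, lam)                       # -> [1.5, 0.0, 0.2]
grid = np.linspace(-3, 3, 600001)
p_num = np.array([prox_numeric(xi, lam, grid) for xi in x])
assert np.allclose(p, p_num, atol=1e-4)   # closed form matches the definition
```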

  12. Examples
Constrained Lasso:
$\min_{x\in\mathbb R^n}\ \{\lambda\|x\|_1 + \|Cx - d\|_2^2 : Ax \le b\},\quad \mathcal K = -\mathbb R^p_+$
- distributed data: $C_i \in \mathbb R^{m_i\times n}$ and $d_i \in \mathbb R^{m_i}$ for $i\in\mathcal N$
- $C = [C_i]_{i\in\mathcal N} \in \mathbb R^{m\times n}$, $d = [d_i]_{i\in\mathcal N} \in \mathbb R^m$, $m = \sum_{i\in\mathcal N} m_i$
- $\varphi_i(x) = \tfrac{\lambda}{|\mathcal N|}\|x\|_1 + \|C_i x - d_i\|_2^2$ merely convex ($m_i < n$)
- $\bar\varphi(x) = \sum_{i\in\mathcal N}\varphi_i(x)$ strongly convex when $\mathrm{rank}(C) = n$ ($m \ge n$)
Closest point in the intersection:
$\min_{x\in\cap_{i\in\mathcal N} X_i}\ \|x - \bar x\|_2^2$, i.e., $\min_x \|x - \bar x\|_2^2$ s.t. $G_i(x) \in -\mathcal K_i$, $i\in\mathcal N$.
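The strong-convexity remark on the Lasso example can be checked numerically: with generic random data, each $C_i^\top C_i$ is singular when $m_i < n$, while the stacked $C^\top C$ is positive definite once $\mathrm{rank}(C) = n$. A small numpy sketch (the dimensions and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m_i, N = 5, 2, 4                  # each block has m_i < n rows

C_blocks = [rng.standard_normal((m_i, n)) for _ in range(N)]
C = np.vstack(C_blocks)              # m = N*m_i = 8 >= n rows in total

# Smallest eigenvalue of C_i^T C_i is 0 when m_i < n: each local loss is flat
# along some direction, so phi_i is merely convex, not strongly convex.
for Ci in C_blocks:
    assert np.linalg.eigvalsh(Ci.T @ Ci).min() < 1e-10

# The stacked matrix has full column rank for generic random data, so the sum
# phi_bar gains a positive strong-convexity modulus 2*lambda_min(C^T C)
# (the Hessian of ||Cx - d||_2^2 is 2*C^T C).
mu_bar = 2 * np.linalg.eigvalsh(C.T @ C).min()
assert np.linalg.matrix_rank(C) == n and mu_bar > 0
```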

  14. Related Work - Constrained
Chang, Nedich, Scaglione'14: primal-dual method
- $\min_{x\in\cap_{i\in\mathcal N} X_i} F(\sum_{i\in\mathcal N} f_i(x))$ s.t. $\sum_{i\in\mathcal N} g_i(x) \le 0$
- $F$ and $f_i$ smooth, $X_i$ compact, and time-varying directed $\mathcal G$
- no rate result; can handle non-smooth constraints
Núñez, Cortés'15: $\min \sum_{i\in\mathcal N}\varphi_i(\xi_i, x)$ s.t. $\sum_{i\in\mathcal N} g_i(\xi_i, x) \le 0$
- $\varphi_i$ and $g_i$ convex; time-varying directed $\mathcal G$
- $\mathcal O(1/\sqrt k)$ rate for $\mathcal L(\bar\xi^k, \bar x^k, \bar y^k) - \mathcal L(\xi^*, x^*, y^*)$
- no error bounds on infeasibility or suboptimality

  16. Related Work - Constrained
Aybat, Yazdandoost Hamedani'16: primal-dual method
- $\min \sum_{i\in\mathcal N}\varphi_i(x)$ s.t. $A_i x - b_i \in \mathcal K_i$, $i\in\mathcal N$
- time-varying undirected and directed $\mathcal G$
- $\mathcal O(1/k)$ ergodic rate for infeasibility and suboptimality
- convergence of the primal-dual iterate sequence (without rate)
Chang'16: primal-dual method
- $\min_{x_i\in X_i} \sum_{i\in\mathcal N} \rho_i(x_i) + f_i(C_i x_i)$ s.t. $\sum_{i\in\mathcal N} A_i x_i = b$
- $f_i$ smooth and strongly convex; time-varying undirected $\mathcal G$
- $\mathcal O(1/k)$ ergodic rate

  19. Notation
- $\|\cdot\|$: Euclidean norm
- $\sigma_{\mathcal S}(\cdot)$: support function of a set $\mathcal S$, $\sigma_{\mathcal S}(\theta) \triangleq \sup_{w\in\mathcal S}\langle\theta, w\rangle$
- $\mathcal P_{\mathcal S}(w) \triangleq \operatorname{argmin}\{\|v - w\| : v\in\mathcal S\}$: projection onto $\mathcal S$
- $d_{\mathcal S}(w) \triangleq \|\mathcal P_{\mathcal S}(w) - w\|$: distance function
- $\mathcal K^*$: dual cone of $\mathcal K$; $\mathcal K^\circ$: polar cone of $\mathcal K$, $\mathcal K^\circ \triangleq \{\theta\in\mathbb R^m : \langle\theta, w\rangle \le 0\ \forall w\in\mathcal K\}$
- $\otimes$: Kronecker product; $\Pi$: Cartesian product
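A concrete case for the cone notation is $\mathcal K = \mathbb R^n_+$, which is self-dual ($\mathcal K^* = \mathcal K$) and has polar $\mathcal K^\circ = -\mathbb R^n_+$; the projections then satisfy the Moreau decomposition $w = \mathcal P_{\mathcal K}(w) + \mathcal P_{\mathcal K^\circ}(w)$. A minimal numpy sketch (the example vector is illustrative):

```python
import numpy as np

def proj_orthant(w):
    # P_K for K = R^n_+: componentwise clipping from below
    return np.maximum(w, 0.0)

def proj_polar(w):
    # K° = -R^n_+ when K = R^n_+, so the projection clips from above
    return np.minimum(w, 0.0)

w = np.array([1.5, -2.0, 0.0, 3.0])
pk, pko = proj_orthant(w), proj_polar(w)

# Moreau decomposition: w = P_K(w) + P_{K°}(w), with orthogonal parts
assert np.allclose(w, pk + pko)
assert abs(pk @ pko) < 1e-12

# Distance function d_K(w) = ||P_K(w) - w||: only the negative entry remains
assert np.isclose(np.linalg.norm(pk - w), 2.0)
```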

  21. Preliminaries: Primal-dual Algorithm (PDA)
PDA for the convex-concave saddle-point problem of Chambolle and Pock'16:
$\min_{x\in\mathcal X}\max_{y\in\mathcal Y}\ \mathcal L(x, y) \triangleq \Phi(x) + \langle Tx, y\rangle - h(y)$
- $\Phi(x) \triangleq \rho(x) + f(x)$ strongly convex ($\mu > 0$), $h$ convex, $T$ a linear map
PDA:
$y^{k+1} \gets \operatorname{argmin}_y\ h(y) - \langle T[x^k + \eta_k(x^k - x^{k-1})], y\rangle + D_k(y, y^k)$
$x^{k+1} \gets \operatorname{argmin}_x\ \rho(x) + f(x^k) + \langle\nabla f(x^k), x - x^k\rangle + \langle Tx, y^{k+1}\rangle + \tfrac{1}{2\tau_k}\|x - x^k\|^2$
- $D_k$ is a Bregman distance function with $D_k(y, \bar y) \ge \tfrac{1}{2\kappa_k}\|y - \bar y\|^2$
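As a sanity check on the PDA updates, the sketch below instantiates them with Euclidean $D_k$, fixed step sizes, and $\eta_k = 1$ on a tiny equality-constrained least-squares problem (the instance and step sizes are illustrative, not from the talk): $\min_x \tfrac12\|x - a\|^2$ s.t. $Bx = b$, written as the saddle point of $\mathcal L(x,y) = \tfrac12\|x-a\|^2 + \langle Bx - b, y\rangle$, so that $\Phi = f$ smooth ($\rho = 0$), $h(y) = \langle b, y\rangle$, and $T = B$.

```python
import numpy as np

a = np.array([1.0, 1.0])
B = np.array([[1.0, 1.0]])
b = np.array([1.0])

tau, sigma, eta = 0.5, 0.5, 1.0      # tau * sigma * ||B||_2^2 = 0.5 < 1
x = x_prev = np.zeros(2)
y = np.zeros(1)

for _ in range(1000):
    x_tilde = x + eta * (x - x_prev)      # extrapolated primal point
    y = y + sigma * (B @ x_tilde - b)     # dual step with Euclidean D_k
    x_prev = x
    grad_f = x - a                        # gradient of f at x^k (linearization)
    x = x - tau * (grad_f + B.T @ y)      # primal step, rho = 0 so no prox

# KKT solution: the projection of a = (1,1) onto {x : x1 + x2 = 1}
assert np.allclose(x, [0.5, 0.5], atol=1e-8)
```

With $\rho = 0$ both argmins reduce to explicit gradient steps, which is why no prox call appears; the step-size condition $\tau\sigma\|T\|^2 < 1$ matches the Chambolle-Pock requirement.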
