Distributed optimization
Mikael Johansson, KTH, Stockholm, Sweden

6/26/13
Hycon2 PhD School, July 2013
Mikael Johansson, mikaelj@ee.kth.se

Aim of these lectures

"To present some of the key techniques for distributed optimization in a coherent and comprehensible manner"

Focus on understanding, not all the details
- each lecture could be a full-semester course
- you will have to work with the material yourself!

Focus on fundamentals, not fads
- many techniques date back to the 60's-80's, ...
- but some are very recent, and the research frontier is not far away

Why distributed optimization

Optimization on a "Google scale"
- information processing on huge data sets

Coordination and control of large-scale systems
- power and water distribution
- vehicle coordination and planning
- sensor, social, and data networks

Theoretical foundation for communication protocol design
- Internet congestion control
- scheduling and power control in wireless systems

Example: water distribution

Coordinated control of water distribution in the city of Barcelona (WIDE)

Example: multi-agent coordination

Cooperate to find jointly optimal controls and rendez-vous point

$$\text{minimize } \sum_{i \in V} f_i(\theta) \quad \text{subject to } \theta \in \Theta$$

where

$$f_i(\theta) = \min \sum_{t=0}^{T} (x_t - \theta)^T Q (x_t - \theta) + u_t^T R u_t \quad \text{s.t. } x_{t+1} = A x_t + B u_t, \; t = 0, \ldots, T-1$$

Example: communication protocol design

Understand how TCP/IP shares network resources between users

$$\text{maximize } \sum_i u_i(x_i) \quad \text{subject to } \sum_{i \in P(l)} x_i \le c_l, \; l \in L$$
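To make the resource-sharing formulation concrete, here is a minimal Python sketch that evaluates the objective and checks the link constraints for a given rate vector. The two-link topology, the capacities, and the logarithmic utilities u_i(x_i) = log(x_i) are illustrative assumptions, not taken from the slides; the sketch only evaluates the problem data and does not solve the problem.

```python
import math

# Hypothetical network: routes[l] plays the role of P(l), the set of users whose
# traffic crosses link l; capacity[l] plays the role of c_l.
routes = {"l1": [0, 1], "l2": [0, 2]}
capacity = {"l1": 1.0, "l2": 2.0}

def total_utility(x):
    """Objective sum_i u_i(x_i), assuming u_i(x_i) = log(x_i)."""
    return sum(math.log(xi) for xi in x)

def is_feasible(x):
    """Check sum_{i in P(l)} x_i <= c_l for every link l."""
    return all(sum(x[i] for i in users) <= capacity[l]
               for l, users in routes.items())

x_ok = [0.5, 0.5, 1.5]    # fills both links exactly
x_bad = [0.6, 0.5, 1.5]   # overloads link l1

feasible_ok = is_feasible(x_ok)
feasible_bad = is_feasible(x_bad)
utility_ok = total_utility(x_ok)
```

User 0 crosses both links, so raising its rate consumes capacity on each of them; this coupling between users is what the link constraints capture.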

Lecture overview

Lecture 1: first-order methods for convex optimization
Lecture 2: multi-agent optimization

Part I: Convex optimization using first-order methods

Aim: to understand
- properties and analysis techniques for the basic gradient method
- the interplay between problem structure and convergence rate guarantees
- how we can deal with non-smoothness, noise and constraints

Rationale

Convex optimization:
- minimize a convex function subject to convex constraints
- local minima are global; strong and useful theory

First-order methods:
- only use function and gradient evaluations (i.e. no Hessians)
- easy to analyze, implement and distribute, yet competitive

Convex functions and convex sets

[Figure: points x and y with function values f(x) and f(y)]

A set $X$ is convex if, for all $x, y \in X$,

$$\alpha x + (1 - \alpha) y \in X, \quad \alpha \in [0, 1]$$

A function $f$ is convex if, for all $x, y$ in its (convex) domain,

$$\alpha f(x) + (1 - \alpha) f(y) \ge f(\alpha x + (1 - \alpha) y), \quad \alpha \in [0, 1]$$
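The defining inequality for convex functions can be probed numerically. The sketch below is a sample-based check in Python for scalar functions; the helper name `is_convex_on_samples`, the interval, and the test functions are hypothetical choices for illustration. Passing the check does not prove convexity, but a single violation disproves it.

```python
import random

def is_convex_on_samples(f, lo, hi, trials=1000, seed=0, tol=1e-9):
    """Test alpha*f(x) + (1-alpha)*f(y) >= f(alpha*x + (1-alpha)*y) on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        alpha = rng.random()  # alpha in [0, 1)
        chord = alpha * f(x) + (1 - alpha) * f(y)
        value = f(alpha * x + (1 - alpha) * y)
        if chord < value - tol:
            return False  # found a point where the chord lies below the function
    return True

convex_ok = is_convex_on_samples(lambda x: x * x, -5.0, 5.0)    # x^2 is convex
concave_ok = is_convex_on_samples(lambda x: -x * x, -5.0, 5.0)  # -x^2 is not
```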

Affine lower bounds from convexity

[Figure: f(x) and f(y)]

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$$

Strong convexity - quadratic lower bounds

[Figure: f(x) and f(y)]

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{c}{2} \|y - x\|^2$$

Lipschitz continuous gradient - upper bounds

Lipschitz-continuous gradient:

$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$$

[Figure: f(x) and f(y)]

Yields upper quadratic bound:

$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2$$

Strongly convex functions with Lipschitz gradient

Bounded from above and below by quadratic functions.

Condition number $\kappa = L/c$ impacts performance of first-order methods.

Note: limited function class when required to hold globally.
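A quadratic makes the two bounds easy to see. The Python sketch below assumes the hypothetical test function f(x) = 0.5 (c x_1^2 + L x_2^2) with c = 1 and L = 10, for which the strong convexity constant is c, the gradient Lipschitz constant is L, and the condition number is L/c. It checks both quadratic bounds on random pairs of points.

```python
import random

C, L = 1.0, 10.0  # assumed constants: strong convexity c and gradient Lipschitz constant L

def f(x):
    return 0.5 * (C * x[0] ** 2 + L * x[1] ** 2)

def grad(x):
    return [C * x[0], L * x[1]]

def bounds_hold(trials=1000, seed=0, tol=1e-9):
    """Check lower (strong convexity) and upper (Lipschitz gradient) quadratic bounds."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
        y = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
        g = grad(x)
        d = [y[0] - x[0], y[1] - x[1]]
        affine = f(x) + g[0] * d[0] + g[1] * d[1]   # f(x) + <grad f(x), y - x>
        dist_sq = d[0] ** 2 + d[1] ** 2             # ||y - x||^2
        lower = affine + 0.5 * C * dist_sq
        upper = affine + 0.5 * L * dist_sq
        if not (lower - tol <= f(y) <= upper + tol):
            return False
    return True

kappa = L / C
quadratic_bounds_ok = bounds_hold()
```

For this function the gap f(y) minus the affine term equals 0.5 (c d_1^2 + L d_2^2) with d = y - x, which sits between the two quadratic bounds for every pair of points.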

The basic gradient method

Basic gradient method:

$$x(t+1) = x(t) - \alpha(t) \nabla f(x(t))$$

A descent method (for small enough step-size $\alpha(t)$).

Convergence proof.

$$\|x(t+1) - x^\star\|_2^2 = \|x(t) - x^\star\|_2^2 - 2\alpha(t) \langle \nabla f(x(t)), x(t) - x^\star \rangle + \alpha(t)^2 \|\nabla f(x(t))\|_2^2$$

$$\le \|x(t) - x^\star\|_2^2 - 2\alpha(t) \left( f(x(t)) - f^\star \right) + \alpha(t)^2 \|\nabla f(x(t))\|_2^2$$

where the inequality follows from convexity of $f$.

Gradient method convergence proof

Applying recursively, we find

$$\|x(T) - x^\star\|_2^2 \le \|x(0) - x^\star\|_2^2 - 2 \sum_{t=0}^{T-1} \alpha(t) \left( f(x(t)) - f^\star \right) + \sum_{t=0}^{T-1} \alpha(t)^2 \|\nabla f(x(t))\|_2^2$$

Since the gradient method is a descent method, and norms are non-negative,

$$2 \left( f(x(T)) - f^\star \right) \sum_{t=0}^{T-1} \alpha(t) \le \|x(0) - x^\star\|_2^2 + \sum_{t=0}^{T-1} \alpha(t)^2 \|\nabla f(x(t))\|_2^2$$

Hence, with $R_0 = \|x(0) - x^\star\|$,

$$f(x(T)) - f^\star \le \frac{R_0^2 + \sum_{t=0}^{T-1} \alpha(t)^2 \|\nabla f(x(t))\|_2^2}{2 \sum_{t=0}^{T-1} \alpha(t)}$$

Further assumptions needed to guarantee convergence!
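The iteration and the final bound can be tried out on a small problem. The following Python sketch runs the basic gradient method on an assumed test function f(x) = 0.5 (x_1^2 + 10 x_2^2), whose minimizer is the origin with optimal value 0, and compares the achieved gap with the right-hand side of the bound. The function, the starting point, the fixed step size 0.1, and the helper name `gradient_method` are illustrative choices, not from the slides.

```python
def gradient_method(grad, x0, step, T):
    """Run x(t+1) = x(t) - step(t) * grad(x(t)) for T iterations.

    Returns the final iterate and a list of (alpha(t), ||grad f(x(t))||^2) pairs.
    """
    x = list(x0)
    history = []
    for t in range(T):
        g = grad(x)
        a = step(t)
        history.append((a, sum(gi * gi for gi in g)))
        x = [xi - a * gi for xi, gi in zip(x, g)]
    return x, history

# Assumed test problem: minimizer x* = (0, 0), optimal value f* = 0.
f = lambda x: 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)
grad = lambda x: [x[0], 10.0 * x[1]]

x0 = [4.0, -2.0]
x_T, hist = gradient_method(grad, x0, step=lambda t: 0.1, T=50)

R0_sq = sum(xi * xi for xi in x0)  # ||x(0) - x*||^2, since x* = 0
bound = (R0_sq + sum(a * a * g2 for a, g2 in hist)) / (2 * sum(a for a, _ in hist))
gap = f(x_T)                       # f(x(T)) - f*, since f* = 0
```

On this problem the bound is quite loose: it holds for any convex function, whereas this particular function is also strongly convex, which the bound does not exploit.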

Gradient method discussion

If we assume that $f$ is Lipschitz, i.e. $\|\nabla f(x(t))\| \le L_f$, then

$$f(x(T)) - f^\star \le \frac{R_0^2 + L_f^2 \sum_{t=0}^{T-1} \alpha(t)^2}{2 \sum_{t=0}^{T-1} \alpha(t)}$$

Then,
- For fixed step-size $\alpha(t) = \alpha$: $\lim_{T \to \infty} f(x(T)) \le f^\star + \alpha L_f^2 / 2$
- For diminishing stepsizes with $\sum_{t=0}^{\infty} \alpha(t)^2 < \infty$ and $\sum_{t=0}^{\infty} \alpha(t) = \infty$: $\lim_{T \to \infty} f(x(T)) = f^\star$
- Accuracy $\varepsilon$ can be obtained in $(R_0 L_f)^2 / \varepsilon^2$ steps

Example

Smaller residual error for smaller stepsize, convergence for diminishing stepsizes.
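The three step-size statements can be read off by evaluating the right-hand side of the Lipschitz bound directly. The Python sketch below does this arithmetic for assumed values R_0 = 1 and L_f = 1. For the step count it uses the constant step size R_0 / (L_f sqrt(T)), a common choice that is not stated on the slides and is an assumption here; with it the bound equals R_0 L_f / sqrt(T), which gives the (R_0 L_f)^2 / eps^2 figure.

```python
import math

def lipschitz_bound(R0, Lf, alphas):
    """Evaluate (R0^2 + Lf^2 * sum alpha(t)^2) / (2 * sum alpha(t))."""
    alphas = list(alphas)
    return (R0 ** 2 + Lf ** 2 * sum(a * a for a in alphas)) / (2 * sum(alphas))

R0, Lf = 1.0, 1.0  # assumed problem constants

# Fixed step size: the bound levels off near alpha * Lf^2 / 2 as T grows.
alpha = 0.01
fixed_bound = lipschitz_bound(R0, Lf, [alpha] * 100000)
fixed_floor = alpha * Lf ** 2 / 2

# Diminishing steps alpha(t) = 1/(t+1): square-summable but not summable,
# so the bound keeps shrinking (slowly) with T.
dim_100 = lipschitz_bound(R0, Lf, (1.0 / (t + 1) for t in range(100)))
dim_100000 = lipschitz_bound(R0, Lf, (1.0 / (t + 1) for t in range(100000)))

# Step count for accuracy eps, with the assumed step size R0 / (Lf * sqrt(T)).
eps = 0.1
T_eps = math.ceil((R0 * Lf) ** 2 / eps ** 2)
tuned_bound = lipschitz_bound(R0, Lf, [R0 / (Lf * math.sqrt(T_eps))] * T_eps)
```

These are values of the upper bound, not of the actual error of any particular run; a real run may do better than the bound suggests.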
