Asynchronous Algorithms for Conic Programs, including Optimal, Infeasible, and Unbounded Ones
Wotao Yin
joint work with: Fei Feng, Robert Hannah, Yanli Liu, Ernest Ryu (UCLA, Math)
DIMACS: Distributed Optimization, Information Processing, and Learning
August 2017
Overview
• conic programming problem (P):
  minimize c^T x subject to Ax = b, x ∈ K,
where K is a closed convex cone
• this talk: a first-order iteration
• parallel: linear speedup, asynchronous
• keeps working even when the problem is unsolvable (infeasible or unbounded)
Approach overview
Douglas-Rachford¹ fixed-point iteration
  z^{k+1} = T z^k
T depends on A, b, c and has nice properties:
• convergence guarantees and rates
• coordinate friendly: break z into m blocks, cost(T_i) ≈ (1/m) cost(T)
• diverges nicely:
  • (P) has no primal-dual solution pair ⇔ ‖z^k‖ → ∞
  • z^{k+1} − z^k tells a whole lot
¹ equivalent to standard ADMM, but the different form is important
Douglas-Rachford splitting (Lions-Mercier'79)
• proximal mapping of a closed function h:
  prox_{γh}(x) = argmin_z { h(z) + (1/(2γ)) ‖z − x‖² }
• Douglas-Rachford splitting (DRS) method solves
  minimize f(x) + g(x)
by iterating z^{k+1} = T z^k, defined as:
  x^{k+1/2} = prox_{γg}(z^k)
  x^{k+1}   = prox_{γf}(2 x^{k+1/2} − z^k)
  z^{k+1}   = z^k + (x^{k+1} − x^{k+1/2})
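A minimal Python sketch of this iteration, assuming prox_f and prox_g are callables for prox_{γf} and prox_{γg} (γ absorbed into them); the usage lines at the bottom solve a toy nonnegativity-constrained problem:

```python
import numpy as np

def drs(prox_f, prox_g, z0, num_iters=1000):
    """Douglas-Rachford splitting z^{k+1} = T z^k for: minimize f(x) + g(x).

    prox_f and prox_g implement prox_{gamma*f} and prox_{gamma*g}
    (the step size gamma is absorbed into the callables).
    """
    z = z0.copy()
    for _ in range(num_iters):
        x_half = prox_g(z)            # x^{k+1/2} = prox_{γg}(z^k)
        x = prox_f(2 * x_half - z)    # x^{k+1}   = prox_{γf}(2x^{k+1/2} − z^k)
        z = z + (x - x_half)          # z^{k+1}   = z^k + (x^{k+1} − x^{k+1/2})
    return x_half                     # x^{k+1/2} converges to a minimizer

# usage sketch: minimize (1/2)||x − a||^2 + indicator(x >= 0), with gamma = 1
a = np.array([1.0, -2.0, 3.0])
prox_g = lambda z: (z + a) / 2.0           # prox of (1/2)||x − a||^2
prox_f = lambda z: np.maximum(z, 0.0)      # projection onto x >= 0
print(drs(prox_f, prox_g, np.zeros(3)))    # -> approx. max(a, 0) = [1, 0, 3]
```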
Apply DRS to conic programming
  minimize c^T x subject to Ax = b, x ∈ K
⇔ minimize g(x) + f(x), where g(x) = c^T x + δ_{A·=b}(x) and f(x) = δ_K(x)
• cone K is nonempty closed convex
• each iteration: project onto K, then project onto {x : Ax = b}
• per-iteration cost: O(n²) if x ∈ R^n (by pre-factorizing AA^T)
• prior work: ADMM for SDP (Wen-Goldfarb-Y.'09)
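A sketch of the two projections, assuming A ∈ R^{m×n} has full row rank so AA^T is positive definite; scipy's Cholesky routines let the one-time factorization be reused every iteration:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_affine_projector(A, b):
    """Return x -> proj_{Ax=b}(x) = x − A^T (AA^T)^{-1} (Ax − b).

    AA^T is factorized once (O(m^3)); each later projection is O(mn),
    i.e. O(n^2) when m ~ n, matching the per-iteration cost above.
    """
    factor = cho_factor(A @ A.T)          # done once, requires full row rank
    def proj(x):
        return x - A.T @ cho_solve(factor, A @ x - b)
    return proj

def proj_nonneg(x):
    """Projection onto the simplest cone K = R^n_+ (componentwise)."""
    return np.maximum(x, 0.0)
```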
Other choices of splitting
• linearized ADMM and primal-dual splitting: avoid inverting the full A
• variants of Frank-Wolfe: avoid expensive projections onto the SDP cone
• subgradient and bundle methods, ...
Coordinate friendly² (CF)
• (block) coordinate update is fast only if the subproblems are simple
• definition: T : H → H is CF if, for any z and i ∈ [m], letting
  z⁺ := (z_1, ..., (Tz)_i, ..., z_m),
it holds that
  cost[ {z, M(z)} ↦ {z⁺, M(z⁺)} ] = O( (1/m) cost[z ↦ Tz] ),
where M(z) is some quantity maintained in memory (a concrete sketch follows below)
² Peng-Wu-Xu-Yan-Y. AMSA'16
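For illustration only (this particular operator is not from the talk): take T z = z − A^T(Az − b) with A ∈ R^{p×n} and maintain M(z) = Az − b; then one coordinate of Tz, plus the cache refresh, costs O(p) instead of the O(pn) full evaluation:

```python
import numpy as np

def cf_coordinate_step(A, b, z, r, i, eta=1.0):
    """One CF coordinate update of T z = z − A^T(Az − b), with cache r = Az − b.

    A full evaluation of Tz costs O(p*n); with the cached residual
    r = M(z), one coordinate plus the cache refresh is O(p).
    """
    tz_i = z[i] - A[:, i] @ r      # (Tz)_i from the cached residual
    delta = eta * (tz_i - z[i])    # relaxed coordinate step
    z[i] += delta                  # z^+ = (z_1, ..., (Tz)_i, ..., z_n)
    r += delta * A[:, i]           # M(z^+) = M(z) + delta * A e_i, O(p)
    return z, r
```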
Composed operators
• 9 rules³ for CF T_1 ∘ T_2 cover many examples
• general principles:
  • T_1 ∘ T_2 inherits the (weaker) separability property
  • if T_1 is CF and T_2 is either cheap, easy-to-maintain, or directly CF, then T_1 ∘ T_2 is CF
  • if T_1 is separable or cheap, it is easier to make T_1 ∘ T_2 CF
³ Peng-Wu-Xu-Yan-Y. AMSA'16
Examples of CF T_1 ∘ T_2
• many convex image processing models
• portfolio optimization
• most sparse optimization problems
• all LPs, all SOCPs, and SDPs without large cones
• most ERM problems
• ...
Example: DRS for SOCP
• second-order cone: Q^n = { x ∈ R^n : x_1 ≥ ‖(x_2, ..., x_n)‖_2 }
• DRS operator has the form T = linear ∘ proj_{Q^{n_1} × ··· × Q^{n_p}}
• CF is trivial if all cones are small
• now, consider a big cone; property:
  proj_{Q^n}(x) = (α x_1, β x_2, ..., β x_n),
where α, β depend on x_1 and γ := ‖(x_2, ..., x_n)‖_2
• given γ and updating x_i, refreshing γ costs O(1)
• by maintaining γ, proj_{Q^n} is cheap, and T = linear ∘ cheap is CF
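A sketch of that bookkeeping, using the standard closed-form projection onto Q^n and caching γ² so that editing one coordinate of x costs O(1):

```python
import numpy as np

def soc_proj_coeffs(x1, gamma):
    """Closed-form projection onto Q^n: return (new first entry, beta).

    proj(x) = x                       if gamma <= x_1   (x already in the cone)
            = 0                       if gamma <= -x_1  (x in the polar cone)
            = ((x_1+gamma)/2)*(1, v/gamma)  otherwise, where v = (x_2, ..., x_n)
    """
    if gamma <= x1:
        return x1, 1.0
    if gamma <= -x1:
        return 0.0, 0.0
    s = 0.5 * (x1 + gamma)
    return s, s / gamma

class CachedSOC:
    """Maintain gamma^2 = ||v||^2 so a one-coordinate edit costs O(1)."""

    def __init__(self, x):
        self.x = np.asarray(x, dtype=float).copy()
        self.gamma_sq = float(self.x[1:] @ self.x[1:])

    def update_coord(self, i, new_val):
        if i >= 1:                                    # O(1) cache refresh
            self.gamma_sq += new_val**2 - self.x[i]**2
        self.x[i] = new_val

    def project(self):
        first, beta = soc_proj_coeffs(self.x[0], np.sqrt(self.gamma_sq))
        return np.concatenate(([first], beta * self.x[1:]))
```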
Fixed-point iterations
• full update: z^{k+1} = T z^k
• (block) coordinate update (CU): choose i_k ∈ [m],
  z_i^{k+1} = z_i^k + η ((T z^k)_i − z_i^k)  if i = i_k;   z_i^{k+1} = z_i^k otherwise
• parallel CU: p agents choose I_k ⊂ [m],
  z_i^{k+1} = z_i^k + η ((T z^k)_i − z_i^k)  if i ∈ I_k;   z_i^{k+1} = z_i^k otherwise
• η depends on properties of T, i_k, and I_k
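A sketch of one serial CU step, assuming a user-supplied T_block(z, i) that returns block i of Tz without forming all of Tz (exactly what coordinate friendliness makes cheap):

```python
import numpy as np

def coordinate_update(T_block, z, eta, m, rng):
    """One serial CU step: only block i_k of z moves.

    T_block(z, i) must return (Tz)_i; under coordinate friendliness
    this costs about (1/m) of a full application of T.
    """
    i = int(rng.integers(m))                    # sample i_k from [m]
    z[i] = z[i] + eta * (T_block(z, i) - z[i])  # relaxed update of block i_k
    return z
```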
Sync-parallel versus async-parallel
[Figure: timelines of three agents. Synchronous: faster agents must wait, leaving idle gaps. Asynchronous: all agents are non-stop.]
ARock: async-parallel CU
• p agents
• every agent continuously does: pick i_k ∈ [m],
  z_i^{k+1} = z_i^k + η ((T z^{k−d_k})_i − z_i^{k−d_k})  if i = i_k;   z_i^{k+1} = z_i^k otherwise
new notation:
• k increases after any agent completes an update
• z^{k−d_k} = (z_1^{k−d_{k,1}}, ..., z_m^{k−d_{k,m}}) may be stale
• allow inconsistent atomic read/write
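A toy Python-thread sketch of this loop (illustrative only: agents read without locking, so snapshots may be stale or inconsistent, and the counter k is implicit; production ARock codes use shared memory with atomic block writes):

```python
import threading
import numpy as np

def arock(T_block, z, eta, m, steps_per_agent=1000, num_agents=4):
    """Toy ARock: each agent repeatedly updates a random block of z
    using a possibly stale, lock-free snapshot z^{k-d_k}."""
    def agent(seed):
        rng = np.random.default_rng(seed)
        for _ in range(steps_per_agent):
            i = int(rng.integers(m))
            snapshot = [blk.copy() for blk in z]                # stale/inconsistent read
            z[i] += eta * (T_block(snapshot, i) - snapshot[i])  # write one block back
    threads = [threading.Thread(target=agent, args=(s,)) for s in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return z
```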
Various theories and meanings
• 1969 – 90s: T is contractive in ‖·‖_{w,∞}, partially/totally async
• recent in the ML community: async SG and async BCD
  • early works: random i_k, bounded delays, E f has sufficient descent, treat delays as noise, delays independent of i_k
  • state-of-the-art: allow essentially cyclic i_k, unbounded delays (tail probability decaying like t^{−4} or faster), Lyapunov analysis, delays as overdue progress, delays can depend on i_k