Extending MOSEK with exponential cones
ISMP Bordeaux 2018
  1. Extending MOSEK with exponential cones ISMP Bordeaux 2018 joachim.dahl@mosek.com www.mosek.com

  2. Conic optimization

  Linear cone problem:

      minimize    c^T x
      subject to  Ax = b
                  x ∈ K,

  with K = K_1 × K_2 × ··· × K_p a product of proper cones. Dual:

      maximize    b^T y
      subject to  c − A^T y = s
                  s ∈ K^*,

  with K^* = K_1^* × K_2^* × ··· × K_p^*.
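The weak-duality relation behind this primal-dual pair can be checked numerically. A minimal sketch with K = R^2_+ (so the problem reduces to an LP) and illustrative data: for any primal-feasible x and dual-feasible (y, s), the gap c^T x − b^T y equals s^T x, which is nonnegative since s ∈ K^*.

```python
import numpy as np

# Weak duality for the linear cone problem, sketched with K = R^2_+ (an LP):
# c^T x - b^T y = s^T x >= 0 for feasible x and (y, s), since s in K* = R^2_+.
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
c = np.array([1.0, 3.0])

x = np.array([1.5, 0.5])          # primal feasible: Ax = b, x >= 0
y = np.array([1.0])               # dual candidate
s = c - A.T @ y                   # s = c - A^T y = (0, 2), in K* = R^2_+

assert np.allclose(A @ x, b) and (x >= 0).all() and (s >= 0).all()
gap = c @ x - b @ y               # duality gap
print(np.isclose(gap, s @ x), gap >= 0)   # True True
```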

  3. Symmetric cones (supported by MOSEK 8)

  • the nonnegative orthant
        K_l^n := { x ∈ R^n | x_j ≥ 0, j = 1, …, n },
  • the quadratic cone
        K_q^n := { x ∈ R^n | x_1 ≥ (x_2^2 + ··· + x_n^2)^(1/2) },
  • the rotated quadratic cone
        K_r^n := { x ∈ R^n | 2 x_1 x_2 ≥ x_3^2 + ··· + x_n^2, x_1, x_2 ≥ 0 },
  • the semidefinite matrix cone
        K_s^n := { x ∈ R^(n(n+1)/2) | z^T mat(x) z ≥ 0, ∀z }.

  4. Nonsymmetric cones (supported by MOSEK 9)

  • the three-dimensional power cone
        K_α^pow = { x ∈ R^3 | x_1^α x_2^(1−α) ≥ |x_3|, x_1, x_2 ≥ 0 }
    for 0 < α < 1,
  • the exponential cone
        K_exp = cl { x ∈ R^3 | x_1 ≥ x_2 exp(x_3/x_2), x_2 > 0 }.
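Both nonsymmetric cones admit simple membership tests. A sketch (the tolerance-based handling of the closure and boundary is a simplification introduced here, not part of the slides):

```python
import numpy as np

# Membership tests for the two nonsymmetric cones (sketch; a tolerance
# stands in for exact closure/boundary handling).
def in_exp_cone(x1, x2, x3, tol=1e-9):
    # K_exp = cl{ x : x1 >= x2*exp(x3/x2), x2 > 0 }
    if x2 > tol:
        return x1 >= x2 * np.exp(x3 / x2) - tol
    # closure at x2 = 0: requires x1 >= 0 and x3 <= 0
    return x1 >= -tol and x3 <= tol and x2 >= -tol

def in_pow_cone(x1, x2, x3, alpha, tol=1e-9):
    # K_pow^alpha = { x : x1^alpha * x2^(1-alpha) >= |x3|, x1, x2 >= 0 }
    if x1 < -tol or x2 < -tol:
        return False
    return max(x1, 0.0) ** alpha * max(x2, 0.0) ** (1 - alpha) >= abs(x3) - tol

print(in_exp_cone(np.e, 1.0, 1.0))       # True: boundary point (e, 1, 1)
print(in_exp_cone(1.0, 1.0, 1.0))        # False: 1 < e
print(in_pow_cone(4.0, 1.0, 2.0, 0.5))   # True: sqrt(4*1) = 2 >= |2|
```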

  5. Central path for conic problem

  Central path for the homogeneous model, parametrized by µ:

      A x_µ − b τ_µ = µ (Ax − bτ)
      s_µ + A^T y_µ − c τ_µ = µ (s + A^T y − cτ)
      c^T x_µ − b^T y_µ + κ_µ = µ (c^T x − b^T y + κ)
      s_µ = −µ F'(x_µ),   x_µ = −µ F'_*(s_µ),   κ_µ τ_µ = µ,

  or equivalently

      [ 0     A    −b ] [ y_µ ]   [ 0   ]     [ r_p ]
      [ −A^T  0     c ] [ x_µ ] − [ s_µ ] = µ [ r_d ]
      [ b^T  −c^T   0 ] [ τ_µ ]   [ κ_µ ]     [ r_g ]

      s_µ = −µ F'(x_µ),   x_µ = −µ F'_*(s_µ),   κ_µ τ_µ = µ,

  where

      r_p := Ax − bτ,   r_d := cτ − A^T y − s,
      r_g := b^T y − c^T x − κ,   r_c := x^T s + τκ.

  6. Scaling for nonsymmetric cones

  Following Tunçel [3] we consider a scaling W^T W ≻ 0,

      v = Wx = W^{−T} s,   ṽ = W x̃ = W^{−T} s̃,

  where x̃ := −F'_*(s) and s̃ := −F'(x). The centrality conditions
  x = µ x̃, s = µ s̃ can then be written symmetrically as v = µ ṽ, and
  we linearize the centrality condition v = µ ṽ as

      W ∆x + W^{−T} ∆s = −v + µ ṽ.
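For the nonnegative orthant these scaling identities can be checked in a few lines. A sketch with the barrier F(x) = −Σ log x_j, so that x̃ = 1/s and s̃ = 1/x elementwise, and the diagonal scaling W = diag(sqrt(s/x)) (here W is symmetric, so W^{−T} is simply 1/W):

```python
import numpy as np

# Scaling identities of the slide, sketched for K = R^n_+ with
# F(x) = -sum(log x): shadow points are xt = -F'_*(s) = 1/s and
# st = -F'(x) = 1/x, and W = diag(sqrt(s/x)) maps both pairs to v, vt.
rng = np.random.default_rng(2)
x, s = rng.random(5) + 0.1, rng.random(5) + 0.1

W = np.sqrt(s / x)                 # diagonal of W
xt, st = 1.0 / s, 1.0 / x          # shadow iterates

v = W * x                          # v  = W x
vt = W * xt                        # vt = W xt
print(np.allclose(v, s / W))       # True: v  = Wx = W^{-T} s
print(np.allclose(vt, st / W))     # True: vt = W xt = W^{-T} st
```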

  7. A centering search-direction

  Let µ := (x^T s + τκ)/(ν + 1) with barrier parameter ν and centering
  parameter γ.

      [ 0     A    −b ] [ ∆y_c ]   [ 0    ]           [ r_p ]
      [ −A^T  0     c ] [ ∆x_c ] − [ ∆s_c ] = (γ − 1) [ r_d ]
      [ b^T  −c^T   0 ] [ ∆τ_c ]   [ ∆κ_c ]           [ r_g ]

      W ∆x_c + W^{−T} ∆s_c = γµ ṽ − v,
      τ ∆κ_c + κ ∆τ_c = γµ − κτ.

  Constant decrease of residuals and complementarity:

      A x^+ − b τ^+ = η · r_p,
      c τ^+ − A^T y^+ − s^+ = η · r_d,
      b^T y^+ − c^T x^+ − κ^+ = η · r_g,
      (x^+)^T s^+ + τ^+ κ^+ = η · r_c,

  where z^+ := z + α ∆z_c and η := 1 − α(1 − γ).

  8. A higher-order corrector term

  Derivatives of s_µ = −µ F'(x_µ):

      ṡ_µ + µ F''(x_µ) ẋ_µ = −F'(x_µ),
      s̈_µ + µ F''(x_µ) ẍ_µ = −2 F''(x_µ) ẋ_µ − µ F'''(x_µ)[ẋ_µ, ẋ_µ].

  Since

      µ ẋ_µ = −[F''(x_µ)]^{−1} (F'(x_µ) + ṡ_µ) = x_µ − [F''(x_µ)]^{−1} ṡ_µ,

  using the logarithmic-homogeneity identity F''(x) x = −F'(x), we have

      µ F'''(x_µ)[ẋ_µ, ẋ_µ] = F'''(x_µ)[ẋ_µ, x_µ] − F'''(x_µ)[ẋ_µ, (F''(x_µ))^{−1} ṡ_µ]
                            = −2 F''(x_µ) ẋ_µ − F'''(x_µ)[ẋ_µ, (F''(x_µ))^{−1} ṡ_µ],

  so

      s̈_µ + µ F''(x_µ) ẍ_µ = F'''(x_µ)[ẋ_µ, (F''(x_µ))^{−1} ṡ_µ].
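The homogeneity identity used in the step above, −[F''(x)]^{−1} F'(x) = x, can be checked numerically. A sketch assuming the standard three-parameter exponential-cone barrier F(x) = −log(x2·log(x1/x2) − x3) − log x1 − log x2 (the barrier itself is not named on the slide):

```python
import numpy as np

# Numeric check of F''(x) x = -F'(x) for the (assumed) standard
# exponential-cone barrier, with analytic gradient and Hessian.
def barrier_grad_hess(x):
    x1, x2, x3 = x
    L = np.log(x1 / x2)
    g = x2 * L - x3                        # > 0 in the interior of K_exp
    gp = np.array([x2 / x1, L - 1.0, -1.0])          # gradient of g
    gpp = np.array([[-x2 / x1**2, 1 / x1, 0.0],
                    [1 / x1, -1 / x2, 0.0],
                    [0.0, 0.0, 0.0]])                # Hessian of g
    grad = -gp / g - np.array([1 / x1, 1 / x2, 0.0])
    hess = (np.outer(gp, gp) / g**2 - gpp / g
            + np.diag([1 / x1**2, 1 / x2**2, 0.0]))
    return grad, hess

x = np.array([1.0, 1.0, -1.0])           # interior point of K_exp
grad, hess = barrier_grad_hess(x)
print(np.allclose(hess @ x, -grad))      # True: F''(x) x = -F'(x)
```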

  9. An affine search-direction

  Affine search-direction:

      [ 0     A    −b ] [ ∆y_a ]   [ 0    ]     [ r_p ]
      [ −A^T  0     c ] [ ∆x_a ] − [ ∆s_a ] = − [ r_d ]
      [ b^T  −c^T   0 ] [ ∆τ_a ]   [ ∆κ_a ]     [ r_g ]

      W ∆x_a + W^{−T} ∆s_a = −v,
      τ ∆κ_a + κ ∆τ_a = −κτ,

  satisfies

      (∆x_a)^T ∆s_a + ∆τ_a ∆κ_a = 0.

  Since ṡ_µ + µ F''(x_µ) ẋ_µ = −F'(x_µ) = s_µ/µ, we interpret
  ∆s_a = −ṡ_µ and ∆x_a = −ẋ_µ.
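The orthogonality property (∆x_a)^T ∆s_a + ∆τ_a ∆κ_a = 0 can be verified by assembling the affine system directly. A sketch for K = R^n_+, where the classical primal-dual scaling W = diag(sqrt(s/x)) applies; the data and the current point are illustrative random values:

```python
import numpy as np

# Affine direction for the homogeneous model, sketched for K = R^n_+,
# assembled as one dense linear system in (dy, dx, dtau, ds, dkappa).
rng = np.random.default_rng(1)
m, n = 2, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)
x, s = rng.random(n) + 0.5, rng.random(n) + 0.5   # interior point
y = rng.standard_normal(m)
tau, kappa = 1.0, 0.8

w = np.sqrt(s / x)                  # W = diag(w); W^{-T} = diag(1/w)
v = w * x                           # v = Wx = W^{-T} s
r_p = A @ x - b * tau               # residuals at the current point
r_d = c * tau - A.T @ y - s
r_g = b @ y - c @ x - kappa

N = m + n + 1 + n + 1
KKT = np.zeros((N, N)); rhs = np.zeros(N)
# block rows: G*(dy,dx,dtau) - (0,ds,dkappa) = -(r_p, r_d, r_g)
KKT[:m, m:m+n] = A;          KKT[:m, m+n] = -b;        rhs[:m] = -r_p
KKT[m:m+n, :m] = -A.T;       KKT[m:m+n, m+n] = c
KKT[m:m+n, m+n+1:m+n+1+n] = -np.eye(n);               rhs[m:m+n] = -r_d
KKT[m+n, :m] = b;            KKT[m+n, m:m+n] = -c
KKT[m+n, -1] = -1.0;                                  rhs[m+n] = -r_g
# W dx + W^{-T} ds = -v
KKT[m+n+1:m+n+1+n, m:m+n] = np.diag(w)
KKT[m+n+1:m+n+1+n, m+n+1:m+n+1+n] = np.diag(1 / w);   rhs[m+n+1:m+n+1+n] = -v
# tau dkappa + kappa dtau = -kappa*tau
KKT[-1, m+n] = kappa;        KKT[-1, -1] = tau;       rhs[-1] = -kappa * tau

d = np.linalg.solve(KKT, rhs)
dx, dtau = d[m:m+n], d[m+n]
ds, dkappa = d[m+n+1:m+n+1+n], d[-1]
print(abs(dx @ ds + dtau * dkappa) < 1e-10)   # True: orthogonality holds
```

The identity follows from skew-symmetry of the big matrix together with the two scaling rows, so it holds exactly for any interior point, not just this one.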

  10. A higher-order corrector term

  From

      s̈_µ + µ F''(x_µ) ẍ_µ = F'''(x_µ)[ẋ_µ, (F''(x_µ))^{−1} ṡ_µ]

  we define a corrector direction by

      W ∆x_cor + W^{−T} ∆s_cor = (1/2) W^{−T} F'''(x)[∆x_a, (F''(x))^{−1} ∆s_a].

  Note that

      s^T ∆x_cor + x^T ∆s_cor = (1/2) x^T F'''(x)[∆x_a, (F''(x))^{−1} ∆s_a]
                              = −(∆x_a)^T ∆s_a,

  the condition for constant decrease of complementarity.

  11. A higher-order corrector term

  • Linear case:
        (1/2) F'''(x)[∆x_a, (F''(x))^{−1} ∆s_a] = −diag(x)^{−1} diag(∆x_a) ∆s_a,
  • Semidefinite case:
        (1/2) F'''(x)[∆x_a, (F''(x))^{−1} ∆s_a]
            = −(1/2) x^{−1} ∆x_a ∆s_a − (1/2) ∆s_a ∆x_a x^{−1}
            = −(x^{−1}) ∘ (∆x_a ∆s_a),
  • Second-order cone case:
        F'''(x)[(F''(x))^{−1} u] = −(2/(x^T Q x)) (u x^T Q + Q x u^T − (x^T u) Q)
    for Q = diag(1, −1, …, −1). Then F'''(x)[(F''(x))^{−1} u] e = −2(x^{−1} ∘ u).
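The linear-cone formula is easy to verify numerically. A sketch with F(x) = −Σ log x_j, for which F''(x) = diag(1/x²) and F'''(x)[u, v]_i = −2 u_i v_i / x_i³:

```python
import numpy as np

# Check of the linear-case corrector formula with F(x) = -sum(log x):
# (1/2) F'''(x)[dxa, (F''(x))^{-1} dsa] = -diag(x)^{-1} diag(dxa) dsa.
rng = np.random.default_rng(3)
x = rng.random(6) + 0.1
dxa, dsa = rng.standard_normal(6), rng.standard_normal(6)

u = x**2 * dsa                          # (F''(x))^{-1} dsa
lhs = 0.5 * (-2.0 * dxa * u / x**3)     # (1/2) F'''(x)[dxa, u]
rhs = -(dxa * dsa) / x                  # -diag(x)^{-1} diag(dxa) dsa
print(np.allclose(lhs, rhs))            # True
```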

  12. Combined centering-corrector direction

  A combined centering-corrector direction:

      [ 0     A    −b ] [ ∆y ]   [ 0  ]           [ r_p ]
      [ −A^T  0     c ] [ ∆x ] − [ ∆s ] = (γ − 1) [ r_d ]
      [ b^T  −c^T   0 ] [ ∆τ ]   [ ∆κ ]           [ r_g ]

      W ∆x + W^{−T} ∆s = γµ ṽ − v + (1/2) W^{−T} F'''(x)[∆x_a, (F''(x))^{−1} ∆s_a],
      τ ∆κ + κ ∆τ = γµ − τκ − ∆τ_a ∆κ_a.

  All residuals and complementarity decrease by the factor η.

  13. Computing the scaling matrix

  Theorem (Schnabel [2]): Let S, Y ∈ R^{n×p} have full rank p. Then there
  exists H ≻ 0 such that HS = Y if and only if Y^T S ≻ 0. As a consequence,

      H = Y (Y^T S)^{−1} Y^T + Z Z^T,

  where S^T Z = 0 and rank(Z) = n − p.

  We have n = 3, p = 2 and

      S := [ x  x̃ ],   Y := [ s  s̃ ],

  with det(Y^T S) = ν² (µ µ̃ − 1) ≥ 0 vanishing only on the central path.
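Schnabel's construction can be exercised directly. A sketch with random data: choosing Y = H0·S for some H0 ≻ 0 guarantees Y^T S ≻ 0, Z is taken from the null space of S^T, and the resulting H satisfies HS = Y and H ≻ 0:

```python
import numpy as np

# Schnabel's construction: H = Y (Y^T S)^{-1} Y^T + Z Z^T with S^T Z = 0
# satisfies H S = Y and H > 0 whenever Y^T S is symmetric positive definite.
rng = np.random.default_rng(4)
n, p = 3, 2
S = rng.standard_normal((n, p))
H0 = np.eye(n) + 0.5 * np.ones((n, n))   # some H0 > 0
Y = H0 @ S                                # then Y^T S = S^T H0 S > 0

# Z spans null(S^T): the trailing right singular vectors of S^T
_, _, Vt = np.linalg.svd(S.T)
Z = Vt[p:].T

H = Y @ np.linalg.solve(Y.T @ S, Y.T) + Z @ Z.T
print(np.allclose(H @ S, Y))              # True: H S = Y
print((np.linalg.eigvalsh(H) > 0).all())  # True: H positive definite
```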

  14. Computing the scaling matrix

  Any scaling with n = 3 satisfies

      W^T W = Y (Y^T S)^{−1} Y^T + z z^T,

  where [ x  x̃ ]^T z = 0, z ≠ 0. Expanding the BFGS update [2]

      H_+ = H + Y (Y^T S)^{−1} Y^T − H S (S^T H S)^{−1} S^T H,

  for H ≻ 0 gives the scaling by Tunçel [3] and Myklebust [1], i.e.,

      z z^T = H − H S (S^T H S)^{−1} S^T H,

  with H = µ F''(x).

  15. A negative result on complexity

  Nesterov's long-step Hessian estimation property holds if

      F'''(x)[u] ⪯ 0,   ∀x ∈ int(K), ∀u ∈ K.

  We have

      F'''([1; 1; −1])[u] = u_1 [ −9  6  3;  6 −5 −3;  3 −3 −2 ]
                          + u_2 [  6 −5 −3; −5  2  3; −3  3  2 ]
                          + u_3 [  3 −3 −2; −3  3  2; −2  2  2 ],

  which is not negative semidefinite for all u ∈ K.
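This failure can be reproduced numerically. A sketch assuming the standard exponential-cone barrier F(x) = −log(x2·log(x1/x2) − x3) − log x1 − log x2 (not named on the slide); the witness direction u below is a boundary point of K_exp, found by searching along the null direction of the u_1 coefficient matrix:

```python
import numpy as np

# Barrier for K_exp (a standard choice, assumed here):
# F(x) = -log(x2*log(x1/x2) - x3) - log x1 - log x2.
def barrier_hessian(x):
    x1, x2, x3 = x
    L = np.log(x1 / x2)
    g = x2 * L - x3                       # > 0 in int(K_exp)
    gp = np.array([x2 / x1, L - 1.0, -1.0])
    gpp = np.array([[-x2 / x1**2, 1 / x1, 0.0],
                    [1 / x1, -1 / x2, 0.0],
                    [0.0, 0.0, 0.0]])
    return (np.outer(gp, gp) / g**2 - gpp / g
            + np.diag([1 / x1**2, 1 / x2**2, 0.0]))

def third_derivative(x, u, h=1e-5):
    # Directional third derivative F'''(x)[u] by central differences of the
    # analytic Hessian; F'''(x)[u] is linear in u.
    u = np.asarray(u, dtype=float)
    return (barrier_hessian(x + h * u) - barrier_hessian(x - h * u)) / (2 * h)

x = np.array([1.0, 1.0, -1.0])
# u in K_exp (u1 = u2*exp(u3/u2), a boundary point) for which F'''(x)[u]
# has a positive eigenvalue, so it is not negative semidefinite.
u = np.array([np.exp(5.0), 1.0, 5.0])
M = third_derivative(x, u)
print(np.linalg.eigvalsh(M).max() > 0)   # True: the property fails on K_exp
```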

  16. Comparing MOSEK and ECOS conic solvers

  [Figure: iteration counts for different exponential cone problems;
  series MOSEK (2), MOSEK w/o corr (29), ECOS (41). Failures marked with ⋄.]

  17. Comparing MOSEK and ECOS conic solvers

  [Figure: solution time for different exponential cone problems, time MOSEK
  vs. time other (MOSEK w/o corr (29), ECOS (41)), log-log axes. Failures
  marked with ⋄.]

  18. Comparing MOSEK and ECOS conic solvers

  [Figure: largest feasibility error for different exponential cone problems;
  series MOSEK, MOSEK w/o corr, ECOS.]

  19. Comparing MOSEK conic and MOSEK GP solvers

  [Figure: iteration counts for different GPs; series conic (3),
  GP primal (14), GP dual (2). Failures marked with ⋄.]

  20. Comparing MOSEK conic and MOSEK GP solvers

  [Figure: solution time for different GPs, time conic vs. time GP
  (GP primal (12), GP dual (2)), log-log axes. Failures marked with ⋄.]

  21. Comparing MOSEK conic and MOSEK GP solvers

  [Figure: largest feasibility error for different GPs; series conic,
  GP primal, GP dual.]

  22. References

  [1] T. Myklebust and L. Tunçel. Interior-point algorithms for convex
      optimization based on primal-dual metrics. Technical report, 2014.
  [2] R. B. Schnabel. Quasi-Newton methods using multiple secant equations.
      Technical report, University of Colorado Boulder, Department of
      Computer Science, 1983.
  [3] L. Tunçel. Generalization of primal-dual interior-point methods to
      convex optimization problems in conic form. Foundations of
      Computational Mathematics, 1:229–254, 2001.
