A primal-dual algorithm for exponential-cone optimization
ICCOPT Berlin, August 8th, 2019
joachim.dahl@mosek.com
www.mosek.com
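
Conic optimization
Linear cone problem:
\[
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax = b, \\
& x \in K,
\end{array}
\]
with $K = K_1 \times K_2 \times \cdots \times K_p$ a product of proper cones.
Dual:
\[
\begin{array}{ll}
\text{maximize} & b^T y \\
\text{subject to} & c - A^T y = s, \\
& s \in K^*,
\end{array}
\]
with $K^* = K_1^* \times K_2^* \times \cdots \times K_p^*$.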

Conic optimization
MOSEK 9 supports the following symmetric cones,
• linear, quadratic and semidefinite cones
and the nonsymmetric cones,
• three-dimensional power cone for $0 < \alpha < 1$,
\[
K^{\alpha}_{\mathrm{pow}} = \{ x \in \mathbb{R}^3 \mid x_1^{\alpha} x_2^{1-\alpha} \geq |x_3|,\; x_1, x_2 > 0 \},
\]
• exponential cone
\[
K_{\exp} = \operatorname{cl}\{ x \in \mathbb{R}^3 \mid x_1 \geq x_2 \exp(x_3/x_2),\; x_2 > 0 \}.
\]
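
The two nonsymmetric cone definitions above can be checked pointwise. The following is a minimal sketch, not MOSEK's implementation; the function names `in_exp_cone` and `in_pow_cone` and the `tol` slack are assumptions made for illustration. The exponential-cone test is done in log form to avoid overflow, and the points added by the closure ($x_2 = 0$, $x_1 \geq 0$, $x_3 \leq 0$) are handled separately.

```python
import math

def in_exp_cone(x, tol=0.0):
    """Membership test for K_exp = cl{x1 >= x2*exp(x3/x2), x2 > 0}."""
    x1, x2, x3 = x
    if x2 > 0:
        # Log form of x1 >= x2*exp(x3/x2); tol is a slack on this inequality.
        return x1 > 0 and math.log(x1) >= math.log(x2) + x3 / x2 - tol
    # Points added by the closure: x2 = 0, x1 >= 0, x3 <= 0.
    return x2 == 0 and x1 >= 0 and x3 <= 0

def in_pow_cone(x, alpha, tol=0.0):
    """Membership test for the three-dimensional power cone, 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    x1, x2, x3 = x
    return x1 > 0 and x2 > 0 and x1**alpha * x2**(1.0 - alpha) >= abs(x3) - tol

# Sample points (chosen for illustration only).
exp_inside = in_exp_cone((3.0, 1.0, 1.0))    # 3 >= exp(1)
exp_outside = in_exp_cone((2.0, 1.0, 1.0))   # 2 <  exp(1)
exp_closure = in_exp_cone((1.0, 0.0, -1.0))  # boundary ray added by the closure
pow_boundary = in_pow_cone((4.0, 1.0, 2.0), alpha=0.5)  # sqrt(4*1) = |2|
```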

Self-concordant barriers
Self-concordant barrier for $K_{\exp}$:
\[
F(x) = -\log(x_2 \log(x_1/x_2) - x_3) - \log x_1 - \log x_2 .
\]
Conjugate barrier:
\[
F_*(s) = \max\{ -\langle x, s \rangle - F(x) : x \in \operatorname{int}(K) \}.
\]
Standard properties:
\[
F^{(k)}(\tau x) = \frac{1}{\tau^k} F^{(k)}(x), \qquad F^{(k)}(x)[x] = -k F^{(k-1)}(x),
\]
\[
-F'(x) \in \operatorname{int}(K^*), \qquad -F_*'(s) \in \operatorname{int}(K),
\]
\[
F'(-F_*'(s)) = -s, \qquad F''(-F_*'(s)) = [F_*''(s)]^{-1}.
\]
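
As a numerical illustration of the barrier and its homogeneity properties, the sketch below evaluates $F$ and its gradient for $K_{\exp}$ and checks logarithmic homogeneity, $F(\tau x) = F(x) - \nu \log\tau$ and $\langle F'(x), x\rangle = -\nu$ with $\nu = 3$, together with $F'(\tau x) = \tau^{-1}F'(x)$. The gradient formula is derived by hand from $F$; the function names and the sample point are assumptions of this sketch.

```python
import numpy as np

NU = 3.0  # barrier parameter of the three-term barrier for K_exp

def psi(x):
    """psi(x) = x2*log(x1/x2) - x3, positive in the interior of K_exp."""
    x1, x2, x3 = x
    return x2 * np.log(x1 / x2) - x3

def barrier(x):
    x1, x2, x3 = x
    return -np.log(psi(x)) - np.log(x1) - np.log(x2)

def barrier_grad(x):
    """Gradient of F, obtained by differentiating the three log terms."""
    x1, x2, x3 = x
    dpsi = np.array([x2 / x1, np.log(x1 / x2) - 1.0, -1.0])
    return -dpsi / psi(x) - np.array([1.0 / x1, 1.0 / x2, 0.0])

x = np.array([3.0, 1.0, 0.5])   # interior point, since 3 > exp(0.5)
tau = 2.5
g = barrier_grad(x)

homogeneity_gap = barrier(tau * x) - (barrier(x) - NU * np.log(tau))
grad_identity = g @ x                                        # expected: -NU
grad_scaling_gap = np.linalg.norm(barrier_grad(tau * x) - g / tau)
```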
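
Central path for conic problem
Central path for the homogeneous model parametrized by $\mu$:
\[
\begin{aligned}
A x_\mu - b \tau_\mu &= \mu (Ax - b\tau), \\
s_\mu + A^T y_\mu - c \tau_\mu &= \mu (s + A^T y - c\tau), \\
c^T x_\mu - b^T y_\mu + \kappa_\mu &= \mu (c^T x - b^T y + \kappa), \\
s_\mu = -\mu F'(x_\mu), \quad x_\mu &= -\mu F_*'(s_\mu), \quad \kappa_\mu \tau_\mu = \mu,
\end{aligned}
\]
or equivalently
\[
\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix}
\begin{bmatrix} y_\mu \\ x_\mu \\ \tau_\mu \end{bmatrix}
-
\begin{bmatrix} 0 \\ s_\mu \\ \kappa_\mu \end{bmatrix}
= \mu \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},
\]
\[
s_\mu = -\mu F'(x_\mu), \quad x_\mu = -\mu F_*'(s_\mu), \quad \kappa_\mu \tau_\mu = \mu,
\]
where
\[
r_p := Ax - b\tau, \quad r_d := c\tau - A^T y - s, \quad r_g := b^T y - c^T x - \kappa, \quad r_c := x^T s + \tau\kappa .
\]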
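
The residuals of the homogeneous model are cheap to evaluate. Below is a minimal NumPy sketch; the function name `residuals` and the toy linear program are assumptions made for illustration. The gap residual is taken as the third block row of the matrix form, $b^T y - c^T x - \kappa$.

```python
import numpy as np

def residuals(A, b, c, x, y, s, tau, kappa):
    """Residuals of the homogeneous model at the point (y, x, tau, s, kappa)."""
    r_p = A @ x - b * tau              # primal residual
    r_d = c * tau - A.T @ y - s        # dual residual
    r_g = b @ y - c @ x - kappa        # gap residual (third block row)
    r_c = x @ s + tau * kappa          # complementarity
    return r_p, r_d, r_g, r_c

# Hypothetical toy data: minimize x1 + 2*x2 subject to x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

# An optimal primal-dual pair (tau = 1, kappa = 0) has zero residuals.
opt = residuals(A, b, c, x=np.array([1.0, 0.0]), y=np.array([1.0]),
                s=np.array([0.0, 1.0]), tau=1.0, kappa=0.0)

# A generic interior starting point does not.
start = residuals(A, b, c, x=np.ones(2), y=np.zeros(1),
                  s=np.ones(2), tau=1.0, kappa=1.0)
```

A solution of the homogeneous model with $\tau > 0$ and all residuals zero yields the optimal pair $(x/\tau, y/\tau, s/\tau)$.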

Scaling for nonsymmetric cones
Following Tunçel [5] we consider a scaling $W^T W \succ 0$,
\[
v = Wx = W^{-T} s, \qquad \tilde v = W \tilde x = W^{-T} \tilde s,
\]
where $\tilde x := -F_*'(s)$ and $\tilde s := -F'(x)$. The centrality conditions
\[
x = \mu \tilde x, \qquad s = \mu \tilde s
\]
can then be written symmetrically as $v = \mu \tilde v$, and we linearize the centrality condition $v = \mu \tilde v$ as
\[
W \Delta x + W^{-T} \Delta s = \mu \tilde v - v .
\]
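
An affine search-direction
\[
\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix}
\begin{bmatrix} \Delta y_a \\ \Delta x_a \\ \Delta \tau_a \end{bmatrix}
-
\begin{bmatrix} 0 \\ \Delta s_a \\ \Delta \kappa_a \end{bmatrix}
= - \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},
\]
\[
\Delta s_a + W^T W \Delta x_a = -s, \qquad \tau \Delta \kappa_a + \kappa \Delta \tau_a = -\kappa\tau,
\]
satisfying
\[
(\Delta x_a)^T \Delta s_a + \Delta \tau_a \Delta \kappa_a = 0 .
\]
Let $\alpha_a \in (0, 1]$ denote the largest feasible step in the affine direction. We estimate a centering parameter as
\[
\gamma := (1 - \alpha_a) \min\{ (1 - \alpha_a)^2, 1/4 \}.
\]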
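
The centering-parameter heuristic is a one-line formula. A minimal sketch follows, with the function name `centering_parameter` assumed for illustration: a full affine step ($\alpha_a = 1$) gives $\gamma = 0$ (no centering), while a short affine step gives $\gamma$ close to $1/4$.

```python
def centering_parameter(alpha_a):
    """gamma = (1 - alpha_a) * min((1 - alpha_a)^2, 1/4) for alpha_a in (0, 1]."""
    if not 0.0 < alpha_a <= 1.0:
        raise ValueError("alpha_a must lie in (0, 1]")
    return (1.0 - alpha_a) * min((1.0 - alpha_a) ** 2, 0.25)

gamma_full_step = centering_parameter(1.0)    # affine step fully feasible
gamma_half_step = centering_parameter(0.5)    # 0.5 * min(0.25, 0.25)
gamma_short_step = centering_parameter(0.1)   # 0.9 * 0.25, the cap is active
```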
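
A centering search-direction
Let $\mu = (x^T s + \tau\kappa)/(\nu + 1)$.
\[
\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix}
\begin{bmatrix} \Delta y_c \\ \Delta x_c \\ \Delta \tau_c \end{bmatrix}
-
\begin{bmatrix} 0 \\ \Delta s_c \\ \Delta \kappa_c \end{bmatrix}
= (\gamma - 1) \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},
\]
\[
W \Delta x_c + W^{-T} \Delta s_c = \gamma\mu \tilde v - v, \qquad \tau \Delta \kappa_c + \kappa \Delta \tau_c = \gamma\mu - \kappa\tau .
\]
Constant decrease of residuals and complementarity:
\[
\begin{aligned}
A x^+ - b \tau^+ &= (1 - \alpha(1 - \gamma)) \cdot r_p, \\
c \tau^+ - A^T y^+ - s^+ &= (1 - \alpha(1 - \gamma)) \cdot r_d, \\
b^T y^+ - c^T x^+ - \kappa^+ &= (1 - \alpha(1 - \gamma)) \cdot r_g, \\
(x^+)^T s^+ + \tau^+ \kappa^+ &= (1 - \alpha(1 - \gamma)) \cdot r_c,
\end{aligned}
\]
where $z^+ := (z + \alpha \Delta z_c)$.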
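
The decrease of the linear residuals follows directly from the first three block rows of the centering system. As a worked step for the primal residual (the dual and gap residuals follow the same pattern), the first block row reads $A \Delta x_c - b \Delta \tau_c = (\gamma - 1) r_p$, so:

```latex
\begin{aligned}
A x^+ - b \tau^+
  &= (Ax - b\tau) + \alpha \,(A \Delta x_c - b \Delta \tau_c) \\
  &= r_p + \alpha (\gamma - 1)\, r_p \\
  &= (1 - \alpha(1 - \gamma))\, r_p .
\end{aligned}
```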

A higher-order corrector term
Derivatives of $s_\mu = -\mu F'(x_\mu)$:
\[
\begin{aligned}
\dot s_\mu + \mu F''(x_\mu) \dot x_\mu &= -F'(x_\mu), \\
\ddot s_\mu + \mu F''(x_\mu) \ddot x_\mu &= -2 F''(x_\mu) \dot x_\mu - \mu F'''(x_\mu)[\dot x_\mu, \dot x_\mu].
\end{aligned}
\]
Using $F''(x) x = -F'(x)$ and $F'''(x)[x] = -2F''(x)$ we obtain
\[
\ddot s_\mu + \mu F''(x_\mu) \ddot x_\mu = F'''(x_\mu)[\dot x_\mu, (F''(x_\mu))^{-1} \dot s_\mu].
\]
We interpret $\dot s_\mu \approx -\mu \Delta s_a$ and $\dot x_\mu \approx -\mu \Delta x_a$, i.e.,
\[
\Delta s_{\mathrm{cor}} + W^T W \Delta x_{\mathrm{cor}} = \tfrac{1}{2} F'''(x)[\Delta x_a, (F''(x))^{-1} \Delta s_a],
\]
satisfying
\[
x^T \Delta s_{\mathrm{cor}} + s^T \Delta x_{\mathrm{cor}} = -(\Delta x_a)^T \Delta s_a .
\]
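
The last identity can be verified in a few lines. The sketch below assumes the scaling property $W^T W x = s$ (stated later for the BFGS scaling) and uses the symmetry of $F'''(x)$ together with $F'''(x)[x] = -2F''(x)$:

```latex
\begin{aligned}
x^T \Delta s_{\mathrm{cor}} + s^T \Delta x_{\mathrm{cor}}
  &= x^T \left( \Delta s_{\mathrm{cor}} + W^T W \Delta x_{\mathrm{cor}} \right)
     && \text{since } s = W^T W x \\
  &= \tfrac{1}{2} F'''(x)[x, \Delta x_a, (F''(x))^{-1} \Delta s_a] \\
  &= \tfrac{1}{2} \cdot (-2)\, (\Delta x_a)^T F''(x) (F''(x))^{-1} \Delta s_a \\
  &= -(\Delta x_a)^T \Delta s_a .
\end{aligned}
```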
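
Combined centering-corrector direction
A combined centering-corrector direction:
\[
\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix}
\begin{bmatrix} \Delta y \\ \Delta x \\ \Delta \tau \end{bmatrix}
-
\begin{bmatrix} 0 \\ \Delta s \\ \Delta \kappa \end{bmatrix}
= (\gamma - 1) \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},
\]
\[
W \Delta x + W^{-T} \Delta s = \gamma\mu \tilde v - v + \tfrac{1}{2} W^{-T} F'''(x)[\Delta x_a, (F''(x))^{-1} \Delta s_a],
\]
\[
\tau \Delta \kappa + \kappa \Delta \tau = \gamma\mu - \tau\kappa - \Delta \tau_a \Delta \kappa_a .
\]
All residuals and the complementarity decrease by the factor $(1 - \alpha(1 - \gamma))$.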

Computing the scaling matrix
Theorem (Schnabel [4]). Let $S, Y \in \mathbb{R}^{n \times p}$ have full rank $p$. Then there exists $H \succ 0$ such that $HS = Y$ if and only if $Y^T S \succ 0$.
Let
\[
S := \begin{pmatrix} x & \tilde x \end{pmatrix}, \qquad Y := \begin{pmatrix} s & \tilde s \end{pmatrix}
\]
both be full rank. As a consequence of the theorem (for $n = 3$),
\[
H = Y (Y^T S)^{-1} Y^T + z z^T
\]
where $S^T z = 0$, $z \neq 0$ and
\[
\det(Y^T S) = (x^T s) \cdot (\tilde x^T \tilde s) - \nu^2 > 0
\]
vanishing towards the central path.
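
The determinant formula can be worked out explicitly. With $\tilde x := -F_*'(s)$ and $\tilde s := -F'(x)$ as in the scaling slide, logarithmic homogeneity gives $\langle F'(x), x \rangle = -\nu$ and $\langle F_*'(s), s \rangle = -\nu$, so the off-diagonal entries of $Y^T S$ both equal $\nu$:

```latex
Y^T S =
\begin{bmatrix} s^T x & s^T \tilde x \\ \tilde s^T x & \tilde s^T \tilde x \end{bmatrix}
=
\begin{bmatrix} x^T s & \nu \\ \nu & \tilde x^T \tilde s \end{bmatrix},
\qquad
\det(Y^T S) = (x^T s)(\tilde x^T \tilde s) - \nu^2 .

% On the central path, x = \mu \tilde x and s = \mu \tilde s, hence
x^T s = \mu\, x^T \tilde s = \mu \nu, \qquad
\tilde x^T \tilde s = \mu^{-1} x^T \tilde s = \nu / \mu, \qquad
\det(Y^T S) = \mu\nu \cdot \frac{\nu}{\mu} - \nu^2 = 0 .
```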

Computing the scaling matrix
Expanding the BFGS update [4]
\[
\hat H = H_0 + Y (Y^T S)^{-1} Y^T - H_0 S (S^T H_0 S)^{-1} S^T H_0,
\]
for $H_0 \succ 0$ gives the scaling by Tunçel [5] and Myklebust [2], i.e.,
\[
z z^T = H_0 - H_0 S (S^T H_0 S)^{-1} S^T H_0 .
\]
We choose $H_0 := \mu F''(x)$. In other words, $W^T W = \hat H \approx \mu F''(x)$ and satisfies
\[
W^T W x = s, \qquad W^T W \tilde x = \tilde s .
\]
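
The BFGS scaling can be tried out numerically for the exponential cone. The sketch below is an illustration under stated assumptions, not MOSEK's implementation: the gradient and Hessian of $F$ are derived by hand, the function names are hypothetical, and, to avoid evaluating the conjugate barrier, the dual point is constructed as $s := -F'(\tilde x)$ for a chosen interior $\tilde x$, so that $\tilde x = -F_*'(s)$ holds by $F'(-F_*'(s)) = -s$.

```python
import numpy as np

NU = 3.0

def grad_psi(x):
    x1, x2, x3 = x
    return np.array([x2 / x1, np.log(x1 / x2) - 1.0, -1.0])

def grad(x):
    """Gradient of F(x) = -log(psi(x)) - log(x1) - log(x2)."""
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    return -grad_psi(x) / psi - np.array([1.0 / x1, 1.0 / x2, 0.0])

def hess(x):
    """Hessian of F, assembled from the gradient and Hessian of psi."""
    x1, x2, x3 = x
    psi = x2 * np.log(x1 / x2) - x3
    g = grad_psi(x)
    hess_psi = np.array([[-x2 / x1**2, 1.0 / x1, 0.0],
                         [1.0 / x1, -1.0 / x2, 0.0],
                         [0.0, 0.0, 0.0]])
    return (np.outer(g, g) / psi**2 - hess_psi / psi
            + np.diag([1.0 / x1**2, 1.0 / x2**2, 0.0]))

def bfgs_scaling(S, Y, H0):
    """H = H0 + Y (Y^T S)^{-1} Y^T - H0 S (S^T H0 S)^{-1} S^T H0."""
    H0S = H0 @ S
    return (H0 + Y @ np.linalg.solve(Y.T @ S, Y.T)
            - H0S @ np.linalg.solve(S.T @ H0S, H0S.T))

x = np.array([3.0, 1.0, 0.5])         # primal interior point
x_tilde = np.array([2.0, 1.0, 0.0])   # chosen so that x_tilde = -F_*'(s)
s = -grad(x_tilde)                    # dual interior point
s_tilde = -grad(x)

S = np.column_stack([x, x_tilde])
Y = np.column_stack([s, s_tilde])
mu = x @ s / NU
H = bfgs_scaling(S, Y, mu * hess(x))  # plays the role of W^T W
```

Any factor $W$ with $W^T W = H$ (for example a Cholesky factor) then gives $v = Wx = W^{-T}s$.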

Tunçel's scaling bounds
Let $\mu := (x^T s)/\nu$ and $\tilde\mu := (\tilde x^T \tilde s)/\nu$. Tunçel defines
\[
T_2(\xi, x, s) := \Big\{ H \succ 0 \;\Big|\; Hx = s,\; H\tilde x = \tilde s,\;
\frac{\mu}{\xi(\nu(\mu\tilde\mu - 1) + 1)} F''(x) \preceq H \preceq \frac{\xi(\nu(\mu\tilde\mu - 1) + 1)}{\mu} F''(\tilde x) \Big\}
\]
and shows polynomial convergence for a potential reduction method if
\[
\xi^\star := \inf\{ \xi : T_2(\xi, x, s) \neq \emptyset \} \leq O(1), \qquad \forall x \in \operatorname{int}(K),\; s \in \operatorname{int}(K^*).
\]
For symmetric cones $\xi^\star \leq 4/3$.

Bounds for the exponential cone
Given $s \in \operatorname{int}(K_{\exp}^*)$ and $\mu > 0$. Let $h := (0, 0, \nu\mu/s_3)$ and
\[
x_\alpha := h - \alpha(\mu F_*'(s) + h).
\]
1. $x_\alpha \in K_{\exp}$, $\alpha \in [0, \nu/2]$.
2. $\langle x_\alpha, s \rangle = \nu\mu$.
3. $\mu \langle F'(x_\alpha), F_*'(s) \rangle = \dfrac{\nu - 1}{\alpha} + \dfrac{1}{\nu - (\nu - 1)\alpha}$.
4. $\| x_\alpha \|^2_{-\mu F_*'(s)} = (\alpha^2 - 2\alpha)\nu(\nu - 1) + \nu^2$.
Conjecture (Øbro [3]): For the exponential cone $\xi^\star \approx 1.2532$, the value at $\nu = 3$ of a closed-form expression in $\nu$, attained for $x_{\alpha^\star}$ with $\alpha^\star = \nu(\nu(\nu - 1))^{-1/2}$.
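
Items 1 to 3 can be checked numerically for a sample point. The sketch below is illustrative only: the gradient of $F$ is derived by hand, and, to avoid evaluating the conjugate barrier, the dual point is constructed as $s := -F'(x)$ for a chosen interior $x$, which gives $F_*'(s) = -x$. The sample values of `x`, `mu` and `alpha` are assumptions of this sketch.

```python
import numpy as np

NU = 3.0

def psi(x):
    x1, x2, x3 = x
    return x2 * np.log(x1 / x2) - x3

def grad(x):
    """Gradient of F(x) = -log(psi(x)) - log(x1) - log(x2)."""
    x1, x2, x3 = x
    dpsi = np.array([x2 / x1, np.log(x1 / x2) - 1.0, -1.0])
    return -dpsi / psi(x) - np.array([1.0 / x1, 1.0 / x2, 0.0])

x = np.array([3.0, 1.0, 0.5])    # interior point of K_exp
s = -grad(x)                     # dual interior point; s[2] < 0
dual_grad = -x                   # F_*'(s) = -x for this construction
mu = 0.7
h = np.array([0.0, 0.0, NU * mu / s[2]])

def x_alpha(alpha):
    return h - alpha * (mu * dual_grad + h)

results = []
for alpha in (0.25, 1.0, 1.4):   # sample values inside (0, nu/2)
    xa = x_alpha(alpha)
    in_interior = xa[0] > 0 and xa[1] > 0 and psi(xa) > 0          # item 1
    gap2 = xa @ s - NU * mu                                        # item 2
    rhs3 = (NU - 1.0) / alpha + 1.0 / (NU - (NU - 1.0) * alpha)
    gap3 = mu * (grad(xa) @ dual_grad) - rhs3                      # item 3
    results.append((alpha, in_interior, gap2, gap3))
```

At $\alpha = 1$ the point $x_\alpha$ equals $-\mu F_*'(s)$, the point on the central path, and item 3 evaluates to $\nu$.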

Øbro's conjecture
[Figure: three-dimensional plot with axes $x_1$, $x_2$, $x_3$.]
Plot of $K_{\exp} \cap \{ x : x^T s = \nu\mu \}$, $D(-\mu F_*'(s), 1)$ and $x_{\alpha^\star}$ (red).

Implications for the exponential cone
• $F(x)$ does not have negative curvature, i.e., $F'''(x)[u] \not\preceq 0$, $\forall x \in \operatorname{int}(K_{\exp})$, $\forall u \in K_{\exp}$.
• But $F''$ is still bounded, for another reason.
• Tunçel's potential-reduction method for exponential cones has polynomial-time complexity.
• No equivalent proof yet for MOSEK's algorithm, even with optimal scalings.
• The BFGS scaling appears to be bounded as well, and often coincides with the optimal scaling, leaving more to be proved.

Comparing MOSEK and ECOS conic solvers
[Figure: iteration count (0 to 300) against problem index (0 to 150) for MOSEK, MOSEK n/c and ECOS.]
Iteration counts for different exponential cone problems, comparing MOSEK (with and without proposed corrector) and ECOS.

Comparing MOSEK and ECOS conic solvers
[Figure: solution time in seconds (log scale, $10^{-3}$ to $10^{2}$) against problem index (0 to 150) for MOSEK, MOSEK n/c and ECOS.]
Solution time for different exponential cone problems, comparing MOSEK (with and without proposed corrector) and ECOS.

Conclusions
• Exponential cone optimization included in MOSEK 9.
• Works very well in practice, especially with the proposed corrector.
• Solution time, accuracy and number of iterations are on par with the symmetric-cone implementation.
• No proof of polynomial-time complexity yet.
• More details can be found in [1].

References
[1] J. Dahl and E. D. Andersen. A primal-dual interior-point algorithm for nonsymmetric exponential-cone optimization. Technical report, MOSEK ApS, 2019.
[2] T. Myklebust and L. Tunçel. Interior-point algorithms for convex optimization based on primal-dual metrics. Technical report, University of Waterloo, 2014.
[3] M. Øbro. Conic optimization with exponential cones. Master's thesis, Technical University of Denmark, 2019.
[4] R. B. Schnabel. Quasi-Newton methods using multiple secant equations. Technical report, Colorado Univ., Boulder, Dept. Comp. Sci., 1983.
[5] L. Tunçel. Generalization of primal-dual interior-point methods to convex optimization problems in conic form. Foundations of Computational Mathematics, 1:229–254, 2001.