Semidefinite Optimization using MOSEK

Joachim Dahl

ISMP Berlin, August 23, 2012

http://www.mosek.com
Introduction

■ MOSEK is a state-of-the-art solver for large-scale linear and conic quadratic problems.
■ Based on the homogeneous model using Nesterov-Todd scaling.
■ The next version includes semidefinite optimization.
■ The goal for the first version is to beat SeDuMi.

Why SDP in MOSEK?

■ It is very powerful and flexible for modeling.
■ We want the SDP work you developed to be available to our customers!
■ Customers have been asking about it for a while.
Semidefinite optimization
Semidefinite optimization

■ Primal and dual problems:
\[
\begin{array}{llll}
\text{minimize} & \sum_{j=1}^p C_j \bullet X_j & \qquad \text{maximize} & b^T y \\
\text{subject to} & \sum_{j=1}^p A_{ij} \bullet X_j = b_i, & \qquad \text{subject to} & \sum_{i=1}^m y_i A_{ij} + S_j = C_j, \\
& X_j \in \mathcal{S}_+^{n_j}, & & S_j \in \mathcal{S}_+^{n_j},
\end{array}
\]
where $A_{ij}, C_j \in \mathcal{S}^{n_j}$.

■ Optimality conditions:
\[
\sum_{j=1}^p A_{ij} \bullet X_j = b_i, \quad \sum_{i=1}^m y_i A_{ij} + S_j = C_j, \quad X_j S_j = 0, \quad X_j, S_j \in \mathcal{S}_+^{n_j}.
\]

■ Equivalent complementarity conditions if $X_j, S_j \in \mathcal{S}_+^{n_j}$:
\[
X_j \bullet S_j = 0 \iff X_j S_j = 0 \iff X_j S_j + S_j X_j = 0.
\]
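The first equivalence rests on the fact that for positive semidefinite matrices $X \bullet S = \mathrm{tr}(XS) = \|X^{1/2}S^{1/2}\|_F^2$, so a zero inner product already forces the matrix product to vanish. A minimal numpy illustration (the construction of a complementary pair $X$, $S$ here is ours, not from the deck):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 2))
X = B @ B.T                                   # PSD with range(X) = range(B)
Q = np.linalg.qr(B, mode="complete")[0]
S = Q[:, 2:] @ Q[:, 2:].T                     # PSD, supported on null(X)
assert abs(np.trace(X @ S)) < 1e-12           # X • S = 0 ...
assert np.allclose(X @ S, np.zeros((5, 5)))   # ... indeed forces X S = 0
```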
Algorithms
Notation

For symmetric matrices $U \in \mathcal{S}^n$ we define
\[
\mathrm{svec}(U) = (U_{11}, \sqrt{2}U_{21}, \dots, \sqrt{2}U_{n1}, U_{22}, \sqrt{2}U_{32}, \dots, \sqrt{2}U_{n2}, \dots, U_{nn})^T
\]
with inverse operation
\[
\mathrm{smat}(u) = \begin{bmatrix}
u_1 & u_2/\sqrt{2} & \cdots & u_n/\sqrt{2} \\
u_2/\sqrt{2} & u_{n+1} & \cdots & u_{2n-1}/\sqrt{2} \\
\vdots & \vdots & \ddots & \vdots \\
u_n/\sqrt{2} & u_{2n-1}/\sqrt{2} & \cdots & u_{n(n+1)/2}
\end{bmatrix},
\]
and a symmetric product
\[
u \circ v := \tfrac{1}{2}\left(\mathrm{smat}(u)\,\mathrm{smat}(v) + \mathrm{smat}(v)\,\mathrm{smat}(u)\right).
\]
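A direct numpy transcription of these operators makes the indexing concrete; this is a sketch of the definitions above, not MOSEK's internal representation:

```python
import numpy as np

def svec(U):
    """Column-major stacking of the lower triangle of symmetric U,
    off-diagonal entries scaled by sqrt(2), so svec(U) @ svec(V) = U • V."""
    n = U.shape[0]
    i, j = np.triu_indices(n)          # the pairs (j, i) run down the columns
    return np.where(i == j, 1.0, np.sqrt(2.0)) * U[j, i]

def smat(u):
    """Inverse of svec: rebuild the symmetric matrix from its svec image."""
    n = int((np.sqrt(8 * u.size + 1) - 1) / 2)
    i, j = np.triu_indices(n)
    U = np.zeros((n, n))
    U[j, i] = np.where(i == j, 1.0, 1.0 / np.sqrt(2.0)) * u
    return U + U.T - np.diag(np.diag(U))

def sym_prod(u, v):
    """The product u ∘ v, identified with a vector via svec."""
    U, V = smat(u), smat(v)
    return svec((U @ V + V @ U) / 2.0)
```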
Notation

Let $a_{ij} := \mathrm{svec}(A_{ij})$, $c_j := \mathrm{svec}(C_j)$, $x_j := \mathrm{svec}(X_j)$, $s_j := \mathrm{svec}(S_j)$, and
\[
A := \begin{bmatrix} a_{11}^T & \cdots & a_{1p}^T \\ \vdots & & \vdots \\ a_{m1}^T & \cdots & a_{mp}^T \end{bmatrix}, \quad
c := \begin{bmatrix} c_1 \\ \vdots \\ c_p \end{bmatrix}, \quad
x := \begin{bmatrix} x_1 \\ \vdots \\ x_p \end{bmatrix}, \quad
s := \begin{bmatrix} s_1 \\ \vdots \\ s_p \end{bmatrix}.
\]
We then have primal and dual problems:
\[
\begin{array}{llll}
\text{minimize} & c^T x & \qquad \text{maximize} & b^T y \\
\text{subject to} & Ax = b & \qquad \text{subject to} & A^T y + s = c \\
& x \succeq 0, & & s \succeq 0.
\end{array}
\]
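Stacking the data works the same way in code; a small dense sketch reusing svec from above (the container `A_blocks[i][j]` holding $A_{ij}$ is our naming, and a real implementation would of course store $A$ sparsely):

```python
import numpy as np

def stack_data(A_blocks, C_blocks):
    """Build the m-row matrix A, one svec block per semidefinite
    variable, and the stacked cost vector c."""
    A = np.vstack([np.concatenate([svec(Aij) for Aij in row])
                   for row in A_blocks])
    c = np.concatenate([svec(Cj) for Cj in C_blocks])
    return A, c
```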
The homogeneous model

Simplified homogeneous model:
\[
\begin{bmatrix} 0 & A^T & -c \\ A & 0 & -b \\ c^T & -b^T & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ \tau \end{bmatrix}
+
\begin{bmatrix} s \\ 0 \\ \kappa \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\qquad x, s \succeq 0, \quad \tau, \kappa \geq 0.
\]
Note that $(x, y, s, \kappa, \tau) = 0$ is feasible, and $x^T s + \kappa\tau = 0$.

■ If $\tau > 0$, $\kappa = 0$ then $(x, y, s)/\tau$ is a primal-dual optimal solution:
\[
Ax = b\tau, \qquad A^T y + s = c\tau, \qquad x^T s = 0.
\]
■ If $\kappa > 0$, $\tau = 0$ then either the primal or the dual is infeasible:
\[
Ax = 0, \qquad A^T y + s = 0, \qquad c^T x - b^T y < 0.
\]
Primal infeasible if $b^T y > 0$; dual infeasible if $c^T x < 0$.
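In code, the two termination cases amount to a simple test on $\tau$ and $\kappa$; a minimal sketch (the function name and tolerances are illustrative, not MOSEK's actual stopping criteria):

```python
import numpy as np

def interpret(x, y, s, tau, kappa, b, c, tol=1e-8):
    """Read off a solution or an infeasibility certificate from a
    (near-)solution of the homogeneous model."""
    if tau > tol and kappa <= tol:
        return "optimal", (x / tau, y / tau, s / tau)
    if kappa > tol and tau <= tol:
        if b @ y > tol:
            return "primal infeasible", y      # Farkas-type certificate
        if c @ x < -tol:
            return "dual infeasible", x
    return "ill-posed or not converged", None
```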
The central path

We define the central path as:
\[
\begin{bmatrix} 0 & A^T & -c \\ A & 0 & -b \\ c^T & -b^T & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ \tau \end{bmatrix}
+
\begin{bmatrix} s \\ 0 \\ \kappa \end{bmatrix}
=
\gamma
\begin{bmatrix} A^T y^{(0)} + s^{(0)} - c\tau^{(0)} \\ A x^{(0)} - b\tau^{(0)} \\ c^T x^{(0)} - b^T y^{(0)} + \kappa^{(0)} \end{bmatrix},
\]
\[
x \circ s = \gamma\mu^{(0)} e, \qquad \tau\kappa = \gamma\mu^{(0)},
\]
where $\gamma \in [0,1]$,
\[
\mu^{(0)} = \frac{(x^{(0)})^T s^{(0)} + \kappa^{(0)}\tau^{(0)}}{n+1} > 0,
\qquad
e = \begin{bmatrix} \mathrm{svec}(I_{n_1}) \\ \vdots \\ \mathrm{svec}(I_{n_p}) \end{bmatrix}.
\]
Primal-dual scaling

A primal-dual scaling is defined by a non-singular $R_k$ as
\[
W_k(x_k) := \mathrm{svec}(R_k^{-T} X_k R_k^{-1}), \qquad W_k^{-T}(s_k) := \mathrm{svec}(R_k S_k R_k^T),
\]
which

■ preserves the semidefinite cone,
\[
x_k \succ 0, \ s_k \succ 0 \iff W_k(x_k) \succ 0, \ W_k^{-T}(s_k) \succ 0,
\]
■ the central path,
\[
x_k \circ s_k = \gamma\mu e_k \iff W_k(x_k) \circ W_k^{-T}(s_k) = \gamma\mu e_k,
\]
■ and the complementarity: $x_k^T s_k = W_k(x_k)^T W_k^{-T}(s_k)$.
Two popular primal-dual scalings

■ Helmberg-Kojima-Monteiro (HKM), $R_k^{-1} = S_k^{1/2}$:
\[
W_k(x_k) = \mathrm{svec}(S_k^{1/2} X_k S_k^{1/2}), \qquad W_k^{-T}(s_k) = e_k.
\]
■ Nesterov-Todd (NT):
\[
R_k^{-1} = \left[ S_k^{-1/2}\left(S_k^{1/2} X_k S_k^{1/2}\right)^{1/2} S_k^{-1/2} \right]^{-1/2},
\]
which satisfies $R_k^{-1} X_k R_k^{-1} = R_k S_k R_k$. Computed as
\[
LL^T = X_k, \qquad Q\Lambda Q^T = L^T S_k L, \qquad R_k^{-1} = L^{-T} Q \Lambda^{1/4},
\]
which satisfies $R_k^{-T} X_k R_k^{-1} = R_k S_k R_k^T = \Lambda^{1/2}$.
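The Cholesky-plus-eigenvalue recipe is straightforward to transcribe; a dense numpy/scipy sketch with a numerical sanity check (our code, not MOSEK's):

```python
import numpy as np
from scipy.linalg import cholesky, eigh

def nt_scaling(X, S):
    """NT scaling factor: R^{-1} = L^{-T} Q Lambda^{1/4}, where
    X = L L^T and L^T S L = Q Lambda Q^T."""
    L = cholesky(X, lower=True)
    lam, Q = eigh(L.T @ S @ L)
    Rinv = np.linalg.solve(L.T, Q) * lam**0.25   # L^{-T} Q diag(Lambda^{1/4})
    return Rinv, lam

# sanity check: R^{-T} X R^{-1} = R S R^T = Lambda^{1/2}
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n)); X = M @ M.T + n * np.eye(n)
M = rng.standard_normal((n, n)); S = M @ M.T + n * np.eye(n)
Rinv, lam = nt_scaling(X, S)
R = np.linalg.inv(Rinv)
assert np.allclose(Rinv.T @ X @ Rinv, np.diag(np.sqrt(lam)))
assert np.allclose(R @ S @ R.T, np.diag(np.sqrt(lam)))
```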
The search direction

Linearizing the scaled centrality condition
\[
W_k(x_k) \circ W_k^{-T}(s_k) = \gamma\mu e_k
\]
and discarding higher-order terms leads to
\[
\Delta x_k + \Pi_k \Delta s_k = \gamma\mu s_k^{-1} - x_k
\]
(with $s_k^{-1} := \mathrm{svec}(S_k^{-1})$), where

■ HKM scaling: $\Pi_k z_k = \mathrm{svec}\left((X_k Z_k S_k^{-1} + S_k^{-1} Z_k X_k)/2\right)$,
■ NT scaling: $\Pi_k z_k = \mathrm{svec}(R_k^T R_k Z_k R_k^T R_k)$,

with $Z_k = \mathrm{smat}(z_k)$.
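Both scaling operators are one-liners given the svec/smat helpers from the notation sketch; again a dense sketch with our naming:

```python
import numpy as np
# assumes svec and smat from the notation sketch above

def pi_hkm(X, Sinv, z):
    """HKM: Pi z = svec((X Z S^{-1} + S^{-1} Z X) / 2), Z = smat(z)."""
    Z = smat(z)
    return svec((X @ Z @ Sinv + Sinv @ Z @ X) / 2.0)

def pi_nt(R, z):
    """NT: Pi z = svec(W Z W) with the scaling point W = R^T R."""
    W = R.T @ R
    return svec(W @ smat(z) @ W)
```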
The search direction

The most expensive part of the interior-point optimizer is solving:
\[
\begin{aligned}
A\Delta x - b\Delta\tau &= (\gamma-1)(Ax - b\tau) &&= r_1 \\
A^T\Delta y + \Delta s - c\Delta\tau &= (\gamma-1)(A^T y + s - c\tau) &&= r_2 \\
c^T\Delta x - b^T\Delta y + \Delta\kappa &= (\gamma-1)(c^T x - b^T y + \kappa) &&= r_3 \\
\Delta x + \Pi\Delta s &= \gamma\mu s^{-1} - x &&= r_4 \\
\Delta\kappa + (\kappa/\tau)\Delta\tau &= \gamma\mu/\tau - \kappa &&= r_5
\end{aligned}
\]
This gives a constant decrease in residuals and complementarity:
\[
A(x + \alpha\Delta x) - b(\tau + \alpha\Delta\tau) = (1 - \alpha(1-\gamma))(Ax - b\tau),
\]
\[
(x + \alpha\Delta x)^T(s + \alpha\Delta s) + (\kappa + \alpha\Delta\kappa)(\tau + \alpha\Delta\tau) = \underbrace{(1 - \alpha(1-\gamma))}_{<1}(x^T s + \kappa\tau).
\]
Polynomial complexity follows for properly chosen $\gamma$.
Solving for the search direction

After block elimination, we get a 2×2 system:
\[
\begin{aligned}
A\Pi A^T \Delta y - (A\Pi c + b)\Delta\tau &= r_y \\
(A\Pi c - b)^T \Delta y + (c^T\Pi c + \kappa/\tau)\Delta\tau &= r_\tau.
\end{aligned}
\]
We factor $A\Pi A^T = LL^T$ and eliminate
\[
\Delta y = L^{-T}L^{-1}\left((A\Pi c + b)\Delta\tau + r_y\right).
\]
Then
\[
\left[ (A\Pi c - b)^T L^{-T}L^{-1}(A\Pi c + b) + (c^T\Pi c + \kappa/\tau) \right]\Delta\tau = r_\tau - (A\Pi c - b)^T L^{-T}L^{-1} r_y.
\]
This is one extra (insignificant) solve compared to an infeasible method.
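Given the Cholesky factor of $A\Pi A^T$, the back-substitution is only a few lines; a dense sketch with illustrative names (`APc` stands for $A\Pi c$, `cPc` for $c^T\Pi c$):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_reduced(APAt, APc, b, cPc, kappa, tau, r_y, r_tau):
    """Back-substitution for the reduced 2x2 system above, given the
    dense Schur complement APAt = A Pi A^T."""
    F = cho_factor(APAt)                 # A Pi A^T = L L^T
    u = cho_solve(F, APc + b)            # L^{-T} L^{-1} (A Pi c + b)
    v = cho_solve(F, r_y)                # L^{-T} L^{-1} r_y
    dtau = (r_tau - (APc - b) @ v) / ((APc - b) @ u + cPc + kappa / tau)
    dy = u * dtau + v
    return dy, dtau
```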
Forming the Schur complement

The Schur complement is computed as a sum of outer products:
\[
A\Pi A^T =
\begin{bmatrix} a_{11}^T & \cdots & a_{1p}^T \\ \vdots & & \vdots \\ a_{m1}^T & \cdots & a_{mp}^T \end{bmatrix}
\begin{bmatrix} \Pi_1 a_{11} & \cdots & \Pi_1 a_{m1} \\ \vdots & & \vdots \\ \Pi_p a_{1p} & \cdots & \Pi_p a_{mp} \end{bmatrix}
= \sum_{k=1}^p P_k^T \left( P_k A_{:,k} \Pi_k A_{:,k}^T P_k^T \right) P_k,
\]
where $P_k$ permutes the rows of $A_{:,k}$. The complexity depends on the choice of $P_k$. We compute only $\mathrm{tril}(A\Pi A^T)$, so by ordering
\[
\mathrm{nnz}\left((P_k A_{:,k})_1\right) \geq \mathrm{nnz}\left((P_k A_{:,k})_2\right) \geq \cdots \geq \mathrm{nnz}\left((P_k A_{:,k})_m\right),
\]
we get a good reduction in complexity. This approach is more general than SDPA's.
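A dense sketch of the idea for one block $k$: sort the rows by decreasing nnz, then form only the lower triangle, so that each (expensive) $\Pi_k$-product is reused for all sparser rows below it. `Pi_k` applies $\Pi_k$ to one svec'd row; the sorting strategy here is our reading of the slide, not MOSEK's actual kernel:

```python
import numpy as np

def schur_tril_block(Ak, Pi_k):
    """Contribution of block k to tril(A Pi A^T), rows permuted so the
    densest come first."""
    m = Ak.shape[0]
    perm = np.argsort(-(Ak != 0).sum(axis=1))   # decreasing nnz
    B = Ak[perm]
    T = np.zeros((m, m))
    for j in range(m):
        w = Pi_k(B[j])                          # Pi_k applied once per row
        T[j:, j] = B[j:] @ w                    # lower triangle only
    return T, perm   # caller accumulates P^T (tril part) P over blocks
```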
Forming the Schur complement

To evaluate each term
\[
a_{ik}^T \Pi_k a_{jk} = (R_k A_{ik} R_k^T) \bullet (R_k A_{jk} R_k^T) = A_{ik} \bullet (R_k^T R_k A_{jk} R_k^T R_k),
\]
we follow SeDuMi and rewrite
\[
R_k^T R_k A_{jk} R_k^T R_k = \hat{M}M^T + M\hat{M}^T,
\]
i.e., as a symmetric (low-rank) product. Then

■ $A_{ik}$ sparse: evaluate $\hat{M}M^T + M\hat{M}^T$ element by element.
■ $A_{ik}$ dense: evaluate $\hat{M}M^T + M\hat{M}^T$ using level 3 BLAS.

We may exploit low-rank structure later.
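For the sparse case, the element-by-element evaluation only touches the nonzeros of $A_{ik}$, since $(\hat{M}M^T)_{rc}$ is just the dot product of row $r$ of $\hat{M}$ with row $c$ of $M$. A sketch in COO-style indexing (hypothetical names):

```python
import numpy as np

def inner_sparse(rows, cols, vals, Mhat, M):
    """A_ik • (Mhat M^T + M Mhat^T), summing only over the nonzeros
    (rows[t], cols[t]) of A_ik with values vals[t]."""
    total = 0.0
    for r, c, a in zip(rows, cols, vals):
        total += a * (Mhat[r] @ M[c] + M[r] @ Mhat[c])
    return total
```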
Practical details

Additionally, the MOSEK solver

■ uses Mehrotra's predictor-corrector step,
■ solves mixed linear, conic quadratic, and semidefinite problems,
■ employs a presolve step for the linear variables, which can reduce problem size and complexity significantly,
■ scales constraints to improve the conditioning of the problem.
Results and examples
SDPLIB benchmark

■ We use the subset of SDPLIB used by H. D. Mittelmann for his SDP solver comparison.
■ For brevity we report only iteration count, computation time, and relative gap at termination.
■ Our SDPA-format converter detects block-diagonal structure within the semidefinite blocks (SeDuMi and SDPA do not), so for a few problems our solution time is much smaller.
■ All solvers were run on a single thread on an Intel Xeon E3-1270 (4 cores at 3.4 GHz, 32 GB memory) under Linux.