A practical primal-dual interior-point algorithm for nonsymmetric conic optimization

September 8, 2020

Erling D. Andersen (joint work with Joachim Dahl)
MOSEK ApS
Email: e.d.andersen@mosek.com
Personal WWW: https://erling.andersens.name
Company WWW: https://mosek.com
Outline

• Conic optimization
  • The problem
  • The two nonsymmetric cones
• A primal-dual interior-point algorithm
  • Survey of algorithms
  • Preliminaries
  • Motivation
  • The algorithm
• Computational results
• Summary
Section 1 Conic optimization
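Generic conic optimization problem

Primal form:

\[
\begin{array}{ll}
\text{minimize}   & \sum_k (c^k)^T x^k \\
\text{subject to} & \sum_k A^k x^k = b, \\
                  & x^k \in K^k, \quad \forall k,
\end{array}
\]

where
• $c^k \in \mathbb{R}^{n^k}$,
• $A^k \in \mathbb{R}^{m \times n^k}$,
• $b \in \mathbb{R}^m$,
• $K^k$ are convex cones.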
The 5 cones

3 symmetric cones:
• Linear.
• Quadratic.
• Semidefinite.

2 nonsymmetric cones:
• Exponential.
• Power.

Observation:
• Almost all convex optimization problems appearing in practice can be formulated using those 5 cones.
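The power cone

The power cone:

\[
K_{\mathrm{pow}}(\alpha) := \Big\{ (x, z) : \prod_{j=1}^{n} x_j^{|\alpha_j|} \ge \|z\|^{\sum_{j=1}^{n} |\alpha_j|}, \; x \ge 0 \Big\}.
\]

Examples ($\alpha \in (0,1)$):

\[
\begin{aligned}
(t, 1, x) \in K_{\mathrm{pow}}(\alpha, 1-\alpha) &\Leftrightarrow t \ge |x|^{1/\alpha}, \; t \ge 0, \\
(x, 1, t) \in K_{\mathrm{pow}}(\alpha, 1-\alpha) &\Leftrightarrow x^{\alpha} \ge |t|, \; x \ge 0, \\
(x, t) \in K_{\mathrm{pow}}(e) &\Leftrightarrow \Big( \prod_{j=1}^{n} x_j \Big)^{1/n} \ge |t|, \; x \ge 0.
\end{aligned}
\]

See also Chares [2] and the Mosek modelling cookbook [5].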
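The definition of the power cone can be turned into a direct membership test. The following is a minimal sketch, assuming NumPy; the function name `in_power_cone` and the tolerance `tol` are illustrative and not part of the slides.

```python
import numpy as np

def in_power_cone(x, z, alpha, tol=1e-12):
    """Test (x, z) in K_pow(alpha), i.e.
    prod_j x_j^|alpha_j| >= ||z||^(sum_j |alpha_j|) and x >= 0.
    `tol` is a hypothetical slack for points on the boundary."""
    x = np.asarray(x, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    a = np.abs(np.asarray(alpha, dtype=float))
    if np.any(x < 0):
        return False
    lhs = np.prod(x ** a)                 # prod_j x_j^|alpha_j|
    rhs = np.linalg.norm(z) ** a.sum()    # ||z||^(sum_j |alpha_j|)
    return bool(lhs >= rhs - tol)

# (t, 1, x) in K_pow(1/2, 1/2) models t >= x^2
print(in_power_cone([4.0, 1.0], 2.0, [0.5, 0.5]))
# (x, t) in K_pow(e) models the geometric mean bound
print(in_power_cone([1.0, 2.0, 4.0], 2.0, [1.0, 1.0, 1.0]))
```

A solver would never test membership this way; the sketch only makes the three example equivalences easy to check by hand.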
The exponential cone

The exponential cone:

\[
K_{\exp} := \{ (x_1, x_2, x_3) : x_1 \ge x_2 e^{x_3 / x_2}, \; x_2 > 0 \} \cup \{ (x_1, x_2, x_3) : x_1 \ge 0, \; x_2 = 0, \; x_3 \le 0 \}.
\]

Applications:

\[
\begin{aligned}
(t, 1, x) \in K_{\exp} &\Leftrightarrow t \ge e^{x}, \\
(t, 1, \ln(a)\, x) \in K_{\exp} &\Leftrightarrow t \ge a^{x}, \\
(x, 1, t) \in K_{\exp} &\Leftrightarrow t \le \ln(x), \\
(1, x, t) \in K_{\exp} &\Leftrightarrow t \le -x \ln(x), \\
(y, x, -t) \in K_{\exp} &\Leftrightarrow t \ge x \ln(x/y) \quad \text{(relative entropy)}.
\end{aligned}
\]

Geometric programming + many more [2, 5].
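The two pieces of the exponential cone (the part with a positive second coordinate and its closure at zero) map directly onto a small membership test. This is an illustrative sketch only; the function name `in_exp_cone` and the tolerance `tol` are assumptions, and `math.exp` can overflow when the ratio of the third to the second coordinate is very large.

```python
import math

def in_exp_cone(x1, x2, x3, tol=1e-12):
    """Test (x1, x2, x3) in K_exp.
    `tol` is a hypothetical slack for points on the boundary."""
    if x2 > 0:
        # main part: x1 >= x2 * exp(x3 / x2)
        return x1 >= x2 * math.exp(x3 / x2) - tol
    if x2 == 0:
        # closure part: x1 >= 0, x3 <= 0
        return x1 >= 0 and x3 <= 0
    return False

# (t, 1, x) in K_exp models t >= exp(x)
print(in_exp_cone(3.0, 1.0, 1.0))
# (y, x, -t) in K_exp models t >= x * ln(x / y)
print(in_exp_cone(1.0, 2.0, -1.5))
```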
Section 2 A primal-dual interior-point algorithm
Survey

• Lesson learned from the linear case: Solve the primal and dual problem simultaneously.
• Symmetric cones: Employ the Nesterov-Todd (NT) algorithm [10, 12].
• Nonsymmetric cones: How to generalize the NT algorithm?
  • Nesterov [8, 9], Skajaa and Ye [15], Serrano [14]: Computational results available.
  • Tunçel [16], Myklebust [6], Tunçel and Myklebust [7]: No computational results.

Present work:
• Follows Myklebust and Tunçel.
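Primal and dual problem

Primal problem:

\[
\begin{array}{ll}
\text{minimize}   & c^T x \\
\text{subject to} & Ax = b, \\
                  & x \in K,
\end{array}
\]

and the dual:

\[
\begin{array}{ll}
\text{maximize}   & b^T y \\
\text{subject to} & A^T y + s = c, \\
                  & s \in K^*,
\end{array}
\]

where $K = K^1 \times K^2 \times \cdots \times K^k$ and $K^*$ is the corresponding dual cone. The dual cone is known for all 5 cone types.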
Barrier functions

Let $F : \mathrm{int}(K) \to \mathbb{R}$ be a three times differentiable function. Then $F$ is a $\nu$-logarithmically homogeneous self-concordant barrier ($\nu$-LHSCB) for $\mathrm{int}(K)$ if

\[
|F'''(x)[u, u, u]| \le 2 \left( F''(x)[u, u] \right)^{3/2}
\]

and

\[
F(\tau x) = F(x) - \nu \log \tau.
\]

See [10, 12].
The dual barrier

If $F$ is a $\nu$-self-concordant barrier for $K$, then the Fenchel conjugate

\[
F_*(s) = \sup_{x \in \mathrm{int}(K)} \{ -\langle s, x \rangle - F(x) \} \tag{1}
\]

is a $\nu$-self-concordant barrier for $K^*$. Let

\[
\tilde{x} := -F_*'(s), \quad \tilde{s} := -F'(x), \quad \mu := \frac{\langle x, s \rangle}{\nu}, \quad \tilde{\mu} := \frac{\langle \tilde{x}, \tilde{s} \rangle}{\nu}.
\]

Then $\tilde{x} \in \mathrm{int}(K)$, $\tilde{s} \in \mathrm{int}(K^*)$ and

\[
\mu \tilde{\mu} \ge 1 \tag{2}
\]

with equality iff $x = \mu \tilde{x}$ (and $s = \mu \tilde{s}$).
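The inequality (2) is easy to check numerically in the simplest case. The sketch below assumes the nonnegative orthant with the standard logarithmic barrier, for which the barrier parameter equals the dimension, the conjugate barrier is again a sum of negative logarithms plus a constant, and both shadow iterates are componentwise reciprocals. This choice of cone and the function name `mu_pair` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def mu_pair(x, s):
    """Return (mu, mu_tilde) for K = R^n_+ with F(x) = -sum(log x_j).
    Here nu = n, F_*(s) = -sum(log s_j) - n, so that
    x_tilde = -F_*'(s) = 1/s and s_tilde = -F'(x) = 1/x."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    nu = x.size
    x_tilde = 1.0 / s
    s_tilde = 1.0 / x
    mu = (x @ s) / nu
    mu_tilde = (x_tilde @ s_tilde) / nu
    return mu, mu_tilde

# off the central path: the product is strictly larger than 1
mu, mu_t = mu_pair([1.0, 2.0, 3.0], [2.0, 1.0, 0.5])
print(mu * mu_t)

# on the central path (x_j * s_j constant): the product equals 1
mu, mu_t = mu_pair([1.0, 2.0, 4.0], [2.0, 1.0, 0.5])
print(mu * mu_t)
```

For this cone the inequality reduces to the arithmetic-harmonic mean inequality applied to the products of matching components.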
The homogeneous model

Generalized Goldman-Tucker homogeneous model:

\[
(H) \qquad
\begin{aligned}
Ax - b\tau &= 0, \\
A^T y + s - c\tau &= 0, \\
-c^T x + b^T y - \kappa &= 0, \\
(x; \tau) \in \bar{K}, \quad (s; \kappa) &\in \bar{K}^*,
\end{aligned}
\]

where $\bar{K} := K \times \mathbb{R}_+$ and $\bar{K}^* := K^* \times \mathbb{R}_+$.

• $\bar{K}$ is a Cartesian product of $k + 1$ convex cones.
• The homogeneous model always has a solution.
• Partial list of references:
  • Linear case: [4], [3], [17].
  • Nonlinear case: [11].
Investigating the homogeneous model

Lemma. Let $(x^*, \tau^*, y^*, s^*, \kappa^*)$ be any feasible solution to (H). Then

i) $(x^*)^T s^* + \tau^* \kappa^* = 0$.

ii) If $\tau^* > 0$, then $(x^*, y^*, s^*) / \tau^*$ is an optimal solution.

iii) If $\kappa^* > 0$, then at least one of the strict inequalities

\[
b^T y^* > 0 \tag{3}
\]

and

\[
c^T x^* < 0 \tag{4}
\]

holds. If the first inequality holds, then (P) is infeasible. If the second inequality holds, then (D) is infeasible.
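Part i) of the lemma follows from the three linear equations of (H) alone. A short derivation, written here as a sketch and not taken from the slides:

```latex
\begin{aligned}
(x^*)^T s^* &= (x^*)^T \left( c\tau^* - A^T y^* \right)
             && \text{(second equation of (H))} \\
            &= \tau^* c^T x^* - (y^*)^T A x^* \\
            &= \tau^* c^T x^* - \tau^* b^T y^*
             && \text{(first equation of (H))} \\
            &= -\tau^* \left( -c^T x^* + b^T y^* \right)
             = -\tau^* \kappa^*
             && \text{(third equation of (H))}.
\end{aligned}
```

Because the primal and dual variables lie in a cone and its dual, both terms of the sum in i) are nonnegative, so each of them must be zero separately.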
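The central path

The central path:

\[
\begin{aligned}
Ax - b\tau &= \gamma (A\hat{x} - b\hat{\tau}), \\
A^T y + s - c\tau &= \gamma (A^T \hat{y} + \hat{s} - c\hat{\tau}), \\
-c^T x + b^T y - \kappa &= \gamma (-c^T \hat{x} + b^T \hat{y} - \hat{\kappa}), \\
s + \gamma \hat{\mu} F'(x) &= 0, \\
\tau\kappa - \gamma \hat{\mu} &= 0,
\end{aligned}
\]

where

\[
\hat{\mu} := \frac{\hat{x}^T \hat{s} + \hat{\tau}\hat{\kappa}}{\nu + 1}
\]

and $(\hat{x}, \hat{\tau}, \hat{y}, \hat{s}, \hat{\kappa})$ is an "interior" solution for $\gamma = 1$. The central path is the set of solutions parameterised by $\gamma \in [0, 1]$.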
Tracing the central path

• Idea: Trace the central path using Newton's method.
• Question: Should we use the primal or the dual barrier, i.e.

\[
s + \gamma \hat{\mu} F'(x) = s - \gamma \hat{\mu} \tilde{s} = 0
\]

or

\[
x + \gamma \hat{\mu} F_*'(s) = x - \gamma \hat{\mu} \tilde{x} = 0,
\]

where $\tilde{x} := -F_*'(s)$ and $\tilde{s} := -F'(x)$.
Primal-dual scaling

A nonsingular matrix $W$ is called a primal-dual scaling if it satisfies

\[
v := Wx = W^{-T} s, \qquad \tilde{v} := W\tilde{x} = W^{-T} \tilde{s}.
\]

The primal or dual centrality conditions are equivalent to

\[
v = \gamma \hat{\mu} \tilde{v}.
\]

• Result: The centrality conditions have become symmetric!
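For the linear cone a primal-dual scaling is available in closed form, which makes the two defining equations easy to verify. The sketch below assumes the nonnegative orthant with the logarithmic barrier, so that both shadow iterates are componentwise reciprocals, and uses the familiar diagonal scaling from linear programming. The function name `linear_cone_scaling` is illustrative.

```python
import numpy as np

def linear_cone_scaling(x, s):
    """Diagonal scaling W = diag(sqrt(s / x)) for K = R^n_+.
    Assumes x > 0 and s > 0 componentwise."""
    return np.diag(np.sqrt(s / x))

x = np.array([1.0, 2.0, 3.0])
s = np.array([2.0, 1.0, 0.5])
x_tilde = 1.0 / s          # -F_*'(s) for the logarithmic barrier
s_tilde = 1.0 / x          # -F'(x)

W = linear_cone_scaling(x, s)
W_invT = np.linalg.inv(W).T

v = W @ x                  # should equal W^{-T} s
v_tilde = W @ x_tilde      # should equal W^{-T} s_tilde

print(np.allclose(v, W_invT @ s))
print(np.allclose(v_tilde, W_invT @ s_tilde))
```

For the nonsymmetric cones no such closed form is given in the slides, which is why a separate construction of the scaling matrix is needed.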
The search direction

Affine direction:

\[
\begin{aligned}
A d_x^a - b d_\tau^a &= -(Ax^0 - b\tau^0), \\
A^T d_y^a + d_s^a - c d_\tau^a &= -(A^T y^0 + s^0 - c\tau^0), \\
-c^T d_x^a + b^T d_y^a - d_\kappa^a &= -(-c^T x^0 + b^T y^0 - \kappa^0), \\
W d_x^a + W^{-T} d_s^a &= -v^0, \\
\tau^0 d_\kappa^a + \kappa^0 d_\tau^a &= -\tau^0 \kappa^0.
\end{aligned}
\]

Centering direction:

\[
\begin{aligned}
A d_x^c - b d_\tau^c &= Ax^0 - b\tau^0, \\
A^T d_y^c + d_s^c - c d_\tau^c &= A^T y^0 + s^0 - c\tau^0, \\
-c^T d_x^c + b^T d_y^c - d_\kappa^c &= -c^T x^0 + b^T y^0 - \kappa^0, \\
W d_x^c + W^{-T} d_s^c &= \mu^0 \tilde{v}^0, \\
\tau^0 d_\kappa^c + \kappa^0 d_\tau^c &= \mu^0.
\end{aligned}
\]
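Updating the solution

For a given $\gamma \in [0, 1]$ define

\[
\begin{aligned}
d_x &:= d_x^a + \gamma d_x^c, \\
d_\tau &:= d_\tau^a + \gamma d_\tau^c, \\
d_y &:= d_y^a + \gamma d_y^c, \\
d_s &:= d_s^a + \gamma d_s^c, \\
d_\kappa &:= d_\kappa^a + \gamma d_\kappa^c,
\end{aligned}
\]

and hence for a step size $\alpha \in [0, 1]$ we have

\[
\begin{aligned}
x^+ &:= x^0 + \alpha d_x, \\
\tau^+ &:= \tau^0 + \alpha d_\tau, \\
y^+ &:= y^0 + \alpha d_y, \\
s^+ &:= s^0 + \alpha d_s, \\
\kappa^+ &:= \kappa^0 + \alpha d_\kappa.
\end{aligned}
\]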
Basic but important properties

\[
\begin{aligned}
Ax^+ - b\tau^+ &= (1 - \alpha(1 - \gamma))(Ax^0 - b\tau^0), \\
A^T y^+ + s^+ - c\tau^+ &= (1 - \alpha(1 - \gamma))(A^T y^0 + s^0 - c\tau^0), \\
-c^T x^+ + b^T y^+ - \kappa^+ &= (1 - \alpha(1 - \gamma))(-c^T x^0 + b^T y^0 - \kappa^0), \\
(x^+)^T s^+ + \tau^+ \kappa^+ &= (1 - \alpha(1 - \gamma))((x^0)^T s^0 + \tau^0 \kappa^0).
\end{aligned}
\]

• Equal decrease in infeasibility and complementarity for $\gamma \in [0, 1)$.
• If $\alpha \in\, ]0, 1]$, then "convergence".
• No merit function is needed. Yahooooo!
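The first of these identities follows directly from the linearity of the residual together with the first equations of the affine and centering systems. A sketch of the argument, where the shorthand $r^0 := Ax^0 - b\tau^0$ is introduced here only for brevity:

```latex
\begin{aligned}
Ax^+ - b\tau^+
  &= r^0 + \alpha \left( A d_x - b d_\tau \right) \\
  &= r^0 + \alpha \left[ \left( A d_x^a - b d_\tau^a \right)
        + \gamma \left( A d_x^c - b d_\tau^c \right) \right] \\
  &= r^0 + \alpha \left( -r^0 + \gamma r^0 \right) \\
  &= \left( 1 - \alpha (1 - \gamma) \right) r^0 .
\end{aligned}
```

The second and third identities follow in the same way from the corresponding pairs of equations.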
Choice of the primal-dual scaling

Our method is inspired by Tunçel [16] and by Tunçel and Myklebust [7]:

\[
W^T W \approx \mu^0 F''(x^0), \qquad Wx = W^{-T} s, \qquad W\tilde{x} = W^{-T} \tilde{s}.
\]

Employ the quasi-Newton idea to compute $W$.
Computing the scaling matrix

Theorem (Schnabel [13]). Let $\bar{X}, \bar{S} \in \mathbb{R}^{n \times p}$ have full rank $p$. Then there exists $H \succ 0$ such that $H\bar{X} = \bar{S}$ if and only if $\bar{S}^T \bar{X} \succ 0$. As a consequence

\[
H = \bar{S} (\bar{S}^T \bar{X})^{-1} \bar{S}^T + Z Z^T,
\]

where $\bar{X}^T Z = 0$ and $\mathrm{rank}(Z) = n - p$.

We have $n = 3$, $p = 2$ and

\[
\bar{X} := \begin{pmatrix} x & \tilde{x} \end{pmatrix}, \qquad \bar{S} := \begin{pmatrix} s & \tilde{s} \end{pmatrix},
\]

with

\[
\det(\bar{S}^T \bar{X}) = \nu^2 (\mu \tilde{\mu} - 1) \ge 0
\]

vanishing only on the central path.
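The construction in the theorem can be tried out numerically. The sketch below makes several assumptions that are not in the slides: it uses the three-dimensional nonnegative orthant with the logarithmic barrier as a stand-in cone (so the shadow iterates are componentwise reciprocals), it picks the null-space vector as a cross product, which is an arbitrary choice and not the BFGS-based choice of the method, and it obtains a scaling matrix from a Cholesky factor. The function name `schnabel_scaling` is hypothetical.

```python
import numpy as np

def schnabel_scaling(x, s, x_tilde, s_tilde):
    """Build H = S (S^T X)^{-1} S^T + z z^T with X^T z = 0 for n = 3, p = 2."""
    X = np.column_stack([x, x_tilde])
    S = np.column_stack([s, s_tilde])
    M = S.T @ X                      # must be symmetric positive definite
    z = np.cross(x, x_tilde)         # orthogonal to both columns of X
    H = S @ np.linalg.solve(M, S.T) + np.outer(z, z)
    H = 0.5 * (H + H.T)              # remove rounding asymmetry
    return H, X, S, M

x = np.array([1.0, 2.0, 3.0])
s = np.array([2.0, 1.0, 0.5])
x_tilde, s_tilde = 1.0 / s, 1.0 / x  # stand-in barrier: F(x) = -sum(log x_j)
nu = 3.0

H, X, S, M = schnabel_scaling(x, s, x_tilde, s_tilde)
W = np.linalg.cholesky(H).T          # W^T W = H

mu = (x @ s) / nu
mu_tilde = (x_tilde @ s_tilde) / nu

print(np.allclose(H @ X, S))                                   # secant equations
print(np.isclose(np.linalg.det(M), nu**2 * (mu * mu_tilde - 1)))
print(np.allclose(W @ x, np.linalg.solve(W.T, s)))             # W x = W^{-T} s
```

Any nonzero multiple of the null-space vector yields a valid scaling, so the secant equations alone do not determine the scaling matrix uniquely.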
Computing the scaling matrix

Any scaling with $n = 3$ satisfies

\[
W^T W = \bar{S} (\bar{S}^T \bar{X})^{-1} \bar{S}^T + z z^T,
\]

where $\begin{pmatrix} x & \tilde{x} \end{pmatrix}^T z = 0$, $z \ne 0$. Expanding the BFGS update [13]

\[
H_+ = H + \bar{S} (\bar{S}^T \bar{X})^{-1} \bar{S}^T - H\bar{X} (\bar{X}^T H \bar{X})^{-1} \bar{X}^T H,
\]

for $H \succ 0$ gives the scaling by Tunçel [16] and Myklebust [7], i.e.,

\[
z z^T = H - H\bar{X} (\bar{X}^T H \bar{X})^{-1} \bar{X}^T H,
\]

with $H = \mu F''(x)$.
A high-order corrector term

A high-order correction:

\[
\begin{aligned}
A d_x^{co} - b d_\tau^{co} &= 0, \\
A^T d_y^{co} + d_s^{co} - c d_\tau^{co} &= 0, \\
-c^T d_x^{co} + b^T d_y^{co} - d_\kappa^{co} &= 0, \\
W d_x^{co} + W^{-T} d_s^{co} &= -\tfrac{1}{2} W^{-T} F'''(x)[d_x^a, F''(x)^{-1} d_s^a], \\
\tau^0 d_\kappa^{co} + \kappa^0 d_\tau^{co} &= -d_\tau^a d_\kappa^a.
\end{aligned}
\]

For motivation see the paper. Finally

\[
\begin{aligned}
d_x &:= d_x^a + \gamma d_x^c + d_x^{co}, \\
d_\tau &:= d_\tau^a + \gamma d_\tau^c + d_\tau^{co}, \\
d_y &:= d_y^a + \gamma d_y^c + d_y^{co}, \\
d_s &:= d_s^a + \gamma d_s^c + d_s^{co}, \\
d_\kappa &:= d_\kappa^a + \gamma d_\kappa^c + d_\kappa^{co}.
\end{aligned}
\]
The power cone case

A 3-self-concordant barrier for the 3-dimensional primal power cone:

\[
F(x) = -\log\left( x_1^{2\alpha} x_2^{2 - 2\alpha} - x_3^2 \right) - (1 - \alpha) \log x_1 - \alpha \log x_2, \tag{5}
\]

suggested by Chares [2]. Generalized in [1]. Is self-dual using a redefined inner product.

However,
• The conjugate barrier $F_*$ or its derivatives cannot be evaluated in closed form.
• They can be evaluated numerically to high accuracy based on an idea of Nesterov.
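The barrier (5) is cheap to evaluate, and its logarithmic homogeneity with parameter 3 can be checked numerically: scaling the argument by a positive factor should lower the value by three times the logarithm of that factor. The following is a minimal sketch; the function name `power_cone_barrier` and the test point are illustrative.

```python
import math

def power_cone_barrier(x, alpha):
    """Evaluate the barrier (5) at an interior point x = (x1, x2, x3)
    of the 3-dimensional power cone with parameter alpha in (0, 1)."""
    x1, x2, x3 = x
    if x1 <= 0 or x2 <= 0:
        raise ValueError("point is not in the interior of the power cone")
    inner = x1 ** (2 * alpha) * x2 ** (2 - 2 * alpha) - x3 ** 2
    if inner <= 0:
        raise ValueError("point is not in the interior of the power cone")
    return (-math.log(inner)
            - (1 - alpha) * math.log(x1)
            - alpha * math.log(x2))

alpha, tau = 0.3, 1.7
x = (2.0, 3.0, 1.0)
scaled = tuple(tau * xi for xi in x)

# logarithmic homogeneity: F(tau * x) - F(x) + 3 * log(tau) should be ~0
gap = power_cone_barrier(scaled, alpha) - power_cone_barrier(x, alpha) + 3 * math.log(tau)
print(abs(gap) < 1e-12)
```

The check works because the first logarithm contributes a factor of two and the remaining two terms together contribute a factor of one.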