Mixed-integer conic optimization and MOSEK


Mixed-integer conic optimization and MOSEK. Dagstuhl seminar on MINLP, February 20th 2018. Sven Wiese, www.mosek.com. What is MOSEK? MOSEK ApS is a Danish company founded in 1997. It creates software for mathematical optimization problems.


  1. Mixed-integer conic optimization and MOSEK. Dagstuhl seminar on MINLP, February 20th 2018. Sven Wiese, www.mosek.com

  2. What is MOSEK?
     • MOSEK ApS is a Danish company founded in 1997.
     • It creates software for mathematical optimization problems.
     [Diagram: problem classes covered by MOSEK: LP, convex QP, conic QP (SOCP), SDP, power cones, exponential cones and convex NLP, with MIP labels attached to several of them; the interfaces shown are the optimizer APIs and Fusion.]

  3. Linear optimization: a special case of conic optimization
     The classical linear optimization problem:
     $$\text{minimize } c^T x \quad \text{subject to } Ax = b,\ x \geq 0.$$
     Pro:
     • Structure is explicit and simple.
     • Data is simple: $c$, $A$, $b$.
     • Structure implies convexity, i.e. convexity is independent of the data.
     • Powerful duality theory, including Farkas' lemma.
     • Smoothness, gradients and Hessians are not an issue.
     Therefore, we have powerful algorithms and software.

  4. Linear optimization: a special case of conic optimization
     Con:
     • It is linear only.

  5. The classical nonlinear optimization problem
     $$\text{minimize } f(x) \quad \text{subject to } g(x) \leq 0.$$
     Pro:
     • It is very general.
     Con:
     • Structure is hidden.
     • How to specify the problem at all in software?
     • How to compute gradients and Hessians if needed?
     • How to exploit structure?
     • Convexity checking!
     • Verifying convexity is NP-hard.
     • Solution: disciplined convex modeling by Grant, Boyd and Ye [1] to ensure convexity.

  6. A fundamental question: Is there a class of nonlinear optimization problems that preserves almost all of the good properties of the linear optimization problem?

  7. Conic optimization
     Linear cone problem:
     $$\text{minimize } c^T x \quad \text{subject to } Ax = b,\ x \in K,$$
     with $K = K_1 \times K_2 \times \cdots \times K_K$ a product of proper cones.

  8. The beauty of conic optimization
     • Separation of data and structure:
     • Data: $c$, $A$ and $b$.
     • Structure: $K$.
     • Structural convexity.
     • Duality (almost...).
     • No issues with smoothness and differentiability.
     Lubin et al. [2] show that all convex instances (333) in MINLPLIB2 are conic representable using only 4 types of cones.

  9. Extremely disciplined convex programming
     These 4 cones, including symmetric and non-symmetric ones, and extended by another popular cone, are:
     [Diagram: conic optimization in MOSEK: LP, conic QP (SOCP), SDP, power cones, exponential cones.]
     Allowing for the nonsymmetric conic formulation leads to extremely disciplined convex programming: simple, yet flexible for modeling, and with efficient numerical algorithms.

  10. Symmetric cones (supported by MOSEK 8)
      • the nonnegative orthant $K_l^n := \{ x \in \mathbb{R}^n \mid x_j \geq 0,\ j = 1, \ldots, n \}$,
      • the quadratic cone $K_q^n = \{ x \in \mathbb{R}^n \mid x_1 \geq (x_2^2 + \cdots + x_n^2)^{1/2} \}$,
      • the rotated quadratic cone $K_r^n = \{ x \in \mathbb{R}^n \mid 2 x_1 x_2 \geq x_3^2 + \cdots + x_n^2,\ x_1, x_2 \geq 0 \}$,
      • the semidefinite matrix cone $K_s^n = \{ x \in \mathbb{R}^{n(n+1)/2} \mid z^T \mathrm{mat}(x) z \geq 0,\ \forall z \}$, with
      $$\mathrm{mat}(x) := \begin{bmatrix} x_1 & x_2/\sqrt{2} & \cdots & x_n/\sqrt{2} \\ x_2/\sqrt{2} & x_{n+1} & \cdots & x_{2n-1}/\sqrt{2} \\ \vdots & \vdots & \ddots & \vdots \\ x_n/\sqrt{2} & x_{2n-1}/\sqrt{2} & \cdots & x_{n(n+1)/2} \end{bmatrix}.$$
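The definitions of the orthant, the quadratic cone and the rotated quadratic cone can be checked numerically. The following is a minimal sketch in plain Python; the function names and the tolerance `tol` are illustrative assumptions and are not part of MOSEK.

```python
import math

def in_orthant(x, tol=1e-9):
    """x in K_l^n: every component is nonnegative."""
    return all(xj >= -tol for xj in x)

def in_quadratic_cone(x, tol=1e-9):
    """x in K_q^n: x_1 >= sqrt(x_2^2 + ... + x_n^2)."""
    return x[0] >= math.sqrt(sum(xj * xj for xj in x[1:])) - tol

def in_rotated_quadratic_cone(x, tol=1e-9):
    """x in K_r^n: 2 x_1 x_2 >= x_3^2 + ... + x_n^2 and x_1, x_2 >= 0."""
    return (x[0] >= -tol and x[1] >= -tol
            and 2.0 * x[0] * x[1] >= sum(xj * xj for xj in x[2:]) - tol)

# Boundary point of the quadratic cone: 5 = sqrt(3^2 + 4^2).
print(in_quadratic_cone([5.0, 3.0, 4.0]))
# Boundary point of the rotated cone: 2 * 0.5 * 4 = 2^2.
print(in_rotated_quadratic_cone([0.5, 4.0, 2.0]))
```

A tolerance is used because points produced by a numerical solver typically satisfy the cone inequalities only approximately.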

  11. Examples of quadratic cones
      • Absolute value: $|x| \leq t \iff (t, x) \in K_q^2$.
      • Euclidean norm (for $x \in \mathbb{R}^n$): $\|x\|_2 \leq t \iff (t, x) \in K_q^{n+1}$.
      • Second-order cone inequality: $\|Ax + b\|_2 \leq c^T x + d \iff (c^T x + d,\ Ax + b) \in K_q^{m+1}$.

  12. Examples of rotated quadratic cones
      • Squared Euclidean norm: $\|x\|_2^2 \leq t \iff (1/2,\ t,\ x) \in K_r^{n+2}$.
      • Convex quadratic inequality: $(1/2)\, x^T Q x \leq c^T x + d \iff (1,\ c^T x + d,\ F^T x) \in K_r^{k+2}$, with $Q = F F^T$, $F \in \mathbb{R}^{n \times k}$.

  13. Examples of rotated quadratic cones
      • Convex hyperbolic function: $\frac{1}{x} \leq t,\ x > 0 \iff (x,\ t,\ \sqrt{2}) \in K_r^3$.
      • Convex negative rational power: $\frac{1}{x^2} \leq t,\ x > 0 \iff (t,\ \tfrac{1}{2},\ s),\ (x,\ s,\ \sqrt{2}) \in K_r^3$.
      • Square roots: $\sqrt{x} \geq t,\ x \geq 0 \iff (\tfrac{1}{2},\ x,\ t) \in K_r^3$.
      • Convex positive rational power: $x^{3/2} \leq t,\ x \geq 0 \iff (s,\ t,\ x),\ (x,\ 1/8,\ s) \in K_r^3$.
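Two of these reformulations can be verified with a few lines of plain Python. This is a sketch under stated assumptions: the helper names are hypothetical, and for the power $x^{3/2}$ the auxiliary variable is fixed to $s = \sqrt{x}/2$, the largest value the second cone allows.

```python
import math

def in_rqc3(x1, x2, x3, tol=1e-9):
    """(x1, x2, x3) in K_r^3: 2 x1 x2 >= x3^2 and x1, x2 >= 0."""
    return x1 >= -tol and x2 >= -tol and 2.0 * x1 * x2 >= x3 * x3 - tol

def hyperbolic_ok(x, t):
    """1/x <= t modeled as (x, t, sqrt(2)) in K_r^3."""
    return in_rqc3(x, t, math.sqrt(2.0))

def power_three_halves_ok(x, t):
    """x^(3/2) <= t modeled as (s, t, x), (x, 1/8, s) in K_r^3.

    Assumption: s = sqrt(x) / 2, the largest s with 2 * x * (1/8) >= s^2.
    """
    s = math.sqrt(x) / 2.0
    return in_rqc3(s, t, x) and in_rqc3(x, 1.0 / 8.0, s)

print(hyperbolic_ok(2.0, 0.5))          # 1/2 <= 0.5 holds with equality
print(power_three_halves_ok(4.0, 8.0))  # 4^(3/2) = 8 <= 8 holds with equality
```

In an actual conic model $s$ would be a decision variable chosen by the solver; fixing it here only serves to test the equivalence pointwise.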

  14. Nonsymmetric cones (in next MOSEK release)
      • the three-dimensional power cone $K_p^{\alpha} = \{ x \in \mathbb{R}^3 \mid x_1^{\alpha} x_2^{1-\alpha} \geq |x_3|,\ x_1, x_2 \geq 0 \}$, for $0 < \alpha < 1$,
      • the three-dimensional exponential cone $K_e = \mathrm{cl} \{ x \in \mathbb{R}^3 \mid x_1 \geq x_2 \exp(x_3 / x_2),\ x_2 > 0 \}$.
      IPMs for nonsymmetric cones are less studied, and less mature.
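As a rough illustration of the two definitions, the sketch below tests membership in the power cone and the exponential cone. The function names and the tolerance are assumptions made for this example; the closure of the exponential cone is handled by admitting the points with $x_2 = 0$, $x_1 \geq 0$, $x_3 \leq 0$.

```python
import math

def in_power_cone(x, alpha, tol=1e-9):
    """x in K_p^alpha: x1^alpha * x2^(1 - alpha) >= |x3|, x1, x2 >= 0."""
    x1, x2, x3 = x
    if x1 < 0.0 or x2 < 0.0:
        return False
    return x1 ** alpha * x2 ** (1.0 - alpha) >= abs(x3) - tol

def in_exp_cone(x, tol=1e-9):
    """x in K_e, the closure of {x1 >= x2 * exp(x3 / x2), x2 > 0}."""
    x1, x2, x3 = x
    if x2 > 0.0:
        try:
            return x1 >= x2 * math.exp(x3 / x2) - tol
        except OverflowError:
            # exp overflowed, so no finite x1 can dominate the right-hand side.
            return False
    # Points added by the closure: x2 = 0, x1 >= 0, x3 <= 0.
    return x2 == 0.0 and x1 >= -tol and x3 <= tol

print(in_power_cone((4.0, 1.0, 2.0), 0.5))  # sqrt(4) * sqrt(1) = 2 >= |2|
print(in_exp_cone((math.e, 1.0, 1.0)))      # e >= 1 * exp(1 / 1)
```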

  15. Examples of power cones
      • Models many quadratic cone examples more succinctly.
      • Powers: $t \geq |x|^p \iff (t,\ 1,\ x) \in K_p^{1/p}$.
      • $p$-norm cones ($p > 1$): $t \geq \|x\|_p \iff \sum_i r_i = t,\ (r_i,\ t,\ x_i) \in K_p^{1/p},\ i = 1, \ldots, n$.
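The $p$-norm decomposition can be made concrete by constructing the auxiliary values $r_i$ explicitly. The sketch below assumes $x \neq 0$ and picks $t = \|x\|_p$ and $r_i = |x_i|^p / t^{p-1}$, which is one feasible choice; the function names are hypothetical.

```python
def in_power_cone(x1, x2, x3, alpha, tol=1e-9):
    """(x1, x2, x3) in K_p^alpha: x1^alpha * x2^(1 - alpha) >= |x3|."""
    if x1 < 0.0 or x2 < 0.0:
        return False
    return x1 ** alpha * x2 ** (1.0 - alpha) >= abs(x3) - tol

def p_norm(x, p):
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def p_norm_certificate(x, p):
    """Return (t, r) with sum(r) = t and (r_i, t, x_i) in K_p^(1/p).

    Assumes x is nonzero so that t > 0.
    """
    t = p_norm(x, p)
    r = [abs(xi) ** p / t ** (p - 1.0) for xi in x]
    return t, r

x, p = [1.0, -2.0, 2.0], 3.0
t, r = p_norm_certificate(x, p)
print(abs(sum(r) - t) < 1e-9)
print(all(in_power_cone(ri, t, xi, 1.0 / p) for ri, xi in zip(r, x)))
```

Each cone constraint gives $r_i\, t^{p-1} \geq |x_i|^p$; summing over $i$ and using $\sum_i r_i = t$ yields $t^p \geq \sum_i |x_i|^p$.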

  16. Examples of exponential cones
      • Exponential: $e^x \leq t \iff (t,\ 1,\ x) \in K_e$.
      • Logarithm: $\log x \geq t \iff (x,\ 1,\ t) \in K_e$.
      • Entropy: $-x \log x \geq t \iff (1,\ x,\ t) \in K_e$.
      • Softplus function: $\log(1 + e^x) \leq t \iff (u,\ 1,\ x - t),\ (v,\ 1,\ -t) \in K_e,\ u + v \leq 1$.
      • Log-sum-exp: $\log\left(\sum_i e^{x_i}\right) \leq t \iff \sum_i u_i \leq 1,\ (u_i,\ 1,\ x_i - t) \in K_e,\ i = 1, \ldots, n$.
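The softplus reformulation follows from $u \geq e^{x - t}$ and $v \geq e^{-t}$: the condition $u + v \leq 1$ is then achievable exactly when $e^{-t}(e^x + 1) \leq 1$. A small numerical check in plain Python, with hypothetical helper names and the tightest choice of $u$ and $v$ as an assumption:

```python
import math

def in_exp_cone(x1, x2, x3, tol=1e-9):
    """(x1, x2, x3) in K_e, restricted to the case x2 > 0."""
    return x2 > 0.0 and x1 >= x2 * math.exp(x3 / x2) - tol

def softplus_value(x):
    return math.log1p(math.exp(x))

def softplus_bound_holds(x, t, tol=1e-9):
    """Check log(1 + e^x) <= t through the two exponential cones.

    Assumption: u and v take their smallest feasible values.
    """
    u = math.exp(x - t)
    v = math.exp(-t)
    return (in_exp_cone(u, 1.0, x - t) and in_exp_cone(v, 1.0, -t)
            and u + v <= 1.0 + tol)

print(softplus_bound_holds(2.0, 3.0), softplus_value(2.0) <= 3.0)
print(softplus_bound_holds(0.0, 0.6), softplus_value(0.0) <= 0.6)
```

The same construction with $n$ auxiliary variables $u_i$ covers the log-sum-exp case.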

  17. The homogeneous model for conic problems
      A solution to the homogeneous model
      $$\begin{aligned} Ax - b\tau &= 0, \\ c\tau - A^T y - s &= 0, \\ c^T x - b^T y + \kappa &= 0, \\ x \in K,\ s \in K^*,\ \tau, \kappa &\geq 0, \end{aligned}$$
      encapsulates different duality cases:
      • If $\tau > 0$, $\kappa = 0$, then $\frac{1}{\tau}(x, y, s)$ is optimal: $Ax = b\tau$, $c\tau - A^T y = s$, $c^T x - b^T y = 0$.
      • If $\tau = 0$, $\kappa > 0$, then the problem is infeasible: $Ax = 0$, $-A^T y = s$, $c^T x - b^T y < 0$.
      • If $\tau = 0$, $\kappa = 0$, then the problem is ill-posed.
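To see the three cases on concrete data, the sketch below evaluates the residuals of the homogeneous model for a tiny linear problem with $K = K^* = \mathbb{R}^2_+$. The helper names, the two toy problems and the hand-picked solutions are assumptions made for illustration; no solver is involved.

```python
def matvec(A, v):
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def rmatvec(A, y):
    """Compute A^T y."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def residuals(A, b, c, x, y, s, tau, kappa):
    """Residuals of the three linear equations of the homogeneous model."""
    r_p = [ax - bi * tau for ax, bi in zip(matvec(A, x), b)]
    r_d = [cj * tau - aty - sj for cj, aty, sj in zip(c, rmatvec(A, y), s)]
    r_g = dot(c, x) - dot(b, y) + kappa
    return r_p, r_d, r_g

def classify(tau, kappa, tol=1e-9):
    if tau > tol and kappa <= tol:
        return "optimal"
    if tau <= tol and kappa > tol:
        return "infeasible"
    if tau <= tol and kappa <= tol:
        return "ill-posed"
    return "inconsistent"  # tau * kappa > 0 cannot occur at a solution

A, c = [[1.0, 1.0]], [1.0, 1.0]
# minimize x1 + x2 subject to x1 + x2 = 1, x >= 0; solution scaled by tau = 2.
print(residuals(A, [1.0], c, [1.0, 1.0], [2.0], [0.0, 0.0], 2.0, 0.0), classify(2.0, 0.0))
# Same problem with x1 + x2 = -1, which has no nonnegative solution.
print(residuals(A, [-1.0], c, [0.0, 0.0], [-1.0], [1.0, 1.0], 0.0, 1.0), classify(0.0, 1.0))
```

In the first case dividing by $\tau = 2$ recovers the optimal point $x = (0.5, 0.5)$ with $y = 1$; in the second, $y = -1$ acts as a Farkas-type certificate of primal infeasibility.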

  18. Shifted central-path for cone problems
      Let $F(\cdot)$ be a logarithmic barrier for $K$. The central path for an interior point $(x_0, s_0, y_0, \tau_0, \kappa_0)$ is
      $$\begin{aligned} A x_\mu - b \tau_\mu &= \mu (A x_0 - b \tau_0), \\ s_\mu + A^T y_\mu - c \tau_\mu &= \mu (s_0 + A^T y_0 - c \tau_0), \\ c^T x_\mu - b^T y_\mu + \kappa_\mu &= \mu (c^T x_0 - b^T y_0 + \kappa_0), \\ s_\mu = -\mu F'(x_\mu), \quad x_\mu &= -\mu F_*'(s_\mu), \quad \kappa_\mu \tau_\mu = \mu, \end{aligned}$$
      parametrized in $\mu$. For (our three) symmetric cones, we have a bilinear product $\circ$, and the barrier function satisfies $F'(x) = -x^{-1}$ (using the inverse defined by the product), so the centrality condition becomes $x \circ s = \mu e$.
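For the nonnegative orthant the centrality condition is easy to check by hand: with the barrier $F(x) = -\sum_j \log x_j$, the product $\circ$ is componentwise multiplication and $e$ is the all-ones vector. The sketch below uses this standard barrier as an assumption, with illustrative function names.

```python
def barrier_gradient(x):
    """Gradient of F(x) = -sum_j log(x_j) on the positive orthant."""
    return [-1.0 / xj for xj in x]

def centered_dual(x, mu):
    """s = -mu * F'(x), the dual point paired with x on the central path."""
    return [-mu * g for g in barrier_gradient(x)]

x = [0.5, 2.0, 4.0]
mu = 0.1
s = centered_dual(x, mu)
# Componentwise products x_j * s_j; each should be close to mu.
print([xj * sj for xj, sj in zip(x, s)])
```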

  19. Non-symmetric cones are more difficult to handle
      • For the non-symmetric cones, there is no such bilinear product.
      • The three symmetric cones are also self-scaling, and there exists a Nesterov-Todd scaling $W x = W^{-1} s = \lambda$. For the non-symmetric cones, this does not exist.
      • Higher-order Mehrotra-type correctors are elusive.
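In the simplest symmetric case, the nonnegative orthant, the Nesterov-Todd scaling is diagonal: $W = \mathrm{diag}(\sqrt{s_j / x_j})$ and $\lambda_j = \sqrt{x_j s_j}$. The sketch below checks $W x = W^{-1} s = \lambda$ for this case only; the function name is hypothetical, and the scalings for the quadratic and semidefinite cones are more involved and not shown.

```python
import math

def nt_scaling_orthant(x, s):
    """Diagonal of the Nesterov-Todd scaling W for strictly positive x and s."""
    return [math.sqrt(sj / xj) for xj, sj in zip(x, s)]

x = [0.5, 2.0, 4.0]
s = [8.0, 0.5, 1.0]
w = nt_scaling_orthant(x, s)

Wx = [wj * xj for wj, xj in zip(w, x)]
Winv_s = [sj / wj for wj, sj in zip(w, s)]
lam = [math.sqrt(xj * sj) for xj, sj in zip(x, s)]
print(Wx, Winv_s, lam)
```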

  20. A logistic regression example
      Given $n$ binary training-points $\{ (x_i, y_i) \}$.
      Training:
      $$\begin{aligned} \text{minimize} \quad & \sum_i t_i + \lambda r \\ \text{subject to} \quad & t_i \geq \log(1 + \exp(-\theta^T x_i)), \quad y_i = 1, \\ & t_i \geq \log(1 + \exp(\theta^T x_i)), \quad y_i = 0, \\ & r \geq \|\theta\|_2, \end{aligned}$$
      which uses $2n$ exponential cones + 1 quadratic cone.
      Classifier:
      $$h_\theta(z) = \frac{1}{1 + \exp(-\theta^T z)}.$$
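The bounds on $t_i$ are the negative log-likelihoods of the classifier: $\log(1 + \exp(-\theta^T x_i)) = -\log h_\theta(x_i)$ and $\log(1 + \exp(\theta^T x_i)) = -\log(1 - h_\theta(x_i))$. The following plain-Python sketch evaluates the objective for a fixed $\theta$ with every $t_i$ and $r$ at its lower bound; the function names are illustrative, and the code is not guarded against overflow for large $|\theta^T x|$.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def classifier(theta, z):
    """h_theta(z) = 1 / (1 + exp(-theta^T z))."""
    return 1.0 / (1.0 + math.exp(-dot(theta, z)))

def loss_bound(theta, x, y):
    """Smallest t_i allowed by the constraint for the sample (x, y)."""
    z = dot(theta, x)
    return math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))

def objective(theta, X, y, lamb):
    """sum_i t_i + lambda * r with t_i and r at their lower bounds."""
    r = math.sqrt(dot(theta, theta))
    return sum(loss_bound(theta, xi, yi) for xi, yi in zip(X, y)) + lamb * r

theta = [1.0, -2.0]
X = [[1.0, 0.0], [0.5, 0.25]]
y = [1, 0]
print(objective(theta, X, y, lamb=1.0))
print(loss_bound(theta, X[0], 1), -math.log(classifier(theta, X[0])))
```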

  21. A logistic regression example

      from mosek.fusion import *

      # t >= log( 1 + exp(u) )
      def softplus(M, t, u):
          aux = M.variable(2)
          M.constraint(Expr.sum(aux), Domain.lessThan(1.0))
          M.constraint(Expr.hstack(aux, Expr.constTerm(2, 1.0),
                                   Expr.vstack(Expr.sub(u, t), Expr.neg(t))),
                       Domain.inPExpCone())

      # Model logistic regression (regularized with full 2-norm of theta)
      # lambda - regularization parameter
      def logisticRegression(X, y, lamb=1.0):
          n, d = X.shape  # num samples, dimension
          M = Model()
          theta = M.variable(d)
          t = M.variable(n)
          reg = M.variable()
          M.objective(ObjectiveSense.Minimize, Expr.add(Expr.sum(t), Expr.mul(lamb, reg)))
          M.constraint(Var.vstack(reg, theta), Domain.inQCone())
          for i in range(n):
              dot = Expr.dot(X[i], theta)
              if y[i] == 1:
                  softplus(M, t.index(i), Expr.neg(dot))
              else:
                  softplus(M, t.index(i), dot)
          return M, theta

  22. A logistic regression example
      [Figure: decision-region plots.] Decision regions for different regularizations. Data lifted to the space of degree 6 polynomials.
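One way to read "lifted to the space of degree 6 polynomials" is to replace each point by the vector of all monomials up to total degree 6. The sketch below assumes two-dimensional input points and includes the constant monomial; both are assumptions, since the slides do not spell out the exact lifting, and the function name is hypothetical.

```python
def polynomial_lift(x1, x2, degree=6):
    """All monomials x1^a * x2^b with a + b <= degree, constant term included."""
    return [x1 ** a * x2 ** b
            for a in range(degree + 1)
            for b in range(degree + 1 - a)]

features = polynomial_lift(0.5, -0.25)
print(len(features))
```

Under these assumptions each two-dimensional point becomes a 28-dimensional feature vector, and a linear classifier in the lifted space gives a polynomial decision boundary in the original plane.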
