

  1. Learning a classification of Mixed-Integer Quadratic Programming problems · 22nd Combinatorial Optimization Workshop · Aussois · January 12, 2018 · Pierre Bonami 1, Andrea Lodi 2, Giulia Zarpellon 2 · 1 CPLEX Optimization, IBM Spain · 2 Polytechnique Montréal, GERAD, CERC Data Science for Real-time Decision Making

  2. https://www.gerad.ca/en/papers/G-2017-106

  3. Table of contents: 1. A classification question on MIQPs · 2. Data and experiments · 3. Two working questions

  4. A classification question on MIQPs

  5. Mixed-Integer Quadratic Programming problems. We consider Mixed-Integer Quadratic Programming (MIQP) problems:

    min   1/2 x^T Q x + c^T x
    s.t.  Ax = b                              (1)
          x_i ∈ {0, 1}  ∀ i ∈ I
          l ≤ x ≤ u

• Modeling of practical applications
• First extension of linear algorithms into nonlinear ones

We say an MIQP is convex (resp. nonconvex) if and only if the matrix Q is positive semidefinite, Q ⪰ 0 (resp. indefinite). The IBM-CPLEX solver can solve both convex and nonconvex MIQPs to proven optimality.

  6. Solving MIQPs with CPLEX

    Convex 0-1:       NLP B&B
    Convex mixed:     NLP B&B
    Nonconvex 0-1:    convexify + NLP B&B · linearize + MILP B&B
    Nonconvex mixed:  spatial B&B (convexification only gives a relaxation)

Convexification: augment the diagonal of Q, using x_i = x_i^2 for x_i ∈ {0, 1}: x^T Q x → x^T (Q + ρ I_n) x − ρ e^T x, where Q + ρ I_n ⪰ 0 for some ρ > 0 (a numerical sketch follows).
Linearization: replace each product q_ij x_i x_j, where x_i ∈ {0, 1} and l_j ≤ x_j ≤ u_j, with a new variable y_ij and McCormick inequalities.
Linearization is always full in the 0-1 case.
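To make the convexification rule concrete, here is a minimal numerical sketch (not the authors' code): it picks ρ just large enough to shift the spectrum of Q into the PSD range; the tolerance eps is an illustrative choice.

```python
import numpy as np

def convexify(Q, eps=1e-6):
    """Pick rho so that Q + rho*I is positive semidefinite; on binaries the
    objective is compensated by the linear term -rho * e^T x, since
    x_i^2 = x_i.  eps is an illustrative safety margin."""
    lam_min = np.linalg.eigvalsh(Q).min()
    rho = max(0.0, -lam_min) + eps
    return Q + rho * np.eye(Q.shape[0]), rho

Q = np.array([[0.0, 2.0], [2.0, 0.0]])       # indefinite: eigenvalues ±2
Q_conv, rho = convexify(Q)
print(rho, np.linalg.eigvalsh(Q_conv).min() >= 0.0)
```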

  7. Solving MIQPs with CPLEX

    Convex 0-1:     NL: NLP B&B · L: linearize + MILP B&B
    Convex mixed:   NL: NLP B&B · L: linearize + MILP B&B, or linearize + NLP B&B
    Nonconvex 0-1:  NL: convexify + NLP B&B · L: linearize + MILP B&B

Convexification: augment the diagonal of Q, using x_i = x_i^2 for x_i ∈ {0, 1}: x^T Q x → x^T (Q + ρ I_n) x − ρ e^T x, where Q + ρ I_n ⪰ 0 for some ρ > 0.
Linearization: replace each product q_ij x_i x_j, where x_i ∈ {0, 1} and l_j ≤ x_j ≤ u_j, with a new variable y_ij and McCormick inequalities (written out below).
Linearization is always full in 0-1 MIQPs (it may not be for mixed ones).
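For completeness, the McCormick system introduced for each product is the standard one: for y_ij = x_i x_j with x_i ∈ {0, 1} and l_j ≤ x_j ≤ u_j,

    y_ij ≥ l_j x_i
    y_ij ≤ u_j x_i
    y_ij ≥ x_j − u_j (1 − x_i)
    y_ij ≤ x_j − l_j (1 − x_i)

When x_i = 0 the first two inequalities force y_ij = 0; when x_i = 1 the last two force y_ij = x_j, so the product is modeled exactly.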

  8. Linearize vs. not linearize. The linearization approach seems beneficial also in the convex case, but is linearizing always the best choice?
Goal: exploit ML predictive machinery to understand whether it is favorable to linearize the quadratic part of an MIQP or not.
• Learn an offline classifier that predicts the most suitable resolution approach within the CPLEX framework, in an instance-specific way (the qtolin parameter controls the linearization switch; see the sketch below)
• Gain methodological insights about which features of the MIQPs most affect the prediction
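A minimal sketch of acting on a prediction, assuming the CPLEX Python API: to our understanding the linearization switch is exposed there as parameters.preprocessing.qtolin (-1 automatic, 0 keep the quadratic form, 1 linearize), but the exact name and values should be checked against the CPLEX version at hand; solve_with_choice is an illustrative helper, not from the paper.

```python
import cplex

def solve_with_choice(model_path, linearize):
    """Solve one MIQP, forcing the linearization decision predicted by
    the classifier (True -> L, False -> NL)."""
    c = cplex.Cplex(model_path)                       # read the instance
    c.parameters.preprocessing.qtolin.set(1 if linearize else 0)
    c.solve()
    return c.solution.get_objective_value()
```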

  9. Data and experiments

  10. Steps to apply supervised learning
Dataset generation
• Generator of MIQPs, spanning various parameters
• Q = sprandsym(size, density, eigenvalues) (a Python analogue is sketched below)
Features design
• Static features (21): mathematical characteristics (variables, constraints, objective, spectrum, ...)
• Dynamic features (2): early behavior in the optimization process (bounds and times at the root node)
Labeling procedure
• Consider tie cases: labels in {L, NL, T}
• 1h, 5 seeds: solvability and consistency checks
• Look at runtimes to assign a label
This yields {(x_k, y_k)}_{k=1..N}, where x_k ∈ R^d and y_k ∈ {L, NL, T}, for N MIQPs.
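sprandsym is MATLAB's sparse symmetric random-matrix generator. A rough Python stand-in (an assumption for illustration; unlike sprandsym's three-argument form it does not prescribe the eigenvalues, though shifting the diagonal afterwards can steer convexity):

```python
import numpy as np
import scipy.sparse as sp

def random_symmetric_q(n, density, seed=0):
    """Sparse random symmetric Q, loosely mimicking sprandsym(n, density)."""
    rng = np.random.default_rng(seed)
    a = sp.random(n, n, density=density, random_state=rng,
                  data_rvs=rng.standard_normal, format="csr")
    return (a + a.T) / 2  # symmetrize (roughly doubles the sparsity pattern)

q = random_symmetric_q(50, 0.4)
print(np.linalg.eigvalsh(q.toarray())[[0, -1]])  # extreme eigenvalues -> convexity check
```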

  11. Dataset D analysis (in a nutshell)
• 2300 instances, n ∈ {25, 50, ..., 200}, density d ∈ {0.2, 0.4, ..., 1}
• Multiclass classifiers: SVM and Decision-Tree-based methods (Random Forests (RF) · Extremely Randomized Trees (EXT) · Gradient Tree Boosting (GTB)); a training sketch follows
• Avoid overfitting with ML best practices
• Tool: scikit-learn library
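The four model families map directly to scikit-learn estimators. A self-contained training sketch on stand-in data (the synthetic features, split, and hyperparameters are illustrative, not the paper's protocol):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 23 features (21 static + 2 dynamic), 3 classes for L/NL/T.
X, y = make_classification(n_samples=2300, n_features=23, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "EXT": ExtraTreesClassifier(n_estimators=200, random_state=0),
    "GTB": GradientBoostingClassifier(random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(name, round(clf.score(X_te, y_te), 2))  # test accuracy
```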

  12. Learning results on D_test. Classifiers perform well with respect to traditional classification measures:

    D_test · Multiclass · All features
                 SVM     RF      EXT     GTB
    Accuracy     0.85    0.89    0.84    0.87
    Precision    0.82    0.85    0.81    0.85
    Recall       0.85    0.89    0.84    0.87
    F1 score     0.83    0.87    0.82    0.86

• A major difficulty is posed by the T class, (almost) always misclassified
Binary setting: remove all tie cases · performance improved
How relevant are ties with respect to the question L vs. NL?
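The four measures in the table are standard and available in sklearn.metrics; this helper reproduces them under the assumption of weighted class averaging (the paper's exact averaging choice is not stated here):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred):
    """Accuracy plus averaged precision/recall/F1, as in the table above."""
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return {"Accuracy": accuracy_score(y_true, y_pred),
            "Precision": prec, "Recall": rec, "F1 score": f1}
```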

  13. Hints of feature importance. Ensemble methods based on Decision Trees provide an importance score for each feature. The top-scoring ones are the dynamic features and those about eigenvalues:
• (dyn.) Difference of lower bounds for L and NL at the root node
• (dyn.) Difference of resolution times of the root node, for L and NL
• Value of the smallest nonzero eigenvalue
• Spectral norm of Q, i.e., ‖Q‖ = max_i |λ_i|
• ...
Static features setting: remove dynamic features · performance slightly deteriorated
How does the prediction change without information at the root node? (An importance-extraction sketch follows.)
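Extracting such a ranking from a fitted ensemble is one line in scikit-learn; the data and feature indices below are stand-ins:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2300, n_features=23, n_informative=6,
                           n_classes=3, random_state=0)  # stand-in data
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]      # most important first
for i in ranking[:5]:
    print(f"feature_{i}: {rf.feature_importances_[i]:.3f}")
```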

  14. Complementary optimization measures. Need: evaluate classifiers' performance in optimization terms, and quantify the gain with respect to the CPLEX default (DEF).
• For each test example, select the runtime corresponding to the predicted label, to build a times vector t_clf for each classifier clf and for DEF
σ_clf — sum of predicted runtimes: sum over the times in t_clf
Nσ_clf — normalized time score ∈ [0, 1]: shifted geometric mean of the times in t_clf, normalized between best and worst cases (sketched below)

                   SVM     RF      EXT     GTB     DEF
    σ_DEF/σ_clf    3.88    4.40    4.04    4.26    −
    Nσ_clf         0.98    0.99    0.98    0.99    0.42
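A sketch of the two aggregates under stated assumptions: the shift value and the exact normalization between the per-instance best and worst label choices are guesses at the construction, not the paper's definitions.

```python
import numpy as np

def shifted_geomean(times, shift=10.0):
    """Shifted geometric mean of runtimes; the 10s shift is an assumption."""
    t = np.asarray(times, dtype=float)
    return np.exp(np.log(t + shift).mean()) - shift

def normalized_score(t_clf, t_best, t_worst):
    """Map a classifier's times into [0, 1]: 1 = always-best label choices,
    0 = always-worst (hypothetical reconstruction of N-sigma_clf)."""
    g, gb, gw = (shifted_geomean(v) for v in (t_clf, t_best, t_worst))
    return (gw - g) / (gw - gb) if gw > gb else 1.0
```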

  15. Two working questions

  16. What about other datasets? • Selection from QPLIB · 24 instances • Part of the CPLEX internal testbed · 175 instances

  17. What about other datasets? • Selection from QPLIB · 24 instances • Part of the CPLEX internal testbed · 175 instances
Poor classification, but good optimization measures:

                   SVM     RF      EXT     GTB     DEF
    σ_DEF/σ_clf    0.48    0.53    0.71    0.42    −
    Nσ_clf         0.75    0.90    0.91    0.74    0.96

  18. Why those predictions? Convexification and linearization clearly affect
• formulation size
• formulation strength
• implementation efficacy
Each problem type might have its own decision function for the question L vs. NL, depending on, e.g.,
• |λ_min|, ... when convexifying
• the number of nonzero products between continuous variables in Q, ... when linearizing mixed instances
• matrix conditioning and implementations, ...
ML could also provide deeper insights.

  19. Thanks! Questions?

  20. Minimal references
• Bliek C, Bonami P, Lodi A (2014) Solving mixed-integer quadratic programming problems with IBM-CPLEX: a progress report.
• Bonami P, Kılınç M, Linderoth J (2012) Algorithms and software for convex mixed integer nonlinear programs.
• Fourer R. Quadratic Optimization Mysteries. http://bob4er.blogspot.com/2015/03/quadratic-optimization-mysteries-part-1.html
• Bishop CM (2006) Pattern Recognition and Machine Learning.
