Learning when to use a Decomposition
Markus Kruber · Marco Lübbecke · Axel Parmentier
Chair of Operations Research, RWTH Aachen University, Germany
@mluebbecke #aussois2018 · C.O.W. Aussois · January 9, 2018
Machine Learning is Everywhere
Supervised Learning: Classification
◮ data X, d features, labels Y
◮ feature map φ : X → R^d takes each labeled sample (x_i, y_i) to (φ(x_i), y_i)
◮ an algorithm "learns" f : R^d → Y s.t. the error of (f(φ(x_i)), y_i) over x_i ∈ X is "small"
◮ this is itself an optimization problem (validate!)
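The pipeline on this slide (a feature map φ, a classifier f fitted to the pairs (φ(x_i), y_i), an empirical error to be made "small") can be sketched from scratch; this toy nearest-centroid classifier and its features are our own illustration, not the method of the talk:

```python
def phi(x):
    """Feature map phi : X -> R^d (here d = 2, toy features)."""
    return (x, x * x)

def fit_nearest_centroid(samples):
    """'Learn' f by storing the per-class mean feature vector."""
    sums, counts = {}, {}
    for x, y in samples:
        fx = phi(x)
        s = sums.setdefault(y, [0.0] * len(fx))
        for j, v in enumerate(fx):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def f(features):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(features, c))
        return min(centroids, key=lambda y: dist2(centroids[y]))
    return f

# toy training set: label 1 for large x, 0 for small x
train = [(x, 1 if x > 5 else 0) for x in range(11)]
f = fit_nearest_centroid(train)
# empirical error that "learning" tries to keep small
error = sum(f(phi(x)) != y for x, y in train) / len(train)
print(error)
```

In practice one would judge the error on held-out data rather than the training set (hence "validate").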
Binary Classification: Dog or Muffin? Owl or Apple?
SCIP
◮ open-source MIP/MINLP (and much more) solver
◮ also: a branch-price-and-cut framework
◮ scip.zib.de
GCG
◮ extension to SCIP
◮ fully generic branch-price-and-cut solver
◮ automatically applies Dantzig-Wolfe reformulation to a MIP
◮ www.or.rwth-aachen.de/gcg
GCG automatically detects Structure (a lot of it!)
◮ up to a few hundred or even a few thousand decompositions per MIP
◮ GCG performance depends highly on whether the MIP's structure is reflected by some decomposition, and on us finding/selecting it
Automatic Reformulation in GCG
◮ Detection: MIP P → decompositions D_1, D_2, ..., D_k (decomposition types: border, staircase, ...)
◮ Selection: pick one decomposition by GCG's internal score, then run GCG
◮ extended pipeline: first decide "DWR?"; if no, run plain SCIP; if yes, select a decomposition and run GCG
◮ this work: a supervised learning approach to both decisions: select a best decomposition, or decide not to use a decomposition at all
Supervised Learning Approach
remember: given data X, a classifier f predicts a label Y: Y = f(X)
◮ f is a binary classifier if Y ∈ {0, 1}
◮ learning a classifier: find an f_θ among a family (f_θ, θ ∈ Θ) that best fits a training set ((x_i, y_i), i = 1, ..., n)
◮ we use standard algorithms implemented in scikit-learn
in our setting:
◮ data X: a MIP P and its decomposition(s) D
◮ labels Y: use SCIP or GCG? and which decomposition?
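A minimal scikit-learn sketch of this setup; the feature vectors, labels, and the choice of a random forest are invented for illustration (the talk only says "standard algorithms implemented in scikit-learn"):

```python
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the (MIP, decomposition) training set: each row is a
# feature vector phi(P, D); the label is 1 if GCG beat SCIP, else 0.
# All numbers are invented purely for illustration.
X_train = [
    [0.9, 3, 0.1],   # many blocks, few linking constraints: GCG won
    [0.8, 4, 0.2],
    [0.1, 1, 0.9],   # little structure: SCIP won
    [0.2, 1, 0.8],
]
y_train = [1, 1, 0, 0]

f_theta = RandomForestClassifier(n_estimators=50, random_state=0)
f_theta.fit(X_train, y_train)

# predict for a new (P, D): 1 = run GCG with D, 0 = run plain SCIP
prediction = f_theta.predict([[0.85, 3, 0.15]])[0]
print(prediction)
```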
Feature Map φ
pipeline: input (P, D) → feature map φ → feature vector φ(P, D) ∈ R^d → classifier f_θ → output f_θ(φ(P, D))
to do: define φ (80+ features), choose the family Θ, learn f_θ
examples of features we used
◮ "classics": # variables/constraints, variable types, constraint types
◮ # linking vars/conss
◮ # blocks; min, max, average block size
◮ detector used (indicator)
◮ detection quality metrics
◮ products of features
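A hypothetical sketch of such a feature map: the MIP and the decomposition are toy dictionaries, and the feature names mirror the slide (block count, block sizes, linking constraints); none of this is GCG's actual data structure:

```python
def phi(mip, dec):
    """Map a (MIP, decomposition) pair to a feature vector in R^d."""
    block_sizes = [len(b) for b in dec["blocks"]]
    n_blocks = len(block_sizes)
    features = [
        mip["n_vars"],                 # classic: # variables
        mip["n_conss"],                # classic: # constraints
        len(dec["linking_conss"]),     # # linking constraints
        n_blocks,                      # # blocks
        min(block_sizes),
        max(block_sizes),
        sum(block_sizes) / n_blocks,   # average block size
    ]
    # products of features, as on the slide
    features.append(features[2] * features[3])
    return features

mip = {"n_vars": 120, "n_conss": 80}
dec = {"blocks": [range(30), range(30), range(40)], "linking_conss": [0, 1]}
print(phi(mip, dec))
```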
Labeling: Definition of Y
question: should we use SCIP or GCG?
training set:
◮ MIPs P, with SCIP run on each P
◮ decompositions D, with GCG run on each (P, D)
given an input (P, D), we learn a binary classifier Y = f(φ(P, D)) where Y = 1 iff GCG on (P, D) is better than SCIP on P after a given time limit:
◮ GCG solves P and SCIP doesn't
◮ both solve P and GCG is faster
◮ neither solves P but GCG's gap is smaller
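The three bullet cases above translate directly into a labeling function; the run results here are toy tuples (solved, time, gap) whose field names are our invention:

```python
def label(scip, gcg, time_limit):
    """y = 1 iff the GCG run on (P, D) beat the SCIP run on P."""
    scip_solved = scip["solved"] and scip["time"] <= time_limit
    gcg_solved = gcg["solved"] and gcg["time"] <= time_limit
    if gcg_solved and not scip_solved:
        return 1                                         # GCG solves P, SCIP doesn't
    if gcg_solved and scip_solved:
        return 1 if gcg["time"] < scip["time"] else 0    # both solve: faster wins
    if not gcg_solved and not scip_solved:
        return 1 if gcg["gap"] < scip["gap"] else 0      # neither solves: smaller gap wins
    return 0                                             # SCIP solves, GCG doesn't

print(label({"solved": True, "time": 90, "gap": 0.0},
            {"solved": True, "time": 30, "gap": 0.0}, time_limit=3600))
```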
Regression: Predict Decomposition Quality
question: if any, which decomposition should we use?
f(φ(P, D)) ∈ [0, 1]: probability that GCG with D beats SCIP.
given decompositions D_1, ..., D_k and remaining time t, use GCG if max_i f(φ(P, D_i)) ≥ α
we use 0.5 < α ≤ 1: a decomposition is not a default choice.
if we use GCG, select decomposition arg max_i f(φ(P, D_i))
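The selection rule is a one-liner once the scores are computed; the score list below is a toy stand-in for the learned regressor values f(φ(P, D_i)):

```python
def select_solver(scores, alpha=0.8):
    """scores[i] ~ probability that GCG with D_i beats SCIP.

    Run GCG only if the best score clears the threshold alpha
    (0.5 < alpha <= 1: a decomposition is not a default choice);
    in that case pick the arg max.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] >= alpha:
        return ("GCG", best)      # use GCG with decomposition D_{best+1}
    return ("SCIP", None)         # no candidate clears alpha: plain SCIP

print(select_solver([0.3, 0.92, 0.6]))   # one confident candidate
print(select_solver([0.3, 0.55, 0.6]))   # none clears alpha
```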
400 MIP Instances (structured + non-structured): SCIP results

              all   clr  stcv  cpmp  sdlb  ctst  gap  ntlb  ltsz   bp  rap  stbl  cvrp  miplib
instances     400    25    25    25    25    25   25    25    25   25   25    25    25     100
opt. sol.   65.5%    19     3    18    10    25   23    25    25    6   12    22     6      68
feas. sol.  31.5%     6    21     7    11     -    2     -     -   19   12     3    19      26
no sol.      3.0%     -     1     -     4     -    -     -     -    -    1     -     -       6

structured instance classes: coloring (clr), set covering (stcv), capacitated p-median (cpmp), survivable network design (sdlb), cutting stock (ctst), generalized assignment (gap), network design (ntlb), lot sizing (ltsz), bin packing (bp), resource allocation (rap), stable set (stbl), capacitated vehicle routing (cvrp); non-structured: miplib
Overall Performance on Training Data
◮ test set of 131 MIP instances, 99 structured, 32 unstructured
◮ GCG better than SCIP on 34 instances

All instances:
Solver           SCIP     GCG      us     opt
No opt. sol.       52      66      44      39
CPU time (h)    111.3   142.6    93.1    85.7
Geo. mean (s)   127.1   370.4    78.6    67.8

Structured:
Solver           SCIP     GCG      us     opt
No opt. sol.       39      37      31      26
CPU time (h)     83.5    82.2    65.9    58.5
Geo. mean (s)    73.4   146.9    39.2    32.2

Non-structured:
Solver           SCIP     GCG      us     opt
No opt. sol.       13      29      14      13
CPU time (h)     27.8    56.8    29.2    27.2
Geo. mean (s)   672.9  5145.0   766.0   646.5

◮ SCIP: apply default SCIP to all instances
◮ GCG: apply default GCG to all instances
◮ us: our supervised learning scheme
◮ opt: best decomposition selected each time
Accuracy: How often do we predict the right Solver?
goal: avoid using GCG when we do not find an appropriate structure
question answered by the classifier: is GCG on (P, D) better than SCIP on P?

confusion matrices (rows: predicted solver, columns: truly better solver)

                    All             Structured        Non-structured
truth:            SCIP    GCG      SCIP    GCG        SCIP    GCG
share            74.0%  26.0%     68.7%  31.3%       90.6%   9.4%
pred. SCIP       69.5%  12.3%     64.6%  11.1%       84.4%   6.3%
pred. GCG         4.5%  13.7%      4.1%  20.2%        6.3%   3.1%

(pred. SCIP / truth SCIP: true negative; pred. SCIP / truth GCG: false negative; pred. GCG / truth SCIP: false positive; pred. GCG / truth GCG: true positive)
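The overall accuracy is the trace of the normalized confusion matrix, i.e. true negatives plus true positives; below we recompute it from the "All" numbers on the slide:

```python
# normalized confusion matrix (percent), keyed by (prediction, truth)
confusion = {
    ("SCIP", "SCIP"): 69.5,   # TN: correctly kept plain SCIP
    ("SCIP", "GCG"): 12.3,    # FN: missed a good decomposition
    ("GCG", "SCIP"): 4.5,     # FP: used GCG although SCIP was better
    ("GCG", "GCG"): 13.7,     # TP: correctly chose GCG
}

# accuracy = share of instances where the right solver is predicted
accuracy = sum(v for (pred, truth), v in confusion.items() if pred == truth)
print(accuracy)
```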
Take-Away
◮ ML¹ helps us decide whether to use B&C or B&P
◮ don't ask for reasons, this is ML¹

¹ this is not my name