Learning when to use a Decomposition
Markus Kruber · Marco Lübbecke · Axel Parmentier
Chair of Operations Research, RWTH Aachen University, Germany
@mluebbecke #aussois2018 · C.O.W. Aussois · January 9, 2018
Machine Learning is Everywhere
Supervised Learning: Classification
◮ data X, d features, labels Y
◮ feature map φ : X → R^d takes each labeled sample (x_i, y_i) to (φ(x_i), y_i)
◮ an algorithm "learns" f : R^d → Y s.t. the error of (f(φ(x_i)), y_i) over x_i ∈ X is "small"
◮ this is itself an optimization problem (validate!)
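The pipeline on this slide (a feature map φ, a classifier f fitted to the pairs (φ(x_i), y_i), an empirical error to be made "small") can be sketched from scratch; this toy nearest-centroid classifier and its features are our own illustration, not the method of the talk:

```python
def phi(x):
    """Feature map phi : X -> R^d (here d = 2, toy features)."""
    return (x, x * x)

def fit_nearest_centroid(samples):
    """'Learn' f by storing the per-class mean feature vector."""
    sums, counts = {}, {}
    for x, y in samples:
        fx = phi(x)
        s = sums.setdefault(y, [0.0] * len(fx))
        for j, v in enumerate(fx):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def f(features):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(features, c))
        return min(centroids, key=lambda y: dist2(centroids[y]))
    return f

# toy training set: label 1 for large x, 0 for small x
train = [(x, 1 if x > 5 else 0) for x in range(11)]
f = fit_nearest_centroid(train)
# empirical error that "learning" tries to keep small
error = sum(f(phi(x)) != y for x, y in train) / len(train)
print(error)
```

In practice one would judge the error on held-out data rather than the training set (hence "validate").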
Binary Classification: Dog or Muffin? Owl or Apple?
SCIP
◮ open-source MIP/MINLP (and much more) solver
◮ also: a branch-price-and-cut framework
◮ scip.zib.de
GCG
◮ extension to SCIP
◮ fully generic branch-price-and-cut solver
◮ automatically applies Dantzig-Wolfe reformulation to a MIP
◮ www.or.rwth-aachen.de/gcg
GCG automatically detects Structure (a lot of it!)
◮ up to a few hundred or even a few thousand decompositions per MIP
◮ GCG performance depends highly on whether the MIP's structure is reflected by some decomposition, and on us finding/selecting it
Automatic Reformulation in GCG
◮ Detection: MIP P → decompositions D_1, D_2, ..., D_k (decomposition types: border, staircase, ...)
◮ Selection: pick one decomposition by GCG's internal score, then run GCG
◮ extended pipeline: first decide "DWR?"; if no, run plain SCIP; if yes, select a decomposition and run GCG
◮ this work: a supervised learning approach to both decisions: select a best decomposition, or decide not to use a decomposition at all
Supervised Learning Approach
remember: given data X, a classifier f predicts a label Y: Y = f(X)
◮ f is a binary classifier if Y ∈ {0, 1}
◮ learning a classifier: find an f_θ among a family (f_θ, θ ∈ Θ) that best fits a training set ((x_i, y_i), i = 1, ..., n)
◮ we use standard algorithms implemented in scikit-learn
in our setting:
◮ data X: a MIP P and its decomposition(s) D
◮ labels Y: use SCIP or GCG? and which decomposition?
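A minimal scikit-learn sketch of this setup; the feature vectors, labels, and the choice of a random forest are invented for illustration (the talk only says "standard algorithms implemented in scikit-learn"):

```python
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the (MIP, decomposition) training set: each row is a
# feature vector phi(P, D); the label is 1 if GCG beat SCIP, else 0.
# All numbers are invented purely for illustration.
X_train = [
    [0.9, 3, 0.1],   # many blocks, few linking constraints: GCG won
    [0.8, 4, 0.2],
    [0.1, 1, 0.9],   # little structure: SCIP won
    [0.2, 1, 0.8],
]
y_train = [1, 1, 0, 0]

f_theta = RandomForestClassifier(n_estimators=50, random_state=0)
f_theta.fit(X_train, y_train)

# predict for a new (P, D): 1 = run GCG with D, 0 = run plain SCIP
prediction = f_theta.predict([[0.85, 3, 0.15]])[0]
print(prediction)
```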
Feature Map φ
pipeline: input (P, D) → feature map φ → feature vector φ(P, D) ∈ R^d → classifier f_θ → output f_θ(φ(P, D))
to do: define φ (80+ features), choose the family Θ, learn f_θ
examples of features we used
◮ "classics": # variables/constraints, variable types, constraint types
◮ # linking vars/conss
◮ # blocks; min, max, average block size
◮ detector used (indicator)
◮ detection quality metrics
◮ products of features
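A hypothetical sketch of such a feature map: the MIP and the decomposition are toy dictionaries, and the feature names mirror the slide (block count, block sizes, linking constraints); none of this is GCG's actual data structure:

```python
def phi(mip, dec):
    """Map a (MIP, decomposition) pair to a feature vector in R^d."""
    block_sizes = [len(b) for b in dec["blocks"]]
    n_blocks = len(block_sizes)
    features = [
        mip["n_vars"],                 # classic: # variables
        mip["n_conss"],                # classic: # constraints
        len(dec["linking_conss"]),     # # linking constraints
        n_blocks,                      # # blocks
        min(block_sizes),
        max(block_sizes),
        sum(block_sizes) / n_blocks,   # average block size
    ]
    # products of features, as on the slide
    features.append(features[2] * features[3])
    return features

mip = {"n_vars": 120, "n_conss": 80}
dec = {"blocks": [range(30), range(30), range(40)], "linking_conss": [0, 1]}
print(phi(mip, dec))
```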
Labeling: Definition of Y
question: should we use SCIP or GCG?
training set:
◮ MIPs P, with SCIP run on each P
◮ decompositions D, with GCG run on each (P, D)
given an input (P, D), we learn a binary classifier Y = f(φ(P, D)) where Y = 1 iff GCG on (P, D) is better than SCIP on P after a given time limit:
◮ GCG solves P and SCIP doesn't
◮ both solve P and GCG is faster
◮ neither solves P but GCG's gap is smaller
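The three bullet cases above translate directly into a labeling function; the run results here are toy tuples (solved, time, gap) whose field names are our invention:

```python
def label(scip, gcg, time_limit):
    """y = 1 iff the GCG run on (P, D) beat the SCIP run on P."""
    scip_solved = scip["solved"] and scip["time"] <= time_limit
    gcg_solved = gcg["solved"] and gcg["time"] <= time_limit
    if gcg_solved and not scip_solved:
        return 1                                         # GCG solves P, SCIP doesn't
    if gcg_solved and scip_solved:
        return 1 if gcg["time"] < scip["time"] else 0    # both solve: faster wins
    if not gcg_solved and not scip_solved:
        return 1 if gcg["gap"] < scip["gap"] else 0      # neither solves: smaller gap wins
    return 0                                             # SCIP solves, GCG doesn't

print(label({"solved": True, "time": 90, "gap": 0.0},
            {"solved": True, "time": 30, "gap": 0.0}, time_limit=3600))
```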
Regression: Predict Decomposition Quality
question: if any, which decomposition should we use?
f(φ(P, D)) ∈ [0, 1]: probability that GCG with D beats SCIP.
given decompositions D_1, ..., D_k and remaining time t, use GCG if max_i f(φ(P, D_i)) ≥ α
we use 0.5 < α ≤ 1: a decomposition is not a default choice.
if we use GCG, select decomposition arg max_i f(φ(P, D_i))
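The selection rule is a one-liner once the scores are computed; the score list below is a toy stand-in for the learned regressor values f(φ(P, D_i)):

```python
def select_solver(scores, alpha=0.8):
    """scores[i] ~ probability that GCG with D_i beats SCIP.

    Run GCG only if the best score clears the threshold alpha
    (0.5 < alpha <= 1: a decomposition is not a default choice);
    in that case pick the arg max.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] >= alpha:
        return ("GCG", best)      # use GCG with decomposition D_{best+1}
    return ("SCIP", None)         # no candidate clears alpha: plain SCIP

print(select_solver([0.3, 0.92, 0.6]))   # one confident candidate
print(select_solver([0.3, 0.55, 0.6]))   # none clears alpha
```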
400 MIP Instances (structured + non-structured): SCIP results

              all   clr  stcv  cpmp  sdlb  ctst  gap  ntlb  ltsz   bp  rap  stbl  cvrp  miplib
instances     400    25    25    25    25    25   25    25    25   25   25    25    25     100
opt. sol.   65.5%    19     3    18    10    25   23    25    25    6   12    22     6      68
feas. sol.  31.5%     6    21     7    11     -    2     -     -   19   12     3    19      26
no sol.      3.0%     -     1     -     4     -    -     -     -    -    1     -     -       6

structured instance classes: coloring (clr), set covering (stcv), capacitated p-median (cpmp), survivable network design (sdlb), cutting stock (ctst), generalized assignment (gap), network design (ntlb), lot sizing (ltsz), bin packing (bp), resource allocation (rap), stable set (stbl), capacitated vehicle routing (cvrp); non-structured: miplib
Overall Performance on Training Data
◮ test set of 131 MIP instances, 99 structured, 32 unstructured
◮ GCG better than SCIP on 34 instances

All instances:
Solver           SCIP     GCG      us     opt
No opt. sol.       52      66      44      39
CPU time (h)    111.3   142.6    93.1    85.7
Geo. mean (s)   127.1   370.4    78.6    67.8

Structured:
Solver           SCIP     GCG      us     opt
No opt. sol.       39      37      31      26
CPU time (h)     83.5    82.2    65.9    58.5
Geo. mean (s)    73.4   146.9    39.2    32.2

Non-structured:
Solver           SCIP     GCG      us     opt
No opt. sol.       13      29      14      13
CPU time (h)     27.8    56.8    29.2    27.2
Geo. mean (s)   672.9  5145.0   766.0   646.5

◮ SCIP: apply default SCIP to all instances
◮ GCG: apply default GCG to all instances
◮ us: our supervised learning scheme
◮ opt: best decomposition selected each time
Accuracy: How often do we predict the right Solver?
goal: avoid using GCG when we do not find an appropriate structure
question answered by the classifier: is GCG on (P, D) better than SCIP on P?

confusion matrices (rows: predicted solver, columns: truly better solver)

                    All             Structured        Non-structured
truth:            SCIP    GCG      SCIP    GCG        SCIP    GCG
share            74.0%  26.0%     68.7%  31.3%       90.6%   9.4%
pred. SCIP       69.5%  12.3%     64.6%  11.1%       84.4%   6.3%
pred. GCG         4.5%  13.7%      4.1%  20.2%        6.3%   3.1%

(pred. SCIP / truth SCIP: true negative; pred. SCIP / truth GCG: false negative; pred. GCG / truth SCIP: false positive; pred. GCG / truth GCG: true positive)
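The overall accuracy is the trace of the normalized confusion matrix, i.e. true negatives plus true positives; below we recompute it from the "All" numbers on the slide:

```python
# normalized confusion matrix (percent), keyed by (prediction, truth)
confusion = {
    ("SCIP", "SCIP"): 69.5,   # TN: correctly kept plain SCIP
    ("SCIP", "GCG"): 12.3,    # FN: missed a good decomposition
    ("GCG", "SCIP"): 4.5,     # FP: used GCG although SCIP was better
    ("GCG", "GCG"): 13.7,     # TP: correctly chose GCG
}

# accuracy = share of instances where the right solver is predicted
accuracy = sum(v for (pred, truth), v in confusion.items() if pred == truth)
print(accuracy)
```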
Take-Away
◮ ML¹ helps us decide whether to use B&C or B&P
◮ don't ask for reasons, this is ML¹

¹ this is not my name