Improving Optimization Bounds using Machine Learning: Decision Diagrams meet Deep Reinforcement Learning
Quentin Cappart, Emmanuel Goutierre, David Bergman, Louis-Martin Rousseau
Research question
Bounding mechanisms are critical in the design of scalable optimization solvers.
• Inflexible bounds: linear relaxation.
• Flexible bounds: relaxed/restricted decision diagrams, tuned through the maximum width, the node merging operation, and the variable ordering.
Running Example: Maximum Independent Set Problem (MISP)
Given a graph, select a set of non-adjacent vertices with maximum total weight.
[Figure: the example instance with vertices x1, ..., x5 of weights 3, 4, 2, 2, 7, a feasible solution of weight 5, and the optimal solution of weight 11.]
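To make the objective concrete, here is a minimal Python sketch that checks whether a set of vertices is independent and computes its weight. The vertex weights are those of the running example, but the edge set below is a hypothetical placeholder (the actual instance is only defined in the slide's figure).

```python
# Minimal sketch: evaluating a candidate MISP solution.
# The edge set below is hypothetical, not the instance from the slides.

def is_independent(selected, edges):
    """True if no two selected vertices are adjacent."""
    return not any(u in selected and v in selected for u, v in edges)

def total_weight(selected, weights):
    """Sum of the weights of the selected vertices."""
    return sum(weights[v] for v in selected)

if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]   # hypothetical 5-cycle
    weights = {1: 3, 2: 4, 3: 2, 4: 2, 5: 7}           # weights from the running example
    candidate = {2, 5}
    if is_independent(candidate, edges):
        print("feasible, weight =", total_weight(candidate, weights))
```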
Encoding the MISP using decision diagrams
1. Node state: the set of vertices that can still be inserted.
2. Arc cost: the weight of the vertex, if inserted.
3. Solution: the longest path in the diagram.
[Figure: exact decision diagram of the example instance; the longest path has value 4 + 7 = 11.]
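A minimal sketch of this top-down compilation, assuming a node's state is the set of vertices that can still be inserted. It only keeps the best value per state, which is enough to recover the longest-path (optimal) value; function and variable names are illustrative, not taken from the authors' code.

```python
# Exact top-down DD compilation for the MISP (sketch).
# compile_exact returns the longest-path value, i.e. the optimal MISP weight.

def compile_exact(order, weights, neighbors):
    """order: variable ordering (list of vertices);
       weights: dict vertex -> weight;
       neighbors: dict vertex -> set of adjacent vertices."""
    # Each layer maps a node state (frozenset of insertable vertices)
    # to the best (longest-path) value reaching that state.
    layer = {frozenset(order): 0}
    for v in order:
        next_layer = {}
        for state, value in layer.items():
            # Arc "exclude v": cost 0, v simply leaves the state.
            arcs = [(state - {v}, value)]
            if v in state:
                # Arc "include v": cost = weight of v; v and its neighbors leave the state.
                arcs.append((state - {v} - neighbors[v], value + weights[v]))
            for new_state, new_value in arcs:
                if new_value > next_layer.get(new_state, float("-inf")):
                    next_layer[new_state] = new_value
        layer = next_layer
    return max(layer.values())   # longest path to the terminal layer
```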
Flexible bounds using decision diagrams (1/2)
• Restricted DD (delete nodes): longest path 2 + 7 = 9, a lower bound.
• Exact DD: longest path 4 + 7 = 11, the optimal solution.
• Relaxed DD (merge nodes): longest path 4 + 2 + 7 = 13, an upper bound.
Lower bound 9 ≤ optimal solution 11 ≤ upper bound 13.
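A sketch of the two width-limiting operations, assuming a layer is stored as a mapping from node states to longest-path values and that the nodes to delete or merge are chosen by their current value (a common heuristic; the slides do not specify the selection rule).

```python
# Width-limiting operations on one DD layer: {state (frozenset): longest-path value}.

def restrict_layer(layer, max_width):
    """Restricted DD: delete the least promising nodes (keeps a lower bound)."""
    best = sorted(layer.items(), key=lambda kv: kv[1], reverse=True)
    return dict(best[:max_width])

def relax_layer(layer, max_width):
    """Relaxed DD: merge the surplus nodes into one (keeps an upper bound).
       For the MISP, merging takes the union of the states and the max value."""
    if len(layer) <= max_width:
        return dict(layer)
    best = sorted(layer.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(best[:max_width - 1])
    merged_state = frozenset().union(*(s for s, _ in best[max_width - 1:]))
    merged_value = max(v for _, v in best[max_width - 1:])
    kept[merged_state] = max(merged_value, kept.get(merged_state, float("-inf")))
    return kept
```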
Flexible bounds using decision diagrams (2/2)
With another variable ordering (x2, x3, x1, x5, x4):
• Restricted DD (delete nodes): longest path 4 + 7 = 11, a tighter lower bound.
• Exact DD: longest path 4 + 7 = 11, the optimal solution.
• Relaxed DD (merge nodes): longest path 2 + 7 + 3 = 12, a tighter upper bound.
The bounds improve from [9, 13] to [11, 12].
Improving a variable ordering is NP-hard
The variable ordering can have a huge impact on the bounds obtained, but improving the variable ordering is NP-hard.
We propose a generic method based on Deep Reinforcement Learning.
Reinforcement learning in a nutshell (1/2)
An agent interacts with an environment:
1. The agent observes the state of the environment.
2. It chooses an action.
3. It receives a reward.
4. It moves to another state.
The goal is to maximize the sum of received rewards until a terminal state is reached.
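A generic sketch of this interaction loop. The `env` and `agent` objects and their reset/step/select_action/observe methods are assumed interfaces for illustration, not a specific library API.

```python
def run_episode(env, agent):
    """One episode of the agent-environment loop sketched above."""
    state = env.reset()                               # 1. observe the initial state
    total_reward, done = 0.0, False
    while not done:
        action = agent.select_action(state)           # 2. choose an action
        next_state, reward, done = env.step(action)   # 3. get a reward, 4. move to a new state
        agent.observe(state, action, reward, next_state, done)
        total_reward += reward                        # the goal is to maximize this sum
        state = next_state
    return total_reward
```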
Reinforcement learning in a nutshell (2/2)
How do we select the actions so as to maximize the total reward?
In theory:
1. Compute an estimation of the quality of each action: the Q-values.
2. Take the action with the best Q-value: the greedy policy.
3. The policy is optimal if the Q-values are optimal.
In practice:
1. The search space is too large to compute the optimal Q-values. Q-learning: iteratively update the Q-values through simulations.
2. Some states are never visited during the simulations. Deep Q-learning: approximate the Q-values of similar states using a deep network.
[Figure: a tree of states, actions, and rewards down to the terminal states.]
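A minimal sketch of tabular Q-learning with an epsilon-greedy policy, to make the update rule concrete; the deep variant replaces the table with a neural approximator. Hyperparameter values are illustrative.

```python
from collections import defaultdict
import random

Q = defaultdict(float)                   # Q[(state, action)], 0 for never-visited pairs
alpha, gamma, epsilon = 0.1, 1.0, 0.1    # illustrative hyperparameters

def select_action(state, actions):
    """Epsilon-greedy policy: mostly greedy, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(list(actions))
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(state, action, reward, next_state, next_actions, done):
    """Move Q(s, a) toward the target r + gamma * max_a' Q(s', a')."""
    target = reward
    if not done:
        target += gamma * max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```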
Reinforcement learning vs decision diagrams
Reinforcement Learning   | Decision Diagrams
State space              | State space
Action                   | Variable selection
Reward function          | Cost function
Transition function      | Transition function + merging operation
There is a natural similarity: both are based on dynamic programming.
RL environment for decision diagrams
• State: (1) an ordered list of variables; (2) the DD currently built.
• Action: add a new variable to the DD.
• Transition: build the next layer of the DD using the selected variable.
• Reward: improvement in the new lower/upper bound (difference in the longest path).
This applies to any COP that can be recursively encoded by a decision diagram.
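A minimal sketch of such an environment for building a relaxed DD of the MISP, assuming the `relax_layer` helper from the earlier sketch is available to bound the width. The reward is the negative increase of the current longest path, which matches the trace on the next slide; class and method names are illustrative, not the authors' implementation.

```python
class RelaxedDDEnv:
    """Builds a relaxed DD of the MISP layer by layer, one variable per action."""
    def __init__(self, weights, neighbors, max_width):
        self.weights, self.neighbors, self.max_width = weights, neighbors, max_width

    def reset(self):
        self.ordering = []                              # variables inserted so far
        self.remaining = set(self.weights)              # variables still to insert
        self.layer = {frozenset(self.weights): 0}       # root: every vertex insertable
        self.longest_path = 0
        return (tuple(self.ordering), dict(self.layer))

    def step(self, v):
        """Action: build the next layer of the DD using variable v."""
        next_layer = {}
        for state, value in self.layer.items():
            arcs = [(state - {v}, value)]               # arc "exclude v"
            if v in state:                              # arc "include v"
                arcs.append((state - {v} - self.neighbors[v], value + self.weights[v]))
            for new_state, new_value in arcs:
                next_layer[new_state] = max(new_value, next_layer.get(new_state, float("-inf")))
        self.layer = relax_layer(next_layer, self.max_width)   # helper from the earlier sketch
        self.ordering.append(v)
        self.remaining.remove(v)
        new_lp = max(self.layer.values())
        reward = self.longest_path - new_lp             # negative when the upper bound grows
        self.longest_path = new_lp
        done = not self.remaining
        return (tuple(self.ordering), dict(self.layer)), reward, done
```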
Construction of the DD using RL
At each step, the agent inserts the variable with the highest estimated Q-value; the reward is the (negative) increase of the longest path (LP) of the current relaxed DD.
• State 1: ordering [], LP = 0. Action: insert x2. Reward: -4.
• State 2: ordering [x2], LP = 4, cumulative reward -4. Action: insert x3. Reward: 0.
• State 3: ordering [x2, x3], LP = 4, cumulative reward -4. Action: insert x1. Reward: 0.
• State 4: ordering [x2, x3, x1], LP = 4, cumulative reward -4. Action: insert x5. Reward: -7.
• State 5: ordering [x2, x3, x1, x5], LP = 11, cumulative reward -11. Action: insert x4. Reward: -1.
• State 6 (terminal): ordering [x2, x3, x1, x5, x4], LP = 12, cumulative reward -12.
Computing the Q-values
Q(State, Action) ≈ Q̂(State, Action, Weight)
• Training phase: parametrizing the weights of the approximator Q̂.
• Evaluation: computing the estimated Q-value, e.g., Q̂(State, Action, Weight) = 8.
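The slides only state that a deep network approximates the Q-values. Purely for illustration, here is a sketch of a parametrized approximator Q̂ as a small PyTorch MLP over a hypothetical feature vector describing the (state, action) pair; the authors' actual architecture may differ.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q̂(state, action, weights): a small MLP over (state, action) features."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),        # one scalar Q-value per (state, action) pair
        )

    def forward(self, features):         # features: tensor of shape (batch, n_features)
        return self.net(features).squeeze(-1)

# Evaluation: estimate the Q-value of one (state, action) pair.
q_net = QNetwork(n_features=8)           # 8 is an arbitrary illustrative feature size
features = torch.rand(1, 8)              # hypothetical features of the (state, action) pair
q_value = q_net(features)                # trained weights would be loaded in practice
```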
Training the model
1. Experiments on the unweighted Maximum Independent Set Problem.
2. Barabási-Albert (BA) model: generates realistic, scale-free graphs.
3. Density controlled by fixing the attachment parameter m.
4. Graphs between 90 and 100 nodes.
5. Maximal width for training is 2.
6. 5000 randomly generated BA graphs, periodically refreshed.
7. Independent models for relaxed and restricted DDs.
Main assumption: the nature of the graphs we want to solve is known.
[Figure: example BA graphs with m = 1 and m = 2.]
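A sketch of how the training instances described above could be generated, assuming networkx is used for the Barabási-Albert model; constants and function names are illustrative.

```python
import random
import networkx as nx

NB_NODES_MIN, NB_NODES_MAX = 90, 100     # graphs between 90 and 100 nodes
ATTACHMENT = 4                           # attachment parameter m (controls density)
NB_GRAPHS = 5000                         # size of the (periodically refreshed) training set

def random_ba_instance():
    """One unweighted MISP instance on a random Barabási-Albert graph."""
    n = random.randint(NB_NODES_MIN, NB_NODES_MAX)
    graph = nx.barabasi_albert_graph(n, ATTACHMENT)
    weights = {v: 1 for v in graph.nodes}    # unweighted: every vertex has weight 1
    return graph, weights

training_set = [random_ba_instance() for _ in range(NB_GRAPHS)]
```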
Experimental setup
1. Comparison with common heuristics (random, MPD, min-in-state, and vertex-degree).
2. Comparison with the linear relaxation (only for relaxed DDs).
3. Width of 100 for relaxed DDs and width of 2 for restricted DDs.
4. Graphs between 90 and 100 nodes.
5. Different configurations of the attachment parameter (2, 4, 8, and 16).
6. Tested on 100 new random graphs.
7. Comparison based on the optimality gap, reported using performance profiles.
Other configurations are then tested.
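For reference, a sketch of a standard (Dolan-Moré style) performance profile computed over optimality gaps; the exact construction used in the paper may differ.

```python
import numpy as np

def performance_profile(gaps, taus):
    """gaps[method]: list of optimality gaps, one per test instance.
       Returns, for each method and each tau, the fraction of instances whose
       gap is within a factor tau of the best gap obtained on that instance."""
    methods = list(gaps)
    best = np.min([gaps[m] for m in methods], axis=0)        # best gap per instance
    profiles = {}
    for m in methods:
        ratios = np.array(gaps[m]) / np.maximum(best, 1e-9)  # avoid division by zero
        profiles[m] = [float(np.mean(ratios <= tau)) for tau in taus]
    return profiles

# Example: two hypothetical methods evaluated on three instances.
profiles = performance_profile({"RL": [0.0, 0.1, 0.2], "random": [0.3, 0.1, 0.5]},
                               taus=[1.0, 2.0, 5.0])
```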
Experiments for relaxed DDs (width = 100)
[Figure: performance profiles for m = 2, m = 4, m = 8, and m = 16.]
RL gives the best ordering and is better than the linear relaxation for denser graphs.
Experiments for restricted DDs (width = 2)
[Figure: performance profiles for m = 2, m = 4, m = 8, and m = 16.]
RL gives the best ordering in almost all situations.
Increasing the width for relaxed DDs
Training still done with a width of 2.
The model is robust when the width increases, and the execution time remains acceptable.
Conclusion and perspectives
[Figure: this work sits at the intersection of Machine Learning, Combinatorial Optimization, and Decision Diagrams.]
Contributions and results:
1. A generic approach based on DDs for learning flexible bounds.
2. Better performance than classical approaches on the MISP.
3. A robust approach for larger graphs and widths.
Perspectives and future work:
1. Data augmentation for real-life instances.
2. Application to other problems.
3. Improvement using other algorithms or approximators.
4. Application to other fields (constraint programming, planning, etc.).
Improving Optimization Bounds using Machine Learning
quentin.cappart@polymtl.ca
arxiv.org/abs/1809.03359 <To replace with the AAAI link>
github.com/qcappart/learning-DD
Increasing the graph size (width = 100)
Training still done with graphs of 90 to 100 nodes.
• Relaxed DDs: fairly robust.
• Restricted DDs: strongly robust.
Modifying the distribution (width = 100)
Training done with an attachment parameter of 4.
[Figure: results for relaxed and restricted DDs.]
It is important to know the distribution of the graphs we want to solve.
Impact of the width used during training
[Figure: results for testing widths of 2, 10, 50, and 100.]
The learned ordering is independent of the width chosen during the training.
Application to the MaxCut problem (work in progress)
Given a graph, select a set of nodes such that the weight of the cut between this set and the non-selected nodes is maximized.
[Figure: results for relaxed DDs (width = 100) and restricted DDs (width = 2).]
Promising results, but more difficult than the MISP.