

  1. Lifelong Optimisation (FA2386-12-1-4056)
   PI: Peter Stuckey, Pascal Van Hentenryck (NICTA, Melbourne), Toby Walsh (NICTA, Sydney)
   Senior Personnel: Dr Andreas Schutt (Melbourne), Dr Nina Narodytska (Melbourne)
   AFOSR Program Review: Mathematical and Computational Cognition Program; Computational and Machine Intelligence Program; Robust Decision Making in Human-System Interface Program (Jan 28 – Feb 1, 2013, Washington, DC)

  2. Lifelong Optimisation (Van Hentenryck, Stuckey, Walsh)
   Research Objectives: Treat optimisation not as a one-off process, but as an ongoing one, so that decision support tools actually improve in their performance with time (and so we learn to trust solvers over time).
   Technical Approach: 1. Develop optimisation that learns from past problems (e.g. learn useful constraints, improve heuristics, etc.). 2. Develop methods to explain solutions.
   DoD Benefits: Robust optimisation methods that, like humans, cope well with uncertainty, explain their answers, and get better with repeated use. Potential applications in logistics and related problems.
   Budget: 2012: $291,200; 2014 (option): $145,600.
   Project Start Date: March 2012. Project End Date: March 2015.

  3. Project Goals
   1. Learning from past problems
      1. Learning constraints (aka "nogood" learning)
      2. Learning heuristics
      3. Robust solving
   2. Explaining solutions
      1. Explaining optimality
      2. Explaining unsatisfiability (aka infeasibility)

  4. Progress Towards Goals
   • Learning problem constraints
   – Technical challenge: how do we cope with the new data in today's problem instance?
   – Solution: learn parameterized constraints that abstract out the data
   • Learning problem heuristics
   – Focus on online optimisation problems
   – Technical challenge: how do we measure the quality of a heuristic's decision when the future is uncertain?
   – Solution: an elegant combination of machine learning and post-hoc optimisation

  5. Outline
   • Motivation: Lifelong learning
   • Lifelong learning of constraints
   • Lifelong learning of heuristics

  6. Motivation
   • Lifelong optimisation
   – Decision tools treat optimisation as a "one-off" problem
   – But we should view today's problem in the context of yesterday's and tomorrow's

  7. Lifelong optimisation
   • New problems are often similar to old ones
   – We can learn to solve them better
   • Today's problem may not be separable from tomorrow's
   – We want to send our drivers back to the same drop-offs
   – Drivers want to do similar jobs every day
   – What we deliver today may change what we need to deliver tomorrow
   – …

  8. Lifelong learning of constraints
   • Learning constraints (specifically "nogoods") across related problems (today's schedule, tomorrow's schedule, …)

  9. Inter-instance learning
   • Nogood learning allows a CP solver to learn nogoods from each conflict
   – Nogoods describe the reason for the failure, i.e., a sufficient set of conditions on the domain to cause failure
   • Propagating these nogoods prevents the solver from making the same search mistakes again
   – Can produce orders-of-magnitude speedups

  10. Inter-instance learning
   • Many applications where we solve multiple, similar instances of a problem
   – e.g. a user may interact with the solving process by adding, removing or altering tasks, resources, …

  11. Inter-instance learning
   • Weakness: current nogood learning techniques only work within a single instance
   – Nogoods are not valid in another instance
   – Each instance must be solved from scratch, ignoring the similarities between instances
   • We want to extend nogood learning to carry what we learned from one instance to the next

  12. Nogood Learning
   • The current state of the art is called Lazy Clause Generation (LCG)
   – Developed under our previous AOARD grant
   • Each propagator has to explain each of its inferences
   • e.g. given the constraint x ≤ y and domain x ∈ [3..10]
   – infer y ≥ 3 and explain it with: x ≥ 3 -> y ≥ 3 (a toy sketch of this follows)
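
   A minimal sketch (in Python) of what an explaining propagator looks like. The Domain class and the string form of the explanation are illustrative assumptions, not taken from any actual LCG solver:

```python
# Toy sketch of explained propagation in Lazy Clause Generation.
# `Domain` and the explanation format are illustrative only.

from dataclasses import dataclass

@dataclass
class Domain:
    lo: int
    hi: int

def propagate_leq(x: Domain, y: Domain):
    """Propagate x <= y, returning (new domain of y, explanation).

    The explanation records which bound literal on x justifies the
    new bound literal on y, e.g. [x >= 3] -> [y >= 3].
    """
    if x.lo > y.lo:
        new_y = Domain(x.lo, y.hi)
        explanation = f"[x >= {x.lo}] -> [y >= {x.lo}]"
        return new_y, explanation
    return y, None

# The example from the slide: x in [3..10], y in [0..10]
x, y = Domain(3, 10), Domain(0, 10)
y, why = propagate_leq(x, y)
print(y, why)   # Domain(lo=3, hi=10)  [x >= 3] -> [y >= 3]
```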

  13. Parameterized Nogoods
   • A standard nogood is only valid in the instance in which it was derived
   • We generalize LCG to learn parameterized nogoods
   – valid for the entire problem class

  14. Problem class model
   • Create a problem class model (V ∪ Q, D, C, f), where Q is a set of parameter variables
   • An instance is created by fixing Q to some set of values
   • E.g. in graph colouring we can have V ≡ {v1, …, vn}, Q ≡ {ai,j | i, j = 1..n}, C ≡ {ai,j -> vi ≠ vj}. The values of Q specify which edges exist.

  15. Example
   [Figure: three graph colouring instances]
   • Consider the three graph colouring instances above, where we are trying to 2-colour each graph
   • Suppose we tried v1 = 1, v2 = 2 in the first instance. It fails, and we learn a1,3 /\ a2,3 /\ v1 = 1 /\ v2 = 2 -> false
   • Reuse: the parameter literals are a1,3 /\ a2,3
   – Not true in the second instance -> can't reuse the nogood
   – Hold in the third instance -> can reuse it (see the sketch below)
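
   A hedged illustration of the reuse test. The set encoding of parameter literals is our own simplification for exposition; real LCG solvers store nogoods as clauses over bound/equality literals:

```python
# Sketch of reusing a parameterized nogood across graph colouring
# instances; the data structures here are illustrative only.

def nogood_applies(param_edges, instance_edges):
    """A parameterized nogood carries parameter literals (here: edges
    that must exist). It transfers to a new instance only if all of
    those parameter literals hold there."""
    return param_edges <= instance_edges

# Nogood learned in instance 1:  a(1,3) /\ a(2,3) /\ v1=1 /\ v2=2 -> false
param_part = {(1, 3), (2, 3)}
variable_part = {1: 1, 2: 2}   # the assignment v1=1, v2=2

instance2_edges = {(1, 2), (2, 3)}          # edge (1,3) is missing
instance3_edges = {(1, 3), (2, 3), (1, 2)}  # contains both edges

print(nogood_applies(param_part, instance2_edges))  # False: cannot reuse
print(nogood_applies(param_part, instance3_edges))  # True: may post
# In instance 3 we may post the clause  not(v1=1 /\ v2=2),
# pruning that partial assignment without re-deriving the conflict.
```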

  16. Experimental evaluation
   • Problems: Radiation Therapy, Minimization of Open Stacks, Graph Colouring, Knapsack
   • For each, create 100 "base" instances
   – Then create modified versions of each base instance with small changes
   • How much speedup do we get if we reuse nogoods from the base instance, compared to solving from scratch? (the measurement is sketched below)
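
   The measurement loop might look roughly like the following sketch. The solve(instance, nogoods=...) interface and the applies_to method are invented stand-ins for the solver used in the paper, not its actual API:

```python
# Sketch of the speedup measurement; `solve` and `ng.applies_to`
# are hypothetical stand-ins for the real solver interface.

def speedup(base_instance, modified_instance, solve):
    # Solve the base instance once and keep its parameterized nogoods.
    _, _, learned = solve(base_instance)
    # Only nogoods whose parameter literals hold in the modified
    # instance may be transferred.
    reusable = [ng for ng in learned if ng.applies_to(modified_instance)]
    _, t_scratch, _ = solve(modified_instance)
    _, t_reuse, _ = solve(modified_instance, nogoods=reusable)
    return t_scratch / t_reuse
```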

  17. Results
   • More similar -> more reuse -> more speedup
   • Effectiveness depends on the problem class

  18. Conclusion & Future Work
   • Generalized Lazy Clause Generation to produce parameterized nogoods
   • These can be reused across multiple instances of the same problem class
   • Highly effective if the new instances to be solved are very similar to previously solved instances

  19. Lifelong learning of heuristics
   • As well as problem constraints ("nogoods"), we can also learn problem-solving heuristics ("rules of thumb")
   – How I solve today's problem should help me solve tomorrow's
   – Heuristics are often even more re-usable than nogoods!

  20. Learning dispatch heuristics
   • Dispatchers in parcel pickup & delivery firms learn to solve complex vehicle routing problems
   • How do they do it?

  21. Learning dispatch heuristics
   • An elegant combination of machine learning and optimisation
   – Reinforcement learning is used to learn good dispatch heuristics
   – Optimisation is used to compute good answers
   • Was it a good idea or not to dispatch this truck to collect this parcel?
   – Fix this pickup, find the best routing
   – Prohibit this pickup, find the best routing (a sketch of this comparison follows)
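
   A minimal sketch of the post-hoc labelling idea: judge a past dispatch decision by comparing the best achievable routing with the decision forced versus forbidden. optimise_routing is a hypothetical offline optimiser over the now-known full instance:

```python
# Post-hoc scoring of a dispatch decision via optimisation.
# `optimise_routing` is a hypothetical offline routing optimiser.

def label_decision(instance, decision, optimise_routing):
    """Return 1 if fixing `decision` (e.g. truck t collects parcel p)
    was at least as good as prohibiting it, else 0. Such labels can
    feed the learning of dispatch heuristics."""
    cost_with = optimise_routing(instance, fix=[decision])
    cost_without = optimise_routing(instance, prohibit=[decision])
    return 1 if cost_with <= cost_without else 0
```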

  22. Learning dispatch heuristics
   • Base dispatch heuristics
   – Nearest truck
   – Cheapest pickup
   – Least busy truck
   – …
   • Features
   – Time of day
   – Workload
   – Vehicle spread
   – …

  23. Learning dispatch heuristics
   • Neural network
   – Learns when to follow a particular base heuristic (sketched below)
   – Trained on a real-world data set from a local Sydney company
   • Performance
   – Looks highly promising
   – Much better than their rookie dispatchers
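
   A hedged sketch of heuristic selection: a learned model maps the current features to a choice of base dispatch heuristic. The heuristic and feature names echo the slides; the state attributes and the model interface (any classifier with a predict method) are illustrative assumptions:

```python
# Selecting a base dispatch heuristic with a learned model.
# `state` attributes and `model.predict` are illustrative only.

BASE_HEURISTICS = {
    "nearest_truck":   lambda state: min(state.trucks, key=state.distance_to_request),
    "cheapest_pickup": lambda state: min(state.trucks, key=state.marginal_cost),
    "least_busy":      lambda state: min(state.trucks, key=state.workload),
}

def dispatch(state, model):
    """Pick which base heuristic to follow at this decision point."""
    features = [state.time_of_day, state.total_workload, state.vehicle_spread]
    name = model.predict(features)   # e.g. a trained neural network
    return BASE_HEURISTICS[name](state)
```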

  24. Conclusions
   • Learning across instances of optimisation problems
   – Learning nogoods (constraints)
   – Learning heuristics
   • Our research continues to receive recognition
   – Best Paper award at the 18th Int. Conf. on Constraint Programming (Oct 2012)
   – Outstanding Programme Committee Member award at AI'12 (Dec 2012)

  25. List of Publications Attributed to the Grant
   • Heuristics and Policies for Online Pickup and Delivery Problems. Under review for the 25th Conference on Innovative Applications of Artificial Intelligence (IAAI-13).
   • The SEQBIN Constraint Revisited. Principles and Practice of Constraint Programming (CP 2012), 2012.
   • A Hybrid MIP/CP Approach for Multi-Activity Shift Scheduling. CP 2012, 2012.
   • Exploiting Subproblem Dominance in Constraint Programming. Constraints 17(1): 1–38 (2012).
   • Conflict Directed Lazy Decomposition. CP 2012: 70–85.
   • Inter-instance Nogood Learning in Constraint Programming. CP 2012: 238–247.
   • Optimisation Modelling for Software Developers. CP 2012: 274–289.
