Learning Portfolios of Automatically Tuned Planners



  1. Learning Portfolios of Automatically Tuned Planners. Jendrik Seipp (1), Manuel Braun (1), Johannes Garimort (1), Malte Helmert (2). (1) Albert-Ludwigs-Universität Freiburg, Germany; (2) Universität Basel, Switzerland. June 2012

  2. IPC 2011 – Sequential Satisficing Track Results [Bar chart: quality score per planner, y-axis 160 to 240]

  3. IPC 2011 – Sequential Satisficing Track Results [Same chart as slide 2]

  4. Motivation. Tuned planners: tune for the complete benchmark set; commit to a single planner. Portfolio planners: manually select planners; calculate times greedily. Our approach: automatically tune one planner for each domain in the training set; evaluate multiple portfolio generation methods.

  5. Overview: Domain Tuning; Portfolio Learning

  6. Domain Tuning

  7. Tuning Procedure – Domains. Training set of 21 former IPC domains (1998–2006). Tune Fast Downward with ParamILS for each domain.

  8. Tuning Procedure – Configurations. Heuristics: h^FF, h^add, h^cg, h^cea, h^LM. Searches: eager, lazy. Type of landmarks, cost handling, preferred operators. Numerous combination options and conditional parameters → 2.99 · 10^13 configurations.

  9. Tuning Results – Trends. Preferred operators (19/21). Lazy search (20x), eager search (1x). Most configurations use one (10x) or two (9x) heuristics. h^FF (12x), h^LM (11x), h^cg (6x), h^cea (4x), h^add (1x).

  10. Tuning Results. Coverage table: rows are domains (number of tasks in parentheses), columns are the planners tuned for optical-t, pathways, pipes-t, tpp, ... optical-t (48): 21, 0, 3, 0, ...; pathways (30): 22, 29, ..., 30, 30; pipes-t (50): 26, 39, 42, 38, ...; tpp (30): 24, ..., 30, 30, 30; ...

  11. Portfolio Learning

  12. Portfolio Generators. Input: planners, results on training set, total time limit. Output: { depot: 18s, gripper: 65s, ... }
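The input and output described on this slide can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the names `runtimes`, `portfolio` and `coverage`, the toy numbers, and the use of plain coverage (number of solved tasks) as the score. The slides report IPC quality scores, which this sketch does not reproduce.

```python
from typing import Dict, Optional

# Hypothetical training results: seconds each planner needs per task
# (None means the planner did not solve the task at all).
Runtimes = Dict[str, Dict[str, Optional[float]]]
# A portfolio maps each planner name to its time slice in seconds.
Portfolio = Dict[str, int]


def coverage(portfolio: Portfolio, runtimes: Runtimes) -> int:
    """Count tasks solved by at least one planner within its time slice."""
    tasks = {task for per_task in runtimes.values() for task in per_task}
    solved = 0
    for task in tasks:
        for planner, limit in portfolio.items():
            needed = runtimes.get(planner, {}).get(task)
            if needed is not None and needed <= limit:
                solved += 1
                break
    return solved


# Toy data (invented): planners are named after the domain they were tuned for.
runtimes = {
    "depot": {"t1": 10, "t2": None, "t3": 200},
    "gripper": {"t1": 50, "t2": 60, "t3": None},
}
portfolio = {"depot": 18, "gripper": 65}
print(coverage(portfolio, runtimes))  # prints 2 (t1 and t2 are solved)
```

A portfolio generator in this picture is any function that takes `runtimes` and a total time limit and returns a dictionary like `portfolio`.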

  13. Stone Soup. Hill-climbing in the portfolio space. Start: { depot: 0, gripper: 0, ... }. Successors: { depot: g, gripper: 0, ... }, { depot: 0, gripper: g, ... }, ... Choose the best successor and repeat.
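The hill-climbing loop on this slide could look roughly like the sketch below. It is not the authors' implementation: the scoring function `coverage`, the stopping rule (stop once another slice of size `granularity` no longer fits into the total time), the tie-breaking (first best successor) and the toy data are all assumptions.

```python
def coverage(portfolio, runtimes):
    """Assumed score: tasks solved by some planner within its time slice."""
    tasks = {task for per_task in runtimes.values() for task in per_task}
    return sum(
        any(
            runtimes[p].get(task) is not None and runtimes[p][task] <= limit
            for p, limit in portfolio.items()
        )
        for task in tasks
    )


def stone_soup(runtimes, total_time, granularity):
    """Greedy hill-climbing: repeatedly add one slice to the best planner."""
    portfolio = {planner: 0 for planner in runtimes}  # start: all zeros
    while sum(portfolio.values()) + granularity <= total_time:
        successors = []
        for planner in portfolio:
            successor = dict(portfolio)
            successor[planner] += granularity
            successors.append((coverage(successor, runtimes), successor))
        # Commit to the best successor (ties: first one in planner order).
        _, portfolio = max(successors, key=lambda pair: pair[0])
    return portfolio


# Toy data (invented for illustration).
runtimes = {
    "depot": {"t1": 10, "t2": None, "t3": 40},
    "gripper": {"t1": 50, "t2": 20, "t3": None},
}
result = stone_soup(runtimes, total_time=60, granularity=20)
print(result, coverage(result, runtimes))
```

On this toy input the search ends with 40 seconds for `depot` and 20 seconds for `gripper`, which solves all three tasks. With a finer granularity the greedy choice can get stuck on plateaus, so the result depends on `g`.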

  14. Uniform. Run all planners for the same amount of time. Result: { depot: 85, gripper: 85, ... }
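As a quick check of the numbers on this slide, a uniform split can be sketched as follows. The function name `uniform_portfolio`, the placeholder planner names and the rounding down to whole seconds are assumptions; with 21 planners and a 30-minute limit (1800 seconds) this rounding gives 85 seconds per planner.

```python
def uniform_portfolio(planners, total_time):
    """Give every planner the same whole-second share of the total time."""
    share = total_time // len(planners)
    return {planner: share for planner in planners}


# 21 hypothetical planner names, one per training domain.
planners = [f"domain{i}" for i in range(1, 22)]
result = uniform_portfolio(planners, total_time=1800)
print(result["domain1"])  # prints 85
```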

  15. Selector. Brute force: for all subset sizes {1, ..., 21}, compute the best portfolio with equal time shares.
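A brute-force search of this kind can be sketched with `itertools.combinations`. The scoring function `coverage`, the function name `selector`, the whole-second shares and the toy data are assumptions; the sketch returns the best subset per size and leaves the final choice among sizes open, since the slide does not spell it out.

```python
from itertools import combinations


def coverage(portfolio, runtimes):
    """Assumed score: tasks solved by some planner within its time slice."""
    tasks = {task for per_task in runtimes.values() for task in per_task}
    return sum(
        any(
            runtimes[p].get(task) is not None and runtimes[p][task] <= limit
            for p, limit in portfolio.items()
        )
        for task in tasks
    )


def selector(runtimes, total_time):
    """For each subset size, find the best equal-share portfolio."""
    planners = sorted(runtimes)
    best_by_size = {}
    for size in range(1, len(planners) + 1):
        share = total_time // size
        best = (-1, {})
        for subset in combinations(planners, size):
            portfolio = {planner: share for planner in subset}
            score = coverage(portfolio, runtimes)
            if score > best[0]:
                best = (score, portfolio)
        best_by_size[size] = best
    return best_by_size


# Toy data (invented for illustration).
runtimes = {
    "a": {"t1": 10, "t2": None, "t3": None},
    "b": {"t1": None, "t2": 25, "t3": None},
    "c": {"t1": 40, "t2": None, "t3": 28},
}
best_by_size = selector(runtimes, total_time=90)
for size, (score, portfolio) in best_by_size.items():
    print(size, score, portfolio)
```

With 21 planners there are about two million subsets, so exhaustive enumeration remains feasible as long as scoring a single portfolio is cheap.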

  16. Cluster. Find k clusters with k-means, clustering by quality. From each cluster choose the best planner. Give all planners equal time shares.
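One possible reading of this slide is sketched below with a tiny pure-Python k-means. Several details are assumptions rather than facts from the slides: planners are represented by vectors of per-task quality values, the initial centroids are simply the first k planners, "best planner" means highest total quality, and the equal shares go to the planners chosen from the clusters. The data is invented.

```python
def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns the cluster index of each point."""
    centroids = [list(p) for p in points[:k]]  # deterministic init (assumption)
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_portfolio(quality, k, total_time):
    """Pick the best planner of each cluster and split the time equally."""
    planners = sorted(quality)
    assignment = kmeans([quality[p] for p in planners], k)
    chosen = []
    for c in range(k):
        members = [p for p, a in zip(planners, assignment) if a == c]
        if members:
            chosen.append(max(members, key=lambda p: sum(quality[p])))
    share = total_time // len(chosen)
    return {planner: share for planner in chosen}


# Toy per-task quality vectors (invented): a and b behave alike, c and d too.
quality = {
    "a": [1.0, 0.9, 0.0, 0.0],
    "b": [0.8, 0.9, 0.0, 0.1],
    "c": [0.0, 0.0, 1.0, 0.9],
    "d": [0.0, 0.1, 0.7, 1.0],
}
result = cluster_portfolio(quality, k=2, total_time=1800)
print(result)
```

The idea behind clustering is that planners with similar quality profiles are largely redundant, so one representative per cluster keeps the portfolio diverse.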

  17. Increasing Time Limit. Iteratively increase the portfolio time limit. Get the problems that can be solved within that limit. Find the best planner for these problems. Give it the needed time. Repeat until no more problems are solvable or the time limit is exceeded.
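The steps on this slide leave several details open, so the following is only one possible interpretation. Assumed here: the limit grows by a fixed `step`, the "best planner" is the one that solves the most still-unsolved tasks within the current limit, and "the needed time" is the longest runtime among those tasks. The function name and the toy data are invented.

```python
def increasing_time_limit(runtimes, total_time, step):
    """Grow a per-planner limit and hand out exactly the time that is needed."""
    portfolio = {}
    unsolved = {task for per_task in runtimes.values() for task in per_task}
    limit = step
    while unsolved and limit <= total_time:
        # Find the planner solving the most unsolved tasks within the limit.
        best_planner, best_tasks = None, set()
        for planner, per_task in runtimes.items():
            tasks = {
                t for t in unsolved
                if per_task.get(t) is not None and per_task[t] <= limit
            }
            if len(tasks) > len(best_tasks):
                best_planner, best_tasks = planner, tasks
        if best_planner is not None:
            needed = max(runtimes[best_planner][t] for t in best_tasks)
            old_slice = portfolio.get(best_planner, 0)
            new_slice = max(old_slice, needed)
            # Only commit if the whole portfolio still fits the total time.
            if sum(portfolio.values()) + new_slice - old_slice <= total_time:
                portfolio[best_planner] = new_slice
                unsolved -= best_tasks
        limit += step
    return portfolio


# Toy data (invented for illustration).
runtimes = {
    "depot": {"t1": 10, "t2": None, "t3": 40},
    "gripper": {"t1": 50, "t2": 20, "t3": None},
}
result = increasing_time_limit(runtimes, total_time=60, step=20)
print(result)
```

In contrast to fixed-granularity hill-climbing, this variant assigns each planner only the time its tasks require, so no part of a slice is wasted on the training set.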

  18. Domain-wise. Iteratively retrieve the domain with the highest improvement potential. Give the fastest improving planner the needed time. Continue until the total time limit is reached or no more domains can be improved.

  19. Randomized Iterative Search. Use any existing portfolio as initialization (e.g. uniform). Successors: swap a time slice between planners; collect time from all planners and give it to a single one. Commit to the first successor that improves the score. Run until the score stagnates long enough.
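The local search on this slide might be sketched as follows. The scoring function `coverage`, the parameters `step` (size of a moved time slice) and `patience` (number of non-improving successors tolerated), the even choice between the two successor types and the toy data are all assumptions.

```python
import random


def coverage(portfolio, runtimes):
    """Assumed score: tasks solved by some planner within its time slice."""
    tasks = {task for per_task in runtimes.values() for task in per_task}
    return sum(
        any(
            runtimes[p].get(task) is not None and runtimes[p][task] <= limit
            for p, limit in portfolio.items()
        )
        for task in tasks
    )


def randomized_iterative_search(initial, runtimes, step, patience, seed=0):
    """Random local search that keeps the total portfolio time constant."""
    rng = random.Random(seed)
    current = dict(initial)
    score = coverage(current, runtimes)
    planners = list(current)
    stagnation = 0
    while stagnation < patience:
        successor = dict(current)
        receiver = rng.choice(planners)
        if rng.random() < 0.5:
            # Successor type 1: move one slice from a donor to the receiver.
            donors = [rng.choice(planners)]
        else:
            # Successor type 2: collect a slice from every other planner.
            donors = [p for p in planners if p != receiver]
        for donor in donors:
            amount = min(step, successor[donor])
            successor[donor] -= amount
            successor[receiver] += amount
        new_score = coverage(successor, runtimes)
        if new_score > score:  # commit to the first improving successor
            current, score, stagnation = successor, new_score, 0
        else:
            stagnation += 1
    return current


# Toy data (invented), starting from a uniform portfolio.
runtimes = {
    "depot": {"t1": 10, "t2": None, "t3": 40},
    "gripper": {"t1": 50, "t2": 20, "t3": None},
}
initial = {"depot": 30, "gripper": 30}
result = randomized_iterative_search(initial, runtimes, step=10, patience=50)
print(result, coverage(result, runtimes))
```

Because every accepted move is judged only on the training results, a search like this can fit the training set very closely, which matches the overfitting remark later in the deck.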

  20. Portfolio Results, 30 minutes [Bar chart: quality score per planner and portfolio, y-axis 160 to 240]

  21. Different timeouts: 1, 3, 5, 15 minutes. The uniform portfolio outperforms LAMA even in the 3-minute setting. Other portfolios are even better. Fewer planners in the portfolio when less time is available. No portfolio dominates the others for all timeouts. Cluster and Increasing Time Limit are among the best performers. Randomized Iterative Search is prone to overfitting.

  22. Outlook. Promising initial results for optimal configurations. Adaptively select the next configuration. Use more heterogeneous planners. Apply automatic portfolio diversification in other areas.

  23. Summary. Tuning for domains is effective. Tuned planners yield very good results in a portfolio.
