Planner Metrics Should Satisfy Independence of Irrelevant Alternatives


  1. Planner Metrics Should Satisfy Independence of Irrelevant Alternatives Jendrik Seipp July 12, 2019 University of Basel, Switzerland

  2. Independence of irrelevant alternatives (IIA)
     • one of four criteria from Arrow’s impossibility theorem
     • whether A > B or B > A must not depend on a third, irrelevant alternative C
     • important for planner metrics, but some violate it

  3. IPC satisficing track
     • sat(P, π) = Cost*(π) / Cost(P, π) if P solves π, 0 if unsolved
     • Cost*(π) is the cost of a reference plan for π
     • total score: sum of task scores
     • if reference plans are optimal, sat satisfies IIA
     • if reference plans can come from competitors, sat does not satisfy IIA
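
To make the role of the reference plans concrete, here is a minimal Python sketch of the sat score; the dictionary layout ({task: plan cost}, with unsolved tasks omitted) and the function names are illustrative choices, not part of the IPC rules.

    def sat_score(plan_costs, reference_costs):
        # Sum of Cost*(pi) / Cost(P, pi) over the tasks the planner solved;
        # unsolved tasks score 0 and are simply absent from plan_costs.
        return sum(reference_costs[task] / cost for task, cost in plan_costs.items())

    def references_from_competitors(all_plan_costs):
        # Reference plan cost = cheapest plan any competitor found for the task.
        # Under this policy a new competitor can lower the references and change
        # every other planner's score, which is how sat violates IIA; fixed
        # (e.g. optimal) reference costs avoid that.
        refs = {}
        for plan_costs in all_plan_costs.values():
            for task, cost in plan_costs.items():
                refs[task] = min(cost, refs.get(task, float("inf")))
        return refs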

  4. IPC satisficing track – example
     Costs: π1: A = 4, B = 5, C = 1;  π2: A = 5, B = 4, C = 5
     Reference plan costs without C's plans: R(π1) = 4, R(π2) = 2
       sat: A = 4/4 + 2/5 = 1.4,  B = 4/5 + 2/4 = 1.3   → A > B
     Reference plan costs with C's plans (C's plan for π1 becomes the reference): R(π1) = 1, R(π2) = 2
       sat: A = 1/4 + 2/5 = 0.65,  B = 1/5 + 2/4 = 0.7,  C = 1/1 + 2/5 = 1.4   → B > A
     → use optimal planners or domain-specific solvers to find good reference plans
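
A small self-contained snippet that recomputes the scores of this example; the task and planner names mirror the numbers above, everything else is illustrative.

    costs = {"A": {"pi1": 4, "pi2": 5},
             "B": {"pi1": 5, "pi2": 4},
             "C": {"pi1": 1, "pi2": 5}}

    def sat(planner, refs):
        return sum(refs[t] / c for t, c in costs[planner].items())

    refs_without_c = {"pi1": 4, "pi2": 2}  # C's plans are not used as references
    refs_with_c = {"pi1": 1, "pi2": 2}     # C's cheap plan for pi1 becomes the reference

    print(round(sat("A", refs_without_c), 2), round(sat("B", refs_without_c), 2))  # 1.4 1.3 -> A > B
    print(round(sat("A", refs_with_c), 2), round(sat("B", refs_with_c), 2),
          round(sat("C", refs_with_c), 2))                                         # 0.65 0.7 1.4 -> B > A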

  5. IPC agile track
     • T*(π): minimum runtime of all participating planners
     • agl2014(P, π) = 1 / (1 + log10(T(P, π) / T*(π))) if T(P, π) ≤ 300, 0 otherwise
     • agl2018(P, π) = 1 if T(P, π) < 1, 1 − log(T(P, π)) / log(300) if 1 ≤ T(P, π) ≤ 300, 0 if T(P, π) > 300
     • agl2014 depends on the other planners through T*(π) and therefore violates IIA; agl2018 uses only a planner's own runtime
     → use agl2018 in future agile tracks
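
A short Python sketch of both agile scores, assuming runtimes in seconds and a 300 s limit as in the definitions above; the function names and the example runtimes are illustrative.

    import math

    def agl_2014(runtime, best_runtime, limit=300.0):
        # Normalized by T*(pi), the fastest participating planner's runtime,
        # so the score changes when a faster competitor enters: violates IIA.
        if runtime > limit:
            return 0.0
        return 1.0 / (1.0 + math.log10(runtime / best_runtime))

    def agl_2018(runtime, limit=300.0):
        # Depends only on the planner's own runtime and a fixed scale: satisfies IIA.
        if runtime < 1.0:
            return 1.0
        if runtime > limit:
            return 0.0
        return 1.0 - math.log(runtime) / math.log(limit)

    # The 2014 score of a planner that needs 100 s drops once a 1 s rival shows up:
    print(round(agl_2014(100, best_runtime=100), 2))  # 1.0
    print(round(agl_2014(100, best_runtime=1), 2))    # 0.33
    print(round(agl_2018(100), 2))                    # 0.19, regardless of the field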

  6. Sparkle planning challenge
     • new planning competition in 2019
     • goal: “analyse the contribution of each planner to the real state of the art”
     • measures the marginal contribution of each planner P to a portfolio selector over planners S
     • sparkle(P, π) = log10(par10(S \ {P}) / par10(S)) if par10(S \ {P}) > par10(S), 0 otherwise
     • focuses on coverage: removing which planner decreases coverage the most?
     • uses runtime to break ties
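
The following Python sketch shows one way to compute the marginal contribution behind the sparkle score. It approximates the portfolio selector by the virtual best solver (per task, take the fastest member that solves it) and uses the PAR10 convention of counting unsolved tasks as ten times the time limit; this portfolio model and all names are assumptions, not the challenge's actual implementation.

    import math

    def par10(planners, runtimes, tasks, limit=300.0):
        # Virtual-best portfolio: per task take the fastest solving member;
        # tasks no member solves count as 10 * limit (the PAR10 penalty).
        # runtimes maps planner -> {task: runtime}, with unsolved tasks omitted.
        total = 0.0
        for task in tasks:
            solved = [runtimes[p][task] for p in planners if task in runtimes[p]]
            total += min(solved) if solved else 10 * limit
        return total / len(tasks)

    def sparkle(planner, all_planners, runtimes, tasks):
        # Marginal contribution of `planner`: how much worse the portfolio
        # gets (in orders of magnitude of PAR10) when the planner is removed.
        with_p = par10(all_planners, runtimes, tasks)
        without_p = par10([p for p in all_planners if p != planner], runtimes, tasks)
        return math.log10(without_p / with_p) if without_p > with_p else 0.0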

  7. Sparkle planning challenge – example
     • 100 tasks
     • planner A solves only 1 task π
     • planners B and C each solve the other 99 tasks but fail to solve π
     • {A, B} → B > A
     • {A, B, C} → A > B
     • adding C makes B's marginal contribution vanish (C covers the same tasks), while A still uniquely solves π, so the A-vs-B ranking flips
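
A self-contained check of this example under the coverage reading of marginal contribution (how many tasks only this planner solves); the task names and data layout are illustrative.

    pi = "task_100"                                   # the one task only A solves
    others = {f"task_{i}" for i in range(1, 100)}     # the remaining 99 tasks
    solved = {"A": {pi}, "B": set(others), "C": set(others)}

    def contribution(planner, portfolio):
        # Coverage lost when the planner is removed = tasks only it solves.
        covered = set().union(*(solved[p] for p in portfolio))
        without = set().union(*(solved[p] for p in portfolio if p != planner))
        return len(covered) - len(without)

    for portfolio in (["A", "B"], ["A", "B", "C"]):
        print({p: contribution(p, portfolio) for p in portfolio})
    # {'A': 1, 'B': 99}          -> B > A
    # {'A': 1, 'B': 0, 'C': 0}   -> A > B (ties among B and C broken by runtime)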

  8. Sparkle planning challenge – problems of the metric
     • penalizes similar planners
     • easily gameable: submit several “dummy” planners and one “real” planner (leaderboard, IPC planners available)
     • penalizes collaboration, favors closed-source planners
     • discourages submitting multiple planners

  9. Sparkle planning challenge – suggestion
     • to satisfy IIA: use a fixed portfolio of baseline planners

  10. Summary
      • IIA is critical for evaluation metrics
      • several planner metrics do not satisfy IIA
      • there are alternatives that do satisfy IIA
