Planner Metrics Should Satisfy Independence of Irrelevant Alternatives
Jendrik Seipp
July 12, 2019
University of Basel, Switzerland
Independence of irrelevant alternatives (IIA)
• one of four criteria from Arrow’s impossibility theorem
• the decision whether A > B or A < B must not depend on a third alternative C
• important for planner metrics, but some violate it
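One possible way to state the property for planner metrics is sketched below. The notation is introduced here and is not on the slides: $\mathrm{score}_S(P)$ is assumed to denote the total score of planner $P$ when $S$ is the set of participating planners.

\[
\forall S, S' \supseteq \{A, B\}: \quad
\mathrm{score}_S(A) > \mathrm{score}_S(B)
\iff
\mathrm{score}_{S'}(A) > \mathrm{score}_{S'}(B)
\]

Under this reading, adding or removing a third planner C may change the absolute scores of A and B, but not their relative order.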
IPC satisficing track
\[
\mathit{sat}(P, \pi) =
\begin{cases}
\dfrac{\mathit{Cost}^*(\pi)}{\mathit{Cost}(P, \pi)} & \text{if solved} \\
0 & \text{if unsolved}
\end{cases}
\]
• Cost*(π) is the cost of a reference plan
• total score: sum of task scores
• if reference plans are optimal, sat satisfies IIA
• if reference plans can come from competitors, sat does not satisfy IIA
IPC satisficing track – example

Cost    R   A   B   C
π1      6   4   5   1
π2      2   5   4   5

sat (without C)   A     B
π1                4/4   4/5
π2                2/5   2/4
∑                 1.4   1.3     → A > B

sat (with C)      A      B     C
π1                1/4    1/5   1/1
π2                2/5    2/4   2/5
∑                 0.65   0.7   1.4     → B > A

→ use optimal planners or domain-specific solvers to find good reference plans
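The arithmetic of the example can be checked with a short script. This is a minimal sketch, assuming the reference cost of a task is the cheapest plan among R and the participating planners; the names `COSTS` and `sat_totals` are chosen here for illustration. All planners solve both tasks, so the "unsolved" case is not exercised.

```python
from fractions import Fraction

# Plan costs from the example; R is the pre-existing reference plan.
COSTS = {
    "pi1": {"R": 6, "A": 4, "B": 5, "C": 1},
    "pi2": {"R": 2, "A": 5, "B": 4, "C": 5},
}


def sat_totals(planners):
    """Sum of sat scores per planner over all tasks.

    Assumption: the reference cost is the minimum over R and the
    plans found by the participating planners.
    """
    totals = {p: Fraction(0) for p in planners}
    for costs in COSTS.values():
        best = min(costs[name] for name in ["R", *planners])
        for p in planners:
            totals[p] += Fraction(best, costs[p])
    return totals


without_c = sat_totals(["A", "B"])
with_c = sat_totals(["A", "B", "C"])
print({p: float(s) for p, s in without_c.items()})
print({p: float(s) for p, s in with_c.items()})
```

With only A and B the totals are 1.4 and 1.3, so A ranks above B. Once C participates, its cheap plan for π1 becomes the reference and the totals for A and B drop to 0.65 and 0.7, which reverses their order.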
IPC agile track
\[
\mathit{agl}_{2014}(P, \pi) =
\begin{cases}
\dfrac{1}{1 + \log_{10}\!\left(\dfrac{T(P, \pi)}{T^*(\pi)}\right)} & \text{if } T(P, \pi) \le 300 \\
0 & \text{otherwise}
\end{cases}
\]
• T*(π): minimum runtime of all participating planners
\[
\mathit{agl}_{2018}(P, \pi) =
\begin{cases}
1 & \text{if } T(P, \pi) < 1 \\
1 - \dfrac{\log(T(P, \pi))}{\log(300)} & \text{if } 1 \le T(P, \pi) \le 300 \\
0 & \text{if } T(P, \pi) > 300
\end{cases}
\]
→ use agl 2018 in future agile tracks
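The difference between the two agile metrics can be illustrated with a small sketch. The runtimes below are hypothetical values chosen for this illustration and do not come from the slides; the helper names `totals_2014` and `totals_2018` are also introduced here.

```python
import math

TIME_LIMIT = 300.0  # seconds, as in both formulas


def agl_2014(runtime, best_runtime):
    """2014 score; best_runtime is T*(pi), the fastest participating planner."""
    if runtime > TIME_LIMIT:
        return 0.0
    return 1.0 / (1.0 + math.log10(runtime / best_runtime))


def agl_2018(runtime):
    """2018 score; depends only on the planner's own runtime."""
    if runtime < 1.0:
        return 1.0
    if runtime <= TIME_LIMIT:
        return 1.0 - math.log(runtime) / math.log(TIME_LIMIT)
    return 0.0


# Hypothetical runtimes in seconds (assumption, not from the slides).
RUNTIMES = {
    "pi1": {"A": 1.0, "B": 100.0, "C": 0.01},
    "pi2": {"A": 100.0, "B": 10.0, "C": 200.0},
}


def totals_2014(planners):
    totals = {p: 0.0 for p in planners}
    for times in RUNTIMES.values():
        best = min(times[p] for p in planners)  # T* over participants only
        for p in planners:
            totals[p] += agl_2014(times[p], best)
    return totals


def totals_2018(planners):
    return {p: sum(agl_2018(t[p]) for t in RUNTIMES.values()) for p in planners}


ab_2014 = totals_2014(["A", "B"])
abc_2014 = totals_2014(["A", "B", "C"])
ab_2018 = totals_2018(["A", "B"])
abc_2018 = totals_2018(["A", "B", "C"])
print(ab_2014, abc_2014)
print(ab_2018, abc_2018)
```

With these numbers, agl 2014 ranks A above B when only A and B participate, and B above A once C is added, because C lowers T*(π1). The agl 2018 scores of A and B are identical in both settings, since the formula never looks at other planners.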
Sparkle planning challenge
• new planning competition in 2019
• “analyse the contribution of each planner to the real state of the art”
• measure marginal contribution of each planner P to a portfolio selector over planners S
\[
\mathit{sparkle}(P, \pi) =
\begin{cases}
\log_{10}\!\left(\dfrac{\mathit{par10}(S \setminus \{P\})}{\mathit{par10}(S)}\right) & \text{if } \mathit{par10}(S \setminus \{P\}) > \mathit{par10}(S) \\
0 & \text{otherwise}
\end{cases}
\]
• focuses on coverage
• uses runtime to break ties
• removing which planner decreases coverage the most?
Sparkle planning challenge – example
• 100 tasks
• planner A solves 1 task π
• planners B and C solve 99 tasks but fail to solve π
• {A, B} → B > A
• {A, B, C} → A > B
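The example can be replayed numerically under some simplifying assumptions that are not stated on the slides: the portfolio selector is perfect (it solves a task whenever any member does), par10 is the mean over all tasks with unsolved tasks counted as ten times a 300-second cutoff, and every solved task takes one second. All names in the sketch are chosen for illustration.

```python
import math

CUTOFF = 300.0          # assumed time limit in seconds
PENALTY = 10 * CUTOFF   # PAR10: an unsolved task counts as 10x the cutoff
SOLVED_RUNTIME = 1.0    # assumed runtime of every solved task

TASKS = range(100)
SPECIAL = 0  # index of the single task pi that only planner A solves


def solves(planner, task):
    """A solves only pi; B and C solve the other 99 tasks."""
    if planner == "A":
        return task == SPECIAL
    return task != SPECIAL


def par10(portfolio):
    """Mean PAR10 of an assumed perfect selector over the portfolio."""
    total = 0.0
    for task in TASKS:
        solved = any(solves(p, task) for p in portfolio)
        total += SOLVED_RUNTIME if solved else PENALTY
    return total / len(TASKS)


def sparkle(planner, portfolio):
    """Marginal contribution of planner to the portfolio."""
    without = par10([p for p in portfolio if p != planner])
    full = par10(portfolio)
    return math.log10(without / full) if without > full else 0.0


for portfolio in (["A", "B"], ["A", "B", "C"]):
    print(portfolio, {p: round(sparkle(p, portfolio), 3) for p in portfolio})
```

In the portfolio {A, B}, removing B loses 99 tasks while removing A loses one, so B scores higher. In {A, B, C}, removing B loses nothing because C covers the same tasks, so the contribution of B falls to 0 and A ranks above B.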
Sparkle planning challenge – problems of the metric
• penalizes similar planners
• easily gameable: submit several “dummy” planners and one “real” planner (leader board, IPC planners available)
• penalizes collaboration, favors closed-source planners
• discourages submitting multiple planners
Sparkle planning challenge – suggestion
• IIA: use fixed portfolio of baseline planners
Summary
• IIA is critical for evaluation metrics
• several planner metrics do not satisfy IIA
• there are alternatives that do satisfy IIA