Backward state-space search
• An action $a$ is relevant for a goal $g$ if $a$ can be the last step in a plan leading to $g$:
  - $g \cap \mathrm{ADD}(a) \neq \emptyset$
  - $g \cap \mathrm{DEL}(a) = \emptyset$
• Regression: to achieve goal $g$, we regress it through a relevant action $a$ ($a$ as the final step of a plan to reach $g$):
  $g' = (g \setminus \mathrm{ADD}(a)) \cup \mathrm{PREC}(a)$
[Figure: blocks-world illustration of move(A, x, B)]
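Regression Example: move(B, A, C)
• $\mathit{goal} = \{\mathit{On}(B, C),\ \mathit{On}(A, \mathit{Table})\}$
• Relevant action: $a = \mathit{move}(B, A, C)$
  - PREC($a$): $\mathit{On}(B, A),\ \mathit{Clear}(B),\ \mathit{Clear}(C)$
  - ADD($a$): $\mathit{On}(B, C)$
  - DEL($a$): $\mathit{On}(B, A)$
  - $g \cap \mathrm{ADD}(a) = \{\mathit{On}(B, C)\} \neq \emptyset$ and $g \cap \mathrm{DEL}(a) = \emptyset$
• Regression (add the preconditions of $a$, remove the predicates in the add list of $a$):
  - $\mathit{goal}' = \{\mathit{On}(A, \mathit{Table}),\ \mathit{On}(B, A),\ \mathit{Clear}(B),\ \mathit{Clear}(C)\}$
[Figure: blocks A, B, C; the predecessor state is marked "???"]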
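The relevance test and the regression step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: atoms are plain strings, On(A,Table) denotes block A resting on the table, and the names `Action`, `is_relevant`, and `regress` are hypothetical helpers, not part of the slides.

```python
from typing import FrozenSet, NamedTuple, Optional


class Action(NamedTuple):
    """Ground STRIPS action; atoms are plain strings (an assumption of this sketch)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def is_relevant(goal: FrozenSet[str], a: Action) -> bool:
    # Relevant: achieves at least one goal atom and deletes none of them.
    return bool(goal & a.add) and not (goal & a.delete)


def regress(goal: FrozenSet[str], a: Action) -> Optional[FrozenSet[str]]:
    # g' = (g - ADD(a)) | PREC(a); None when a is not relevant to g.
    if not is_relevant(goal, a):
        return None
    return (goal - a.add) | a.pre


move_b_a_c = Action(
    name="move(B,A,C)",
    pre=frozenset({"On(B,A)", "Clear(B)", "Clear(C)"}),
    add=frozenset({"On(B,C)"}),
    delete=frozenset({"On(B,A)"}),
)
goal = frozenset({"On(B,C)", "On(A,Table)"})
regressed = regress(goal, move_b_a_c)
print(sorted(regressed))
```

Note that the achieved atom On(B,C) drops out of the regressed goal, exactly as the formula $(g \setminus \mathrm{ADD}(a)) \cup \mathrm{PREC}(a)$ prescribes.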
Backward Search
Backward-Search(s, g)
    if s satisfies g then return []
    relevant = {a | a is relevant to g}
    if relevant = ∅ then return failure
    for each a ∈ relevant do
        g′ = (g − ADD(a)) ∪ PREC(a)
        π′ = Backward-Search(s, g′)
        if π′ ≠ failure then return [π′ | a]
    return failure
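A minimal runnable sketch of the procedure above, assuming ground actions over string atoms. The cycle check on the current path (`visited`) is an addition of this sketch to keep the recursion finite; the `Action` type and the two example actions are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Sequence


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def backward_search(state: FrozenSet[str], goal: FrozenSet[str],
                    actions: Sequence[Action],
                    visited: FrozenSet[FrozenSet[str]] = frozenset()
                    ) -> Optional[List[str]]:
    if goal <= state:                      # s satisfies g
        return []
    if goal in visited:                    # sub-goal already on this path: give up
        return None
    visited = visited | {goal}
    for a in actions:
        relevant = bool(goal & a.add) and not (goal & a.delete)
        if not relevant:
            continue
        sub_goal = (goal - a.add) | a.pre  # regression through a
        plan = backward_search(state, sub_goal, actions, visited)
        if plan is not None:
            return plan + [a.name]         # a is the last step
    return None                            # failure


ACTIONS = [
    Action("move(B,C,A)",
           pre=frozenset({"On(B,C)", "Clear(B)", "Clear(A)"}),
           add=frozenset({"On(B,A)", "Clear(C)"}),
           delete=frozenset({"On(B,C)", "Clear(A)"})),
    Action("move(B,A,C)",
           pre=frozenset({"On(B,A)", "Clear(B)", "Clear(C)"}),
           add=frozenset({"On(B,C)", "Clear(A)"}),
           delete=frozenset({"On(B,A)", "Clear(C)"})),
]
INIT = frozenset({"On(B,A)", "On(A,Table)", "On(C,Table)", "Clear(B)", "Clear(C)"})
GOAL = frozenset({"On(B,C)", "On(A,Table)"})
plan = backward_search(INIT, GOAL, ACTIONS)
print(plan)
```

Here `None` plays the role of `failure`, and the plan is built with the regressed action appended last, mirroring `[π′ | a]`.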
Backward Search
• Instantiating schemas
  - The goal is a conjunction of literals that may contain variables
  - To be more efficient, instantiate schema variables by unification, rather than generating and testing different actions
• For most domains, it has a lower branching factor than forward search
• Heuristics are more difficult to use
  - It works on sets of states rather than individual states.
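Regression Example: move(B, ?, C)
• $\mathit{goal} = \{\mathit{On}(B, C),\ \mathit{On}(A, \mathit{Table})\}$
• Relevant action: $a = \mathit{move}(B, ?, C)$
  - PREC($a$): $\mathit{On}(B, ?),\ \mathit{Clear}(B),\ \mathit{Clear}(C)$
  - ADD($a$): $\mathit{On}(B, C)$
  - DEL($a$): $\mathit{On}(B, ?)$
  - $g \cap \mathrm{ADD}(a) = \{\mathit{On}(B, C)\} \neq \emptyset$ and $g \cap \mathrm{DEL}(a) = \emptyset$
• Regression (add the preconditions of $a$, remove the predicates in the add list of $a$):
  - $\mathit{goal}' = \{\mathit{On}(A, \mathit{Table}),\ \mathit{On}(B, ?),\ \mathit{Clear}(B),\ \mathit{Clear}(C)\}$
[Figure: blocks A, B, C; the predecessor state is marked "???"]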
State-space search problems
• Both forward and backward search may suffer from the repeated-states problem
  - visited states must be recorded
• What's wrong with search?
  - The branching factor is usually too high.
  - Combinatorial explosion if a state is given by a set of possible worlds / logical interpretations / variable assignments
Heuristic for planning
• Solving problems by searching atomic states (Chapter 3)
  - Human intelligence is usually used to define domain-specific heuristics
• Assumption: "path cost = number of plan steps"
  - We want to estimate the number of steps needed to reach $g$ from $s$
• In planning problems, we use a factored representation of states
  - This allows us to find domain-independent heuristics
Heuristic for planning
• Heuristics:
  - Relaxed problems:
    - Ignore delete lists
    - Ignore preconditions
  - Problem decomposition
    - Sub-goal independence assumption
Heuristics: relaxed problems
1) Ignore delete lists: delete the negative effects from all actions, solve the relaxed problem, and use the length of its plan as the heuristic
   • Admissible?
   • Can we solve this problem in polynomial time?
2) Ignore preconditions: delete all preconditions from all actions, solve the relaxed problem, and use the length of its plan as the heuristic
   • Admissible?
   • Can we solve this problem in polynomial time?
Heuristics: problem decomposition
$c(p, s)$: minimum number of steps needed to reach proposition $p$ from $s$
• Sum of the costs of reaching each sub-goal from $s$:
  $h_{sum}(s) = \sum_{p \in G} c(p, s)$
  - Not necessarily admissible
  - The independence assumption can be pessimistic
• Max of the costs of reaching each sub-goal from $s$:
  $h_{max}(s) = \max_{p \in G} c(p, s)$
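A small worked illustration of the two estimates. The goal set and the costs 2 and 3 are made up for this example only:

```latex
G = \{p_1, p_2\},\quad c(p_1, s) = 2,\quad c(p_2, s) = 3
\;\Longrightarrow\;
h_{sum}(s) = 2 + 3 = 5,
\qquad
h_{max}(s) = \max(2, 3) = 3
```

If the two sub-goals happen to share actions, so that some 3-step plan achieves both, then $h_{sum}$ overestimates the true cost (it is not admissible), whereas $h_{max}$ stays a lower bound.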
Heuristics: problem decomposition (sum or max)
• Max or sum?
  - Admissibility vs. accuracy
  - Sum works well in practice for problems that are largely decomposable.
• How to compute $c(p, s)$?
Ignore delete lists & problem decomposition
• When both ignoring delete lists and decomposing the problem
  - we can compute $c(p, s)$ in polynomial time using the Planning Graph (we will see it in the next slides).
• Examples of planners that use such heuristics:
  - HSP
  - Fast-Forward (FF)
    - Competed in the fully automated track of AIPS 2000
    - Granted "Group A distinguished performance Planning System"
    - Estimates the heuristic with the help of a planning graph
J. Hoffmann, B. Nebel, "The FF planning system: Fast plan generation through heuristic search", Journal of Artificial Intelligence Research 14 (2001), 253-302
Planning graph
• A way to find accurate heuristics
  - (Under)estimates the number of steps required to reach $g$
  - Admissible
• A layered graph that keeps track of literal pairs and action pairs that cannot be reached simultaneously (mutexes)
Planning graph: structure
• Directed, leveled graph
• Two types of levels:
  - $S$: proposition levels
  - $A$: action levels
• Proposition and action levels alternate
• Edges (between levels)
  - Precondition: each action at $A_i$ is connected to its preconditions at $S_i$
  - Effect: each action at $A_i$ is connected to its effects at $S_{i+1}$
Planning graph: layers
• $S_i$ contains all the literals that could hold at time $i$
• $A_i$ contains all actions whose preconditions are satisfied in $S_i$, plus no-op actions (to solve the frame problem).
[Figure: alternating levels $S_0$ (initial state), $A_0$, $S_1$, $A_1$, $S_2$, ...]
Planning graph: layers
$S_0 = \{p \in \mathit{Init}\}$
$A_i = \{a \text{ is an action} \mid \mathrm{PRECONDS}(a) \subseteq S_i\}$
$S_{i+1} = \{p \in \mathrm{EFFECT}(a) \mid a \in A_i\}$
[Figure: alternating levels $S_0$ (initial state), $A_0$, $S_1$, $A_1$, ...]
Planning graph: Cake example
Init(Have(Cake))
Goal(Have(Cake) ∧ Eaten(Cake))
Action(Eat(Cake),
    PRECOND: Have(Cake)
    EFFECT: ¬Have(Cake) ∧ Eaten(Cake))
Action(Bake(Cake),
    PRECOND: ¬Have(Cake)
    EFFECT: Have(Cake))
• no-op action
Planning graph: Spare tire example
Init(Tire(Flat) ∧ Tire(Spare) ∧ At(Flat, Axle) ∧ At(Spare, Trunk))
Goal(At(Spare, Axle))
Action(Remove(obj, loc),
    PRECOND: At(obj, loc)
    EFFECT: ¬At(obj, loc) ∧ At(obj, Ground))
Action(PutOn(t, Axle),
    PRECOND: Tire(t) ∧ At(t, Ground) ∧ ¬At(Flat, Axle)
    EFFECT: ¬At(t, Ground) ∧ At(t, Axle))
Action(LeaveOvernight,
    PRECOND: (none)
    EFFECT: ¬At(Spare, Ground) ∧ ¬At(Spare, Trunk) ∧ ¬At(Spare, Axle) ∧ ¬At(Flat, Ground) ∧ ¬At(Flat, Trunk) ∧ ¬At(Flat, Axle))
Planning graphs: properties
• In level $S_i$, both $p$ and $\lnot p$ may exist.
• A literal may appear at level $S_i$ although it cannot actually be made true until a later level (if at all).
• A literal never appears too late: the level at which it first appears is never later than the earliest step at which it can actually be achieved.
Planning graphs: cost of each goal literal
• How difficult is it to achieve a goal literal $g_i$ from $s$?
  - Level cost of $g_i$, written $\mathit{lc}(g_i, s)$: the first level of the PG at which $g_i$ appears.
  - Relation to the previously introduced heuristics?
    - Is it accurate?
Planning graphs: heuristics
$h_{max\_level}(s) = \max_{g_i \in \mathit{goal}} \mathit{lc}(g_i, s)$
$h_{level\_sum}(s) = \sum_{g_i \in \mathit{goal}} \mathit{lc}(g_i, s)$
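The level costs and the two heuristics can be sketched as follows for the cake example. This is a simplified illustration: it builds proposition levels only, ignores mutexes, and writes negative literals with a leading `~`; the `Action` type and function names are assumptions of the sketch.

```python
import math
from typing import Dict, FrozenSet, Iterable, NamedTuple, Sequence


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    eff: FrozenSet[str]


def level_costs(init: Iterable[str], actions: Sequence[Action],
                max_levels: int = 100) -> Dict[str, int]:
    """First level at which each literal appears (mutexes are ignored here)."""
    level = set(init)
    cost = {lit: 0 for lit in level}
    for i in range(1, max_levels + 1):
        nxt = set(level)                    # no-ops carry every literal forward
        for a in actions:
            if a.pre <= level:              # a is applicable at this level
                nxt |= a.eff
        if nxt == level:                    # literal set has levelled off
            break
        for lit in nxt - level:
            cost[lit] = i
        level = nxt
    return cost


def h_max_level(goal: Iterable[str], cost: Dict[str, int]) -> float:
    return max((cost.get(g, math.inf) for g in goal), default=0)


def h_level_sum(goal: Iterable[str], cost: Dict[str, int]) -> float:
    return sum(cost.get(g, math.inf) for g in goal)


eat = Action("Eat(Cake)", frozenset({"Have"}), frozenset({"~Have", "Eaten"}))
bake = Action("Bake(Cake)", frozenset({"~Have"}), frozenset({"Have"}))
cost = level_costs({"Have", "~Eaten"}, [eat, bake])
goal = {"Have", "Eaten"}
print(h_max_level(goal, cost), h_level_sum(goal, cost))
```

For the goal Have(Cake) ∧ Eaten(Cake) both estimates come out as 1, while the real plan (Eat, then Bake) needs 2 steps, which is why mutex information is worth adding.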
Planning graphs: constraints
• Mutual exclusion (mutex) links
  - Two actions at a given action level are mutually exclusive if no valid plan could possibly contain both.
  - Two propositions at a given proposition level are mutually exclusive if no valid plan could possibly make both true.
• This structure helps to reduce the search for a sub-graph of the Planning Graph that might correspond to a valid plan.
Planning graphs: constraints
• Mutexes between actions
  - Inconsistent effects: one action negates an effect of the other
  - Interference: one of the effects of one action is the negation of a precondition of the other
  - Competing needs: the actions have mutually exclusive preconditions
• Mutexes between literals
  - One of the literals is the negation of the other
  - Inconsistent support: each possible pair of actions that could achieve them (in this level) is mutually exclusive.
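The three action-mutex tests above can be sketched as simple set checks. This is an illustration only: literals are strings with a leading `~` for negation, no-ops are modelled as ordinary actions, and all names (`Action`, `neg`, `actions_mutex`, ...) are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    eff: FrozenSet[str]


def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def inconsistent_effects(a: Action, b: Action) -> bool:
    # One action negates an effect of the other.
    return any(neg(e) in b.eff for e in a.eff)


def interference(a: Action, b: Action) -> bool:
    # An effect of one action negates a precondition of the other.
    return (any(neg(e) in b.pre for e in a.eff)
            or any(neg(e) in a.pre for e in b.eff))


def competing_needs(a: Action, b: Action,
                    lit_mutex: FrozenSet[FrozenSet[str]] = frozenset()) -> bool:
    # Some pair of preconditions is mutex at the previous proposition level.
    return any(p == neg(q) or frozenset({p, q}) in lit_mutex
               for p in a.pre for q in b.pre)


def actions_mutex(a: Action, b: Action,
                  lit_mutex: FrozenSet[FrozenSet[str]] = frozenset()) -> bool:
    return (inconsistent_effects(a, b) or interference(a, b)
            or competing_needs(a, b, lit_mutex))


eat = Action("Eat(Cake)", frozenset({"Have"}), frozenset({"~Have", "Eaten"}))
bake = Action("Bake(Cake)", frozenset({"~Have"}), frozenset({"Have"}))
noop_have = Action("NoOp(Have)", frozenset({"Have"}), frozenset({"Have"}))
noop_not_eaten = Action("NoOp(~Eaten)", frozenset({"~Eaten"}), frozenset({"~Eaten"}))

print(actions_mutex(eat, noop_have), actions_mutex(noop_have, noop_not_eaten))
```

In the cake domain, Eat(Cake) conflicts with the no-op for Have(Cake) through both inconsistent effects and interference, while Eat(Cake) and Bake(Cake) have competing needs.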
Planning graphs: constraints
[Figure: the four types of mutexes: inconsistent effects, interference (precondition-effect), competing needs, inconsistent support]
Planning graph: more accurate heuristic
• We want to define a more accurate heuristic using the mutexes:
  - $h_2$ (set-level heuristic): the level at which all the goal literals appear without any pair of them being mutually exclusive.
• $h_1$ (max-level heuristic) is extended to $h_2$ by considering mutexes between all pairs of propositions.
• $h_2$ is more useful than $h_1$ ($0 \le h_1 \le h_2 \le h^*$)
Planning graph: more accurate heuristics
• $h_2$ can be extended to $h_3$ by defining and considering inconsistencies between triplets of propositions
• In general
  - $h_k$ are admissible
  - $h_{k+1} \ge h_k$
  - Computing $h_k$ is $O(n^k)$ with $n$ propositions
• $k = 2$ is commonly used
GraphPlan: basic idea
• Construct a graph that encodes constraints on plans
• Use this graph to constrain the search for a valid plan:
  - If a valid plan exists, it is a sub-graph of the Planning Graph in which:
    - Actions at the same level don't interfere
    - Each action's preconditions are made true by the plan
    - Goals are satisfied
• The planning graph can be built for each problem in polynomial time.
GraphPlan: level off
• Definition: the Planning Graph levels off if two consecutive proposition levels are identical (both literals and mutexes).
• We will show that the set of literals never decreases across proposition levels and that mutexes don't reappear.
GraphPlan: level off (Observation 1)
• Literals monotonically increase
  - Propositions are always carried forward by no-ops.
[Figure: successive proposition levels over p, q, r connected by actions A and B]
GraphPlan: level off (Observation 2)
• Actions monotonically increase (once an action appears at a level, it appears at all subsequent levels)
  - If the preconditions of an action appear at one level, they appear at all subsequent levels, and thus so does the action.
GraphPlan: level off (Observation 3)
• Proposition mutex relationships monotonically decrease
  - Available actions are monotonically increasing, so mutex relations between literals are decreasing.
  - (When mutexes between literals are due to mutex relations between actions, they may be removed at later levels.)
GraphPlan: level off (Observation 4)
• Action mutex relationships monotonically decrease
  - Mutex relations between actions due to competing needs (when the preconditions are not negations of each other) must be decreasing.
GraphPlan Algorithm
1) Graph levels are constructed until all goals are reached and no two of them are mutex (a necessary, but usually insufficient, condition for plan existence).
   • If the PG levels off before reaching this level, GraphPlan returns failure.
2) ExtractSolution phase: search the PG for a valid plan.
3) If none is found, add a level to the PG and go to step 2.
• GraphPlan builds the graph forward and extracts the plan backwards.
GraphPlan: "Extract Solution" phase
• Some ways to do it:
  - As a backward search
    - looks for actions that produce the goals while pruning as many of them as possible via incompatibility information.
  - As a heuristic search
    - computes an admissible heuristic for each state and then uses it during search.
  - As a CSP (related to the SATPlan algorithm)
    - Variables: one variable per action at each level
    - Domain = {0, 1}
    - Constraints: mutexes
Extract Solution: backward search
• Start from the last level, with agenda = goals
• Termination: $i = 0$
• Action selection: at each level $i$, select any conflict-free subset of actions in $A_{i-1}$ whose effects cover the current goals.
  - If no such subset is found, return failure
• Preconditions of the selected actions become the new goals for the recursive call at level $i - 1$.
GraphPlan: Example
[Figure: planning graph for an air-cargo problem, built up level by level over several slides. Legend: A: Airport, P: Plane, C: Cargo. Literals shown: At(P, A), At(P, B), At(C, A), At(C, B), In(C, P) and their negations. Actions shown: Fly(P, A, B), Fly(P, B, A), Load(C, P, A), Load(C, P, B), Unload(C, P, A), Unload(C, P, B). Goal: At(C, B).]
GraphPlan: heuristics for backward search
• Pick first the goal literal with the highest level cost
• To achieve a literal, prefer actions with easier preconditions
  - i.e. the sum (or max) of the level costs of the action's preconditions is smallest.
Planning as a satisfiability problem
• Bounded planning problem $(P, k)$:
  - $P$ is a planning problem
  - Find a solution for $P$ of length $k$
1) Translate $(P, k)$ into a SAT problem.
2) Solve the SAT problem.
3) Convert the solution to a plan.
[Figure: pipeline: planning problem → translate to PL → satisfiability problem → satisfiability solver → logical model → decode solution → solution plan]
Pictorial view of fluents for (P, k)
[Figure: layers of nodes from the initial-state fluents $s_0$ (t = 0), through action fluents $a_0, \dots, a_{k-1}$ (the last at t = k-1), to the state fluents $s_k$ (t = k)]
• A truth assignment selects a subset of these nodes to be true
• The propositional formula is built so that its satisfying assignments correspond to valid plans
Translating PDDL to propositional logic
• Initial state: conjunction of all literals true at time 0 (plus the negation of every literal not mentioned)
• Goal state: conjunction of all goal literals at time $k$
  - Instantiate literals containing variables (replace them with a disjunction $\lor$ over constants).
• Actions
  - successor-state axioms for each time step $t$ up to the horizon:
    $F^{t+1} \Leftrightarrow \mathit{ActionCausesF}^{t} \lor (F^{t} \land \lnot \mathit{ActionCausesNotF}^{t})$
  - precondition axioms:
    $A^{t} \Rightarrow \mathrm{PRECOND}(A)^{t}$
  - action exclusion axioms:
    $\lnot A_i^{t} \lor \lnot A_j^{t}$
Translating PDDL to propositional logic: Example
• Initial state: conjunction of all literals true at time 0 (plus the negation of every literal not mentioned)
  - Init(On(A, B) ∧ On(B, Table))
  - $\mathit{On}(A, B)^0 \land \mathit{On}(B, \mathit{Table})^0 \land \lnot \mathit{On}(B, A)^0 \land \lnot \mathit{On}(A, \mathit{Table})^0$
• Goal state: conjunction of all goal literals at time $k$
  - Instantiate literals containing variables (replace them with a disjunction $\lor$ over constants).
  - Goal(On(B, A))
  - $\mathit{On}(B, A)^1$ (for $k = 1$)
[Figure: initial state with A on B; goal state with B on A]
Translating PDDL to propositional logic: Example
• Add successor-state axioms for each time step $t$ up to the horizon
  - $F^{t+1} \Leftrightarrow \mathit{ActionCausesF}^{t} \lor (F^{t} \land \lnot \mathit{ActionCausesNotF}^{t})$
  - Example: $\mathit{On}(B, A)^{t+1} \Leftrightarrow \mathit{move}(B, \mathit{Table}, A)^{t} \lor [\mathit{On}(B, A)^{t} \land \lnot \mathit{move}(B, A, \mathit{Table})^{t}]$
• Add precondition axioms:
  - $A^{t} \Rightarrow \mathrm{PRECOND}(A)^{t}$
  - Example: $\mathit{move}(B, \mathit{Table}, A)^{t} \Rightarrow \mathit{On}(B, \mathit{Table})^{t} \land \mathit{Clear}(B)^{t} \land \mathit{Clear}(A)^{t}$
  - Is it necessary to include effects, $A^{t} \Rightarrow \mathrm{EFFECT}(A)^{t+1}$?
• Add action exclusion axioms:
  - $\lnot A_i^{t} \lor \lnot A_j^{t}$
  - Example: $\lnot \mathit{move}(B, \mathit{Table}, A)^{0} \lor \lnot \mathit{moveToTable}(A, B)^{0}$
Propositional logic solver and decoding
• Apply a SAT solver to the whole sentence $\Phi$
  - $\Phi$: conjunction of the encodings of the initial state, the goals, the successor-state axioms, the precondition axioms, and the action exclusion axioms
• If an assignment of truth values that satisfies $\Phi$ is found, extract the action sequence.
  - This means $P$ has a solution of length $k$
  - Extract solution: for each $i = 0, \dots, k-1$, exactly one action has been assigned "True"
  - This is the $i$-th action of the plan.
SATPlan
function SATPLAN(init, transition, goal, T_max) returns solution or failure
    inputs: init, transition, goal, which constitute a description of the problem
            T_max, an upper limit for plan length
    for t = 0 to T_max do
        cnf ← TRANSLATE_TO_SAT(init, transition, goal, t)
        model ← SAT_SOLVER(cnf)
        if model ≠ {} then return EXTRACT_SOLUTION(model)
    return failure
• It is guaranteed to find the shortest plan if one exists (within T_max steps).
SATPlan example
• Domain:
  - Robot $R$
  - Two locations $l_1$, $l_2$
  - One operator: "move" the robot
• Initial state: $\mathit{At}(R, l_1)$
• Goal: $\mathit{At}(R, l_2)$
• Action schema:
  - $\mathit{move}(r, l, l')$
  - PRECOND: $\mathit{At}(r, l)$
  - EFFECT: $\mathit{At}(r, l') \land \lnot \mathit{At}(r, l)$
SATPlan example (translation to SAT)
• Encode $(P, 1)$
  - Initial state: $\mathit{At}(R, l_1, 0) \land \lnot \mathit{At}(R, l_2, 0)$
  - Goal: $\mathit{At}(R, l_2, 1)$
  - Action preconditions:
    - $\mathit{move}(R, l_1, l_2, 0) \Rightarrow \mathit{At}(R, l_1, 0)$
    - $\mathit{move}(R, l_2, l_1, 0) \Rightarrow \mathit{At}(R, l_2, 0)$
  - Action exclusion axiom:
    - $\lnot \mathit{move}(R, l_2, l_1, 0) \lor \lnot \mathit{move}(R, l_1, l_2, 0)$
SATPlan example (translation to SAT)
• Fluents (successor-state axioms, here written as explanatory frame axioms):
  - $\lnot \mathit{At}(R, l_1, 0) \land \mathit{At}(R, l_1, 1) \Rightarrow \mathit{move}(R, l_2, l_1, 0)$
  - $\lnot \mathit{At}(R, l_2, 0) \land \mathit{At}(R, l_2, 1) \Rightarrow \mathit{move}(R, l_1, l_2, 0)$
  - $\mathit{At}(R, l_1, 0) \land \lnot \mathit{At}(R, l_1, 1) \Rightarrow \mathit{move}(R, l_1, l_2, 0)$
  - $\mathit{At}(R, l_2, 0) \land \lnot \mathit{At}(R, l_2, 1) \Rightarrow \mathit{move}(R, l_2, l_1, 0)$
SATPlan example (translation to SAT)
SAT formula for $(P, 1)$:
  $\mathit{At}(R, l_1, 0) \land \lnot \mathit{At}(R, l_2, 0) \land \mathit{At}(R, l_2, 1)$
  $\land\ [\mathit{move}(R, l_1, l_2, 0) \Rightarrow \mathit{At}(R, l_1, 0)]$
  $\land\ [\mathit{move}(R, l_1, l_2, 0) \Rightarrow \mathit{At}(R, l_2, 1)]$
  $\land\ [\lnot \mathit{move}(R, l_2, l_1, 0) \lor \lnot \mathit{move}(R, l_1, l_2, 0)]$
  $\land\ [\lnot \mathit{At}(R, l_1, 0) \land \mathit{At}(R, l_1, 1) \Rightarrow \mathit{move}(R, l_2, l_1, 0)]$
  $\land\ [\lnot \mathit{At}(R, l_2, 0) \land \mathit{At}(R, l_2, 1) \Rightarrow \mathit{move}(R, l_1, l_2, 0)]$
  $\land\ [\mathit{At}(R, l_1, 0) \land \lnot \mathit{At}(R, l_1, 1) \Rightarrow \mathit{move}(R, l_1, l_2, 0)]$
  $\land\ [\mathit{At}(R, l_2, 0) \land \lnot \mathit{At}(R, l_2, 1) \Rightarrow \mathit{move}(R, l_2, l_1, 0)]$
The above formula is converted to CNF and solved by a SAT solver.
SATPlan example (Extracting a plan)
• $\Phi$ can be satisfied with $\mathit{move}(R, l_1, l_2, 0) = \mathit{true}$
• Hence $\mathit{move}(R, l_1, l_2, 0)$ is a solution (and the only one) of the planning problem with a 1-step plan
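The claim can be checked by brute force on the six propositional variables of the (P, 1) encoding. In this sketch, exhaustive enumeration stands in for a real SAT solver (no CNF conversion, no DPLL), and the variable names are simply strings chosen for the illustration.

```python
from itertools import product

VARS = [
    "At(R,l1,0)", "At(R,l2,0)", "At(R,l1,1)", "At(R,l2,1)",
    "move(R,l1,l2,0)", "move(R,l2,l1,0)",
]


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def phi(m: dict) -> bool:
    at10, at20 = m["At(R,l1,0)"], m["At(R,l2,0)"]
    at11, at21 = m["At(R,l1,1)"], m["At(R,l2,1)"]
    mv12, mv21 = m["move(R,l1,l2,0)"], m["move(R,l2,l1,0)"]
    return all([
        at10, not at20,                         # initial state
        at21,                                   # goal
        implies(mv12, at10),                    # precondition of move(R,l1,l2,0)
        implies(mv12, at21),                    # effect of move(R,l1,l2,0)
        (not mv21) or (not mv12),               # action exclusion
        implies((not at10) and at11, mv21),     # frame axioms
        implies((not at20) and at21, mv12),
        implies(at10 and (not at11), mv12),
        implies(at20 and (not at21), mv21),
    ])


# Enumerate all 2^6 assignments and keep the satisfying ones.
models = []
for values in product([False, True], repeat=len(VARS)):
    assignment = dict(zip(VARS, values))
    if phi(assignment):
        models.append(assignment)

# Decode: the plan is the set of action variables assigned True at step 0.
plans = {tuple(v for v in VARS if v.startswith("move") and m[v]) for m in models}
print(plans)
```

Every satisfying assignment selects the same single action, so the decoded plan is unique even if some fluent at time 1 is left unconstrained by the formula as written.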
Layered Plans in SATPlan
• Complete exclusion axiom (only one action at a time):
  - For all pairs of actions at each time step $i$: $\lnot a_i \lor \lnot b_i$
• Partial exclusion axiom (more than one action may be taken at a time step):
  - For any pair of incompatible actions (recall from GraphPlan): $\lnot a_i \lor \lnot b_i$
  - Fewer time steps may be required (i.e. shorter formulas)
Solving SAT problem
• Systematic search
  - DPLL (Davis-Putnam-Logemann-Loveland)
• Local search
  - WalkSAT
Partial order planning
Sock-shoe example: PDDL
Init()
Goal(RightShoeOn ∧ LeftShoeOn)
Action(RightShoe,
    PRECOND: RightSockOn
    EFFECT: RightShoeOn)
Action(RightSock,
    EFFECT: RightSockOn)
Action(LeftShoe,
    PRECOND: LeftSockOn
    EFFECT: LeftShoeOn)
Action(LeftSock,
    EFFECT: LeftSockOn)
Partial Order Plan vs. Total Order Plans
[Figure: the partial-order plan for the sock-shoe problem (Start; Left Sock before Left Shoe; Right Sock before Right Shoe; Finish) next to its six total-order linearizations, each running from Start to Finish]
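The total-order plans are just the linearizations of the partial order. A short sketch that enumerates them, assuming the only ordering constraints are sock-before-shoe on each foot (the step names are shorthand chosen for the illustration):

```python
from itertools import permutations

STEPS = ["LeftSock", "LeftShoe", "RightSock", "RightShoe"]
# (before, after) pairs taken from the partial-order plan.
ORDERINGS = {("LeftSock", "LeftShoe"), ("RightSock", "RightShoe")}


def linearizations(steps, orderings):
    """All total orders of `steps` that respect every (before, after) pair."""
    result = []
    for perm in permutations(steps):
        position = {step: i for i, step in enumerate(perm)}
        if all(position[b] < position[a] for b, a in orderings):
            result.append(["Start", *perm, "Finish"])
    return result


total_orders = linearizations(STEPS, ORDERINGS)
for plan in total_orders:
    print(" -> ".join(plan))
```

Of the 24 permutations of the four steps, only those keeping each sock before its shoe survive, which gives the six total-order plans.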
Partial Order Planning
• Two initial actions
  - Start
    - No precondition
    - The whole "Initial State" as its effects
  - Finish
    - The whole "Goal State" as its precondition
    - No effect
Partial plan definition
• A partial plan is a tuple <A, O, L> where:
  - A: set of actions in the plan (plan steps)
    - Initially {Start, Finish}
  - O: set of orderings between actions
    - Initially {Start < Finish}
  - L: set of causal links
    - Initially {}
Causal links and threats
• Causal link: records the purpose of a step in the plan
  - The purpose of $A_i$ is to achieve the precondition $p$ of $A_j$: $A_i \xrightarrow{p} A_j$
• Threat: causal links are used to detect when a newly introduced action interferes with past decisions.
  - $A_k$ threatens $A_i \xrightarrow{p} A_j$ when:
    - $A_k$ can come between $A_i$ and $A_j$ ($O \cup \{A_i < A_k < A_j\}$ is consistent)
    - $A_k$ has $\lnot p$ as an effect.
Resolving Threats
• Resolve threat: ensure that the threatening action is ordered to come before or after the protected link (here a link from $S_1$ to $S_2$ threatened by $S_3$)
  - Demotion (placed before): add $S_3 < S_1$ to $O$
  - Promotion (placed after): add $S_2 < S_3$ to $O$
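Threat detection and the two repairs can be sketched with orderings stored as (before, after) pairs, where "consistent" means the ordering graph has no cycle. All names here (`is_consistent`, `threatens`, `resolve`) and the small spare-tire fragment are assumptions made for the illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Ordering = Tuple[str, str]          # (before, after)
Link = Tuple[str, str, str]         # (producer, proposition, consumer)


def is_consistent(orderings: Set[Ordering]) -> bool:
    """True iff the ordering constraints are acyclic (Kahn's algorithm)."""
    succ: Dict[str, Set[str]] = {}
    for before, after in orderings:
        succ.setdefault(before, set()).add(after)
        succ.setdefault(after, set())
    indegree = {n: 0 for n in succ}
    for n in succ:
        for m in succ[n]:
            indegree[m] += 1
    ready = [n for n in indegree if indegree[n] == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return seen == len(succ)


def threatens(step: str, neg_effects: FrozenSet[str], link: Link,
              orderings: Set[Ordering]) -> bool:
    producer, prop, consumer = link
    if step in (producer, consumer) or prop not in neg_effects:
        return False
    # The step can still be placed between producer and consumer.
    return is_consistent(orderings | {(producer, step), (step, consumer)})


def resolve(step: str, link: Link, orderings: Set[Ordering]) -> List[str]:
    producer, _, consumer = link
    options = []
    if is_consistent(orderings | {(step, producer)}):
        options.append("demotion")      # step before the producer
    if is_consistent(orderings | {(consumer, step)}):
        options.append("promotion")     # step after the consumer
    return options


O = {("Start", "Finish"), ("Start", "Remove"), ("Remove", "Finish"),
     ("Start", "LeaveOvernight"), ("LeaveOvernight", "Finish")}
link = ("Start", "At(Spare,Trunk)", "Remove")
deleted_by_leave = frozenset({"At(Spare,Trunk)", "At(Flat,Axle)"})

is_threat = threatens("LeaveOvernight", deleted_by_leave, link, O)
options = resolve("LeaveOvernight", link, O)
print(is_threat, options)
```

In this fragment demotion is impossible because nothing can precede Start, so only promotion (ordering the threat after the consumer) remains.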
Spare tire example
Init(Tire(Flat) ∧ Tire(Spare) ∧ At(Flat, Axle) ∧ At(Spare, Trunk))
Goal(At(Spare, Axle))
Action(Remove(obj, loc),
    PRECOND: At(obj, loc)
    EFFECT: ¬At(obj, loc) ∧ At(obj, Ground))
Action(PutOn(t, Axle),
    PRECOND: Tire(t) ∧ At(t, Ground) ∧ ¬At(Flat, Axle)
    EFFECT: ¬At(t, Ground) ∧ At(t, Axle))
Action(LeaveOvernight,
    PRECOND: (none)
    EFFECT: ¬At(Spare, Ground) ∧ ¬At(Spare, Trunk) ∧ ¬At(Spare, Axle) ∧ ¬At(Flat, Ground) ∧ ¬At(Flat, Trunk) ∧ ¬At(Flat, Axle))
POP
• Agenda: open preconditions (along with the actions requiring them)
  - Initially all preconditions of Finish
function POP(<A, O, L>, agenda)
    if agenda = {} then return <A, O, L>
    (Q, A_need) ← select a goal from agenda
    A_add ← choose an action that adds Q
    if no such action then return failure
    Update <A, O, L> and agenda
    Add consistent ordering constraints for causal link protection
    if no constraint is consistent then return failure
    POP(<A, O, L>, agenda)
POP algorithm (more details)
POP(<A, O, L>, agenda)
1. Termination: if agenda is empty, return <A, O, L>.
2. Goal selection: let <Q, A_need> be a pair on the agenda.
3. Action selection: let A_add = choose an action that adds Q; if no such action exists, then return failure.
   Let L′ = L ∪ {A_add --Q--> A_need} and let O′ = O ∪ {A_add < A_need}.
   If A_add is newly instantiated, then A′ = A ∪ {A_add} and O′ = O′ ∪ {A_0 < A_add < A_∞} (otherwise, let A′ = A).
4. Updating of goal set: let agenda′ = agenda − {<Q, A_need>}.
   If A_add is newly instantiated, then for each conjunct Q_i of its precondition, add <Q_i, A_add> to agenda′.
5. Causal link protection: for every action A_t that might threaten a causal link A_p --p--> A_c, add a consistent ordering constraint, either
   (a) Demotion: add A_t < A_p to O′
   (b) Promotion: add A_c < A_t to O′
   If neither constraint is consistent, then return failure.
6. Recursive invocation: POP(<A′, O′, L′>, agenda′)
Shopping example
Init(At(Home) ∧ Sells(HWS, Drill) ∧ Sells(SM, Milk) ∧ Sells(SM, Banana))
Goal(Have(Drill) ∧ Have(Milk) ∧ Have(Banana) ∧ At(Home))
Action(Go(there),
    PRECOND: At(here)
    EFFECT: At(there) ∧ ¬At(here))
Action(Buy(x),
    PRECOND: At(store) ∧ Sells(store, x)
    EFFECT: Have(x))
Shopping example
• Many possible ways to elaborate the initial plan
  - Three Buy actions for the three Have preconditions of the Finish action
  - Sells precondition of Buy
• Bold arrows: causal links, protection of a precondition
• Light arrows: ordering constraints