Planning under Uncertainty as Golog Programs
Jorge Baier*
* Joint work with Javier Pinto
Outline
• Objectives, contributions and motivation.
• The Probabilistic Situation Calculus.
• An extension to Golog.
• An algorithm for planning with conditional Golog plans.
• Loop induction.
• Conclusions.
Objectives and Contributions
• Main objective of the work: use the Probabilistic Situation Calculus (PSC) to model and program agents in domains with uncertainty.
• Application: program execution and planning under uncertainty.
• Contributions:
  – An (offline) extension to Golog to handle non-determinism in the effects of actions à la PSC.
  – An algorithm for planning under uncertainty in fully observable worlds.
  – The algorithm may generate plans with loops.
Why conditional planning?
• For efficiency reasons, one of the fundamental ideas of cognitive robotics is to program agents instead of letting them plan.
• Nevertheless, planning is still necessary; we don't want the programmer to "think about everything". We want agents to be flexible.
• In the context of uncertainty, conditional plans must be generated, since there might be different contingencies under which different courses of action might be chosen.
The Probabilistic Situation Calculus
• The Probabilistic Situation Calculus [PSSM00] is a many-sorted second-order (first-order + induction) family of logical languages. It is an extension of the standard Situation Calculus that handles uncertain effects of actions with (discrete) probability distributions.
• Language elements:
  – Actions. An action is a pair $\langle i, e \rangle$, where $i$ is the deterministic part (input) and $e$ is the non-deterministic, "by-nature" part (outcome). There are sorts $\mathcal{I}$ and $\mathcal{E}$ for inputs and outcomes.
    Example: if the input action "toss a coin" is denoted by $Toss$, then $\langle Toss, Tails \rangle$ and $\langle Toss, Heads \rangle$ are two standard SC actions.
  – Situations. The same as in the standard SC.
    [Figure: tree rooted at $S_0$. The input $Toss$ branches on the outcomes $Tails$ and $Heads$, leading to the situations $do(\langle Toss, Tails \rangle, S_0)$ and $do(\langle Toss, Heads \rangle, S_0)$.]
  – Fluents. Treated as objects of sort $\mathcal{F}$.
  – Distinguished predicate $holds \subseteq \mathcal{F} \times \mathcal{S}$, which can be extended naturally to support fluent formulas.
  – Distinguished predicates $Poss_i \subseteq \mathcal{I} \times \mathcal{S}$ and $Poss \subseteq \mathcal{A} \times \mathcal{S}$ state preconditions of inputs and actions.
    Example:
    $$Poss_i(drop(x), s) \equiv holds(holding(x), s)$$
    $$Poss(\langle drop(x), e \rangle, s) \equiv Poss_i(drop(x), s) \wedge (e = tails(x) \vee e = heads(x))$$
  – In our version, a distinguished function $Outcome : \mathcal{I} \times \mathcal{E} \times \mathcal{S} \to [0, 1]$ that assigns a multinomial probability distribution to the outcomes of actions.
    Example:
    $$\neg holds(biased(x), s) \supset Outcome(drop(x), tails(x), s) = 0.5$$
    $$holds(biased(x), s) \supset Outcome(drop(x), heads(x), s) = 0.8$$
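The `Outcome` example for `drop(x)` can be sketched in Python as follows. This is only an illustration, not the authors' implementation: a situation is abstracted to the set of fluents that hold in it, the names `poss_i` and `outcome` are hypothetical, and the two probabilities that the example leaves implicit (heads for an unbiased coin, tails for a biased one) are assumed to be the complements of the stated ones.

```python
from fractions import Fraction

def poss_i(inp, fluents):
    """Poss_i(drop(x), s) holds iff the agent is holding x."""
    kind, x = inp
    if kind != "drop":
        raise ValueError(f"only drop(x) is modelled here, got {kind}")
    return ("holding", x) in fluents

def outcome(inp, fluents):
    """Return {outcome: probability} for the input drop(x).

    Stated in the example: tails = 0.5 if x is not biased,
    heads = 0.8 if x is biased. The complementary values are an
    assumption (the distribution is taken to sum to 1).
    """
    kind, x = inp
    if kind != "drop":
        raise ValueError(f"only drop(x) is modelled here, got {kind}")
    p_heads = Fraction(4, 5) if ("biased", x) in fluents else Fraction(1, 2)
    return {("heads", x): p_heads, ("tails", x): 1 - p_heads}

holding = frozenset({("holding", "C1")})
fair = outcome(("drop", "C1"), holding)
biased = outcome(("drop", "C1"), holding | {("biased", "C1")})
```

Exact fractions are used instead of floats so that sums of probabilities can be compared with `==`.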
Theories of action in Probabilistic Situation Calculus
As usual, a theory of action in the Probabilistic Situation Calculus is composed of foundational axioms. Among them there are additional axioms for handling probabilities,
$$(\forall i, s).\; Poss_i(i, s) \supset \sum_{e \in \mathcal{E}} Outcome(i, e, s) = 1,$$
and axioms for describing:
• the initial situation
• preconditions for actions
• successor state axioms. Example:
$$Poss(a, s) \supset [\, holds(headsUp(x), do(a, s)) \equiv out(a) = heads(x) \vee (holds(headsUp(x), s) \wedge \neg\, out(a) = tails(x)) \,]$$
• probability distributions and unique names
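The successor state axiom for `headsUp(x)` can be read as a one-line update rule. The Python sketch below is illustrative only: an action is assumed to be encoded as a pair `(input, outcome)`, and `out` and `heads_up_after` are hypothetical helper names.

```python
def out(action):
    """Outcome component e of an action <i, e>, encoded as a pair."""
    return action[1]

def heads_up_after(x, action, heads_up_before):
    """Value of headsUp(x) in do(a, s), assuming Poss(a, s) holds.

    True iff the outcome is heads(x), or headsUp(x) already held
    and the outcome is not tails(x).
    """
    e = out(action)
    return e == ("heads", x) or (heads_up_before and e != ("tails", x))

drop_heads = (("drop", "C1"), ("heads", "C1"))
drop_tails = (("drop", "C1"), ("tails", "C1"))
grab = (("grab", "C1"), ("grab", "C1"))
```

Under this reading, grabbing a coin leaves `headsUp` unchanged, because the outcome of the grab is neither `heads(x)` nor `tails(x)`.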
Computing probabilities for U-Golog programs
• We extended a subset of ConGolog with non-deterministic effects of actions. Features:
  – Primitive actions in a U-Golog program are inputs, not standard SC actions.
  – The execution of an input in a U-Golog program may result in multiple situations.
• Here is the main change to ConGolog's semantics:
$$Trans(\alpha, s, \delta, s') \equiv \alpha \neq NoOp \wedge \delta = \{\} \wedge (\exists e).\, Poss(\langle \alpha, e \rangle, s) \wedge s' = do(\langle \alpha, e \rangle, s).$$
• We also define the function $ProbS_G(\alpha, s, s')$, which is the probability that, after executing $\alpha$ in $s$, the program ends in situation $s'$. Some of the axioms for $ProbS_G$ are the following:
$$ProbS_G(\alpha, s, s') = Outcome(\alpha, e, s) \equiv Trans(\alpha, s, do(\langle \alpha, e \rangle, s))$$
$$ProbS_G((\sigma_1; \sigma_2), s, s') = ProbS_G(\sigma_1, s, s'') \times ProbS_G(\sigma_2, s'', s') \equiv Do(\sigma_1, s, s'') \wedge Do(\sigma_2, s'', s')$$
$$ProbS_G(\textbf{if } \phi \textbf{ then } \sigma_1 \textbf{ else } \sigma_2 \textbf{ endIf}, s, s') = \begin{cases} ProbS_G(\sigma_1, s, s') & \text{iff } holds(\phi, s) \\ ProbS_G(\sigma_2, s, s') & \text{otherwise} \end{cases}$$
$$ProbS_G(\textbf{while } \phi \textbf{ do } \sigma, s, s') = \begin{cases} 1 & \text{iff } \neg holds(\phi, s) \wedge s = s' \\ ProbS_G(\sigma; \textbf{while } \phi \textbf{ do } \sigma, s, s') & \text{otherwise} \end{cases}$$
• We also define the function $Prob_G$ such that $Prob_G(g, \sigma, s)$ is the probability that the fluent (or fluent formula) $g$ holds after executing program $\sigma$ in $s$:
$$Prob_G(g, \sigma, s) = \sum_{s' \in \{ s'' \mid Do(\sigma, s, s'') \}} ProbS_G(\sigma, s, s') \times holds(g, s'),$$
where, inside the sum, $holds(g, s')$ is read as 1 when $holds(g, s')$ is true and as 0 otherwise.
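The axioms for `ProbS_G` and the sum defining `Prob_G` can be turned into a small interpreter. The sketch below is not the authors' implementation, and several things in it are assumptions made for illustration: a two-input coin domain (`grab`, `drop`) with a fair coin, a situation encoded as a pair (history, fluents), programs encoded as nested tuples, conditions restricted to single fluents, and a `fuel` bound that truncates `while` loops so the recursion terminates.

```python
from fractions import Fraction

def outcome(inp, fluents):
    """Outcome distribution of an input; empty if the input is not possible."""
    kind, x = inp
    held = ("holding", x) in fluents
    if kind == "grab" and not held:
        return {("grab", x): Fraction(1)}
    if kind == "drop" and held:
        return {("heads", x): Fraction(1, 2), ("tails", x): Fraction(1, 2)}
    return {}

def progress(fluents, e):
    """Fluents that hold after an action with outcome e (assumed effects)."""
    kind, x = e
    if kind == "grab":
        return (fluents - {("onTable", x), ("onFloor", x)}) | {("holding", x)}
    face, other = ("headsUp", "tailsUp") if kind == "heads" else ("tailsUp", "headsUp")
    return (fluents - {("holding", x), (other, x)}) | {("onFloor", x), (face, x)}

def prob_s(prog, sit, fuel=30):
    """Return {end situation: probability}, following the ProbS_G cases."""
    hist, fluents = sit
    tag = prog[0]
    if tag == "noop":
        return {sit: Fraction(1)}
    if tag == "do":  # primitive input: one branch per outcome
        inp = prog[1]
        return {(hist + ((inp, e),), progress(fluents, e)): p
                for e, p in outcome(inp, fluents).items()}
    if tag == "seq":  # multiply along the path, sum over end situations
        dist = {}
        for mid, p in prob_s(prog[1], sit, fuel).items():
            for end, q in prob_s(prog[2], mid, fuel).items():
                dist[end] = dist.get(end, 0) + p * q
        return dist
    if tag == "if":
        return prob_s(prog[2] if prog[1] in fluents else prog[3], sit, fuel)
    if tag == "while":
        if prog[1] not in fluents:
            return {sit: Fraction(1)}
        if fuel == 0:  # truncation: the remaining probability mass is dropped
            return {}
        return prob_s(("seq", prog[2], prog), sit, fuel - 1)
    raise ValueError(f"unknown construct: {tag}")

def prob_g(goal, prog, sit, fuel=30):
    """Sum of ProbS_G over the end situations in which the goal fluent holds."""
    return sum(p for (_, fl), p in prob_s(prog, sit, fuel).items() if goal in fl)

S0 = ((), frozenset({("onTable", "C1"), ("tailsUp", "C1")}))
delta1 = ("seq", ("do", ("grab", "C1")), ("do", ("drop", "C1")))
retry = ("seq", delta1, ("while", ("tailsUp", "C1"), delta1))
```

Here `prob_g(("headsUp", "C1"), delta1, S0)` evaluates to 1/2. With `fuel=3` the `retry` program is cut off after three loop iterations, so the reported value is 15/16 and the remaining 1/16 is the truncated mass.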
Planning under uncertainty
• In classical planning, it is assumed that actions have deterministic effects. Solution: linear plans (simple sequences of actions).
  Example (World of coins): In this domain there are two coins, $C_1$ and $C_2$, that are initially on a table with tails up. An agent can drop and grab coins. The drop-coin action has non-deterministic effects.
• In the PSC we model this world using
  – Inputs: $grab(x)$, $drop(x)$.
  – Outcomes: $grab(x)$, $heads(x)$, $tails(x)$.
  – Fluents: $headsUp(x)$, $tailsUp(x)$, $onTable(x)$, $onFloor(x)$.
• Initial conditions: two coins on a table, $C_1$ and $C_2$.
Conditional plans through refinement
• Suppose we want a plan that, with probability at least 0.7, achieves the following goal: have coin $C_1$ with heads up and on the floor,
$$G \stackrel{\text{def}}{=} headsUp(C_1) \wedge onFloor(C_1).$$
• Consider the following plan:
$$\delta_1 \stackrel{\text{def}}{=} grab(C_1);\; drop(C_1)$$
• This program achieves the goal with a positive probability, but not with the required 0.7.
• It is not hard to see that no linear sequence of actions achieves the goal with the required probability.
• Solution: conditional plans.
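As a worked check of the claim about $\delta_1$, the probability can be computed by hand. The calculation assumes that $C_1$ is not biased (so heads has probability 0.5, the complement of the stated 0.5 for tails) and that dropping a coin always leaves it on the floor; neither assumption is spelled out in the slides. Here $S_1$ abbreviates $do(\langle grab(C_1), grab(C_1) \rangle, S_0)$.

```latex
\begin{align*}
Prob_G(G, \delta_1, S_0)
  &= Outcome(grab(C_1), grab(C_1), S_0) \times Outcome(drop(C_1), heads(C_1), S_1) \\
  &= 1 \times 0.5 \\
  &= 0.5 \;<\; 0.7
\end{align*}
```

Only the branch ending with heads contributes to the sum, because $G$ does not hold in the branch ending with tails.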
A more general case
• Let $i_1; i_2; i_3; i_4$ be an arbitrary sequence of inputs that achieves $G$ with a positive probability, and consider the tree of situations that results from its execution.
  [Figure: tree of situations $S_0, \dots, S_4$ generated by the inputs $i_1, \dots, i_4$, with a legend distinguishing bad situations from good situations.]
• Our algorithm for conditional planning starts with an arbitrary sequence of actions that achieves the goal with a probability meeting a positive threshold (given as a parameter) and then refines it.
• The algorithm simulates the program until it finds one or more bad situations. Once a bad situation is found, an if-then-else construct is introduced into the program.
• A situation $S$ is bad for a goal $G$ and a plan $\sigma$ if $Prob_G(G, \sigma, S) = 0$.
• Intuitively, refinement of the plan corresponds to the following program:
    $i_1; i_2;$
    if in situation $S_4$ then
      new plan for $G$ in $S_4$
    else
      recursive refinement of $i_3; i_4$ for the rest of the current situations
• The condition "in situation $S_4$" cannot be included directly in the program.
• The algorithm finds a discriminating fluent (true in good situations, false in bad ones).
• For a theory of actions for the world of coins, the plan returned in $FinalPlan$ by the execution of
    $CRefine(headsUp(C_1), \{\}, FinalPlan, \{S_0\}, 0.4, 0, 2)$
  is
    grab(C1); drop(C1);
    if headsUp(C1) then NoOp;
    else
      grab(C1); drop(C1);
      if headsUp(C1) then NoOp;
      else
        grab(C1); drop(C1);
      endIf
    endIf
  which achieves the goal with probability 0.875.
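The search for a discriminating fluent can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: each situation is abstracted to the set of fluents that hold in it, and the function name and the `(sign, fluent)` encoding of a literal are assumptions.

```python
def discriminating_literal(good_sits, bad_sits):
    """Find a fluent literal true in every good situation and false in every bad one.

    Returns (True, f) for the positive literal f, (False, f) for the
    negative literal "not f", or None when no single literal separates
    the two groups.
    """
    candidates = sorted(set().union(*good_sits, *bad_sits))
    for fluent in candidates:
        in_good = [fluent in s for s in good_sits]
        in_bad = [fluent in s for s in bad_sits]
        if all(in_good) and not any(in_bad):
            return (True, fluent)
        if not any(in_good) and all(in_bad):
            return (False, fluent)
    return None

# Situations reached after grab(C1); drop(C1) in the world of coins.
heads = frozenset({("headsUp", "C1"), ("onFloor", "C1")})
tails = frozenset({("tailsUp", "C1"), ("onFloor", "C1")})
literal = discriminating_literal([heads], [tails])
```

For the two situations above the result is the positive literal `headsUp(C1)`, the same test that appears in the generated plan. `onFloor(C1)` is rejected because it holds in both groups.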
CRefine Operator

CRefine(Goal, CandPlan, FinalPlan, CurSits, T, Level, Top) ←
  BadSits = { s | s ∈ CurSits ∧ Bad(s, Goal, CandPlan) }
  if BadSits = {} then
    if CandPlan = {} then
      FinalPlan = NoOp
    else if (∃ α, σ) CandPlan = α; σ then
      NewSits = { s | (∃ s′) s′ ∈ CurSits ∧ Do(α, s′, s) }
      CRefine(Goal, σ, σ′, NewSits, T, Level, Top)
      FinalPlan = α; σ′
    endIf
  else
    GoodSits = CurSits − BadSits
    FirstBad = an element of BadSits
    FindSeq(Goal, CandPlanForBads, FirstBad, T)
    if Level < Top then
      CRefine(Goal, CandPlanForBads, PlanForBads, BadSits, T, Level + 1, Top)
    else
      PlanForBads = CandPlanForBads
    endIf
    if GoodSits = {} then
      FinalPlan = PlanForBads
    else
      Property = fluent literal l | (∀ s) s ∈ GoodSits ⊃ holds(l, s) ∧ (∀ s) s ∈ BadSits ⊃ ¬holds(l, s)
      CRefine(Goal, CandPlan, PlanForGoods, GoodSits, T, Level + 1, Top)
      FinalPlan = if Property then PlanForGoods else PlanForBads
  end

Figure 1: Pseudo-Prolog code for a simple algorithm for planning under uncertainty with complete knowledge
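The control flow of Figure 1 can be exercised with a small Python sketch. It is a simplified reading of the pseudo-code, not the authors' implementation. The assumptions are: a single fair coin `C1`, situations abstracted to sets of fluents, plans encoded as tuples whose steps are inputs or `("if", fluent, then_plan, else_plan)`, a brute-force `find_seq` standing in for `FindSeq`, and discriminating properties restricted to positive fluents.

```python
from fractions import Fraction
from itertools import product

INPUTS = [("grab", "C1"), ("drop", "C1")]

def step(state, inp):
    """Return {next state: probability}; empty if the input is not possible."""
    kind, x = inp
    held = ("holding", x) in state
    if kind == "grab" and not held:
        return {(state - {("onTable", x), ("onFloor", x)}) | {("holding", x)}: Fraction(1)}
    if kind == "drop" and held:
        base = (state - {("holding", x), ("headsUp", x), ("tailsUp", x)}) | {("onFloor", x)}
        return {base | {("headsUp", x)}: Fraction(1, 2),
                base | {("tailsUp", x)}: Fraction(1, 2)}
    return {}

def run(plan, state):
    """Distribution over final states after executing a plan."""
    if not plan:
        return {state: Fraction(1)}
    head, rest = plan[0], plan[1:]
    if head[0] == "if":
        _, fluent, then_p, else_p = head
        return run((then_p if fluent in state else else_p) + rest, state)
    dist = {}
    for nxt, p in step(state, head).items():
        for end, q in run(rest, nxt).items():
            dist[end] = dist.get(end, 0) + p * q
    return dist

def prob_g(goal, plan, state):
    return sum(p for end, p in run(plan, state).items() if goal in end)

def find_seq(goal, state, threshold, max_len=3):
    """Hypothetical FindSeq: shortest input sequence meeting the threshold."""
    for n in range(max_len + 1):
        for seq in product(INPUTS, repeat=n):
            if prob_g(goal, seq, state) >= threshold:
                return seq
    raise ValueError("no sequence found within max_len")

def discriminating(good, bad):
    """A fluent true in all good states and false in all bad ones."""
    for f in sorted(set().union(*good)):
        if all(f in s for s in good) and not any(f in s for s in bad):
            return f
    raise ValueError("no discriminating fluent")

def crefine(goal, cand, cur, threshold, level, top):
    bad_sits = {s for s in cur if prob_g(goal, cand, s) == 0}
    if not bad_sits:
        if not cand:
            return ()  # NoOp
        head, rest = cand[0], cand[1:]
        new_sits = {n for s in cur for n in step(s, head)}
        return (head,) + crefine(goal, rest, new_sits, threshold, level, top)
    good_sits = cur - bad_sits
    plan_bads = find_seq(goal, next(iter(bad_sits)), threshold)
    if level < top:
        plan_bads = crefine(goal, plan_bads, bad_sits, threshold, level + 1, top)
    if not good_sits:
        return plan_bads
    prop = discriminating(good_sits, bad_sits)
    plan_goods = crefine(goal, cand, good_sits, threshold, level + 1, top)
    return (("if", prop, plan_goods, plan_bads),)

S0 = frozenset({("onTable", "C1"), ("tailsUp", "C1")})
GOAL = ("headsUp", "C1")
plan = crefine(GOAL, (), {S0}, Fraction(2, 5), 0, 2)
```

With threshold 0.4, level 0 and top 2, this sketch produces a plan with three nested grab-and-drop attempts and `prob_g(GOAL, plan, S0)` equal to 7/8, matching the 0.875 reported for the corresponding `CRefine` call.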
Loop induction
• If CRefine is invoked on the same arguments, but with the depth level replaced by 3, we obtain the following program:
    grab(C1); drop(C1);
    if headsUp(C1) then NoOp;
    else
      grab(C1); drop(C1);
      if headsUp(C1) then NoOp;
      else
        grab(C1); drop(C1);
        if headsUp(C1) then NoOp;
        else
          grab(C1); drop(C1);
        endIf
      endIf
    endIf
  which achieves the goal with probability 0.9375.
• This suggests that loops could be induced when repeated sequences of if-then-else conditionals appear involving the same body...
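The slide does not say how loops are induced, so the following Python sketch is only one hypothetical way to detect the repeated pattern. It assumes plans encoded as tuples of steps, with conditionals written as `("if", fluent, then_plan, else_plan)`, and it invents a `("while_not", fluent, body)` step for the induced loop. The helper `success_probability` assumes independent attempts with a fair coin.

```python
from fractions import Fraction

def induce_loop(plan):
    """Fold `body; if c then NoOp else (body; if c ...)` into a loop.

    Returns `body + (("while_not", c, body),)` when at least two nested
    conditionals share the same body and condition, otherwise None.
    """
    bodies, conds = [], []
    while plan:
        *body, last = plan
        if last[0] == "if" and last[2] == ():
            bodies.append(tuple(body))
            conds.append(last[1])
            plan = last[3]  # continue with the else branch
        else:
            bodies.append(tuple(plan))
            plan = ()
    if len(conds) >= 2 and len(set(bodies)) == 1 and len(set(conds)) == 1:
        return bodies[0] + (("while_not", conds[0], bodies[0]),)
    return None

def success_probability(attempts, p=Fraction(1, 2)):
    """Probability that at least one of `attempts` independent drops shows heads."""
    return 1 - (1 - p) ** attempts

G1, D1, H = ("grab", "C1"), ("drop", "C1"), ("headsUp", "C1")
unrolled = (G1, D1, ("if", H, (), (G1, D1, ("if", H, (), (G1, D1)))))
looped = induce_loop(unrolled)
```

Three attempts give 7/8 = 0.875 and four give 15/16 = 0.9375, the two values quoted for depth levels 2 and 3. Note that the induced loop is not equivalent to the bounded plan: it keeps retrying, so its success probability approaches 1 as the number of iterations grows.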