
Logic Programming and MDPs for Planning
Alborz Geramifard, Winter 2009

Index: Introduction, Logic Programming, MDP, MDP + Logic Programming


  1. DT-GOLOG Execution: given a start state S0, a goal S', and a program δ, add rewards to the transitions and formulate the problem as an MDP. [Slide figure: a search tree over states S0-S4 with per-transition rewards.] Existential conditions over the finite domain b ∈ {b1, ..., bn}, such as (∃b) ¬onTable(b), expand to ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn).

  2. First-Order Dynamic Programming [Sanner 07]: the resulting MDP can still be intractable. Idea: exploit the logical structure to build an abstract value function and avoid the curse of dimensionality.

  3. Symbolic Dynamic Programming (Deterministic). Tabular backup: V(s) = max_a [ r + γ V(s') ]. For a symbolic version we need: a representation of rewards and values, a way to add rewards and values, a max operator, and a way to find s.
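
For contrast with the symbolic version developed on the following slides, here is a minimal Python sketch of the tabular deterministic backup V(s) = max_a [ r + γ V(s') ], assuming an explicitly enumerated state set and illustrative step/reward functions (all names are placeholders, not from the slides):

    def value_iteration(states, actions, step, reward, gamma=0.9, iters=100):
        # step(s, a) -> s' deterministically; reward(s, a) -> immediate reward.
        # Assumes step(s, a) always lands back inside `states`.
        V = {s: 0.0 for s in states}
        for _ in range(iters):
            V = {s: max(reward(s, a) + gamma * V[step(s, a)] for a in actions)
                 for s in states}
        return V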

  4. Reward and Value Representation (case representation): rCase = [ ∃b, b ≠ b1, on(b,b1) → 10 ; ¬∃b, b ≠ b1, on(b,b1) → 0 ]. Each partition pairs a first-order formula with a value.
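
One possible encoding, not from the slides: a case statement held as an ordered list of (formula, value) partitions, with formulas kept as opaque strings purely for illustration:

    # rCase from this slide as a list of (formula, value) partitions.
    r_case = [
        ("exists b. b != b1 and on(b, b1)", 10.0),
        ("not exists b. b != b1 and on(b, b1)", 0.0),
    ]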

  5. Add Symbolically: [ A → 10 ; ¬A → 20 ] ⊕ [ B → 1 ; ¬B → 2 ] = [ A ∧ B → 11 ; A ∧ ¬B → 12 ; ¬A ∧ B → 21 ; ¬A ∧ ¬B → 22 ]. ⊖ and ⊗ are defined similarly. [Scott Sanner - ICAPS08 Tutorial]
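
A sketch of ⊕ under the same hypothetical encoding: cross the partitions, conjoin the formulas, and add the values; the pruning of logically inconsistent conjunctions a real implementation would perform is omitted:

    def case_add(case_a, case_b):
        # Cross-product of partitions: conjoin formulas, add values.
        return [(f"({fa}) and ({fb})", va + vb)
                for fa, va in case_a
                for fb, vb in case_b]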

  6. Max Operator: max_a { a_1: [ Φ1 → 10 ; Φ2 → 5 ], a_2: [ Φ3 → 3 ; Φ4 → 0 ] } = [ Φ1 → 10 ; ¬Φ1 ∧ Φ2 → 5 ; ¬Φ1 ∧ ¬Φ2 ∧ Φ3 → 3 ; ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4 → 0 ]: order the partitions by decreasing value and restrict each one by the negation of all higher-valued partitions. [Scott Sanner - ICAPS08 Tutorial]
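
A sketch of the max operator under the same illustrative encoding, assuming ties may be broken arbitrarily:

    def case_max(cases):
        # Union all partitions, sort by value (highest first), and guard
        # each partition with the negation of every higher-valued one,
        # mirroring the max_a construction on this slide.
        ordered = sorted((p for case in cases for p in case),
                         key=lambda fv: fv[1], reverse=True)
        result, higher = [], []
        for formula, value in ordered:
            guard = " and ".join(f"not ({f})" for f in higher)
            result.append((f"({guard}) and ({formula})" if guard else formula, value))
            higher.append(formula)
        return result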

  7. Find s? Isn't it obvious? Dynamic programming computes V(s) from V(s'), where s goes to s' under action a. In tabular MDPs the state s is given explicitly; in the symbolic representation it is only implicit, so we have to construct it.

  8. Find s = Goal Regression. regress(Φ1, a) is the weakest relation that ensures Φ1 holds after taking action a. Example: for Φ1 = clear(b1) and a = put(A,B), regress(Φ1, a) = (clear(b1) ∧ B ≠ b1) ∨ on(A,b1).
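
The slides regress first-order formulas through situation-calculus action descriptions; as a much simpler stand-in, here is ground STRIPS-style regression of a set of goal fluents, with a hypothetical (precond, add, delete) action encoding:

    def regress(goal_fluents, action):
        # A goal fluent holds after the action iff the action adds it,
        # or it already held and is not deleted.
        # `action` is a (precond, add, delete) triple of frozensets.
        precond, add, delete = action
        if goal_fluents & delete:
            return None  # the action destroys part of the goal
        return (goal_fluents - add) | precond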

  9. Symbolic Dynamic Programming (Deterministic). Tabular: V(s) = max_a [ r + γ V(s') ]. Symbolic: vCase = max_a [ rCase ⊕ γ × regr(vCase, a) ].
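
Putting the hypothetical pieces together, a sketch of one symbolic backup sweep; regress_case is an assumed routine that regresses every partition of vCase through an action, and simplification of the resulting partitions is omitted:

    def symbolic_backup(v_case, r_case, actions, regress_case, gamma=0.9):
        # vCase = max_a [ rCase (+) gamma * regr(vCase, a) ],
        # built from the case_add and case_max sketches above.
        q_cases = []
        for a in actions:
            discounted = [(f, gamma * v) for f, v in regress_case(v_case, a)]
            q_cases.append(case_add(r_case, discounted))
        return case_max(q_cases)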

  10. Classical Example: boxes, trucks, cities. Goal: have a box in Paris. rCase = [ ∃b, BoxIn(b,Paris) → 10 ; else → 0 ].

  11. Classical Example (continued). Actions: drive(t,c1,c2), load(b,t), unload(b,t), noop; load and unload have a 10% chance of failure. Fluents: BoxIn(b,c), BoxOn(b,t), TruckIn(t,c). Assumptions: all cities are connected; γ = 0.9.
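
In the same illustrative list-of-pairs encoding used earlier, this domain's reward case is simply:

    # 10 if some box is in Paris, 0 otherwise.
    box_truck_r_case = [
        ("exists b. BoxIn(b, Paris)", 10.0),
        ("not exists b. BoxIn(b, Paris)", 0.0),
    ]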

  12. Example [Sanner 07]: the resulting value function and policy:
     ∃b, BoxIn(b,Paris): V*(s) = 100, π*(s) = noop
     else, ∃b,t, TruckIn(t,Paris) ∧ BoxOn(b,t): V*(s) = 89, π*(s) = unload(b,t)
     else, ∃b,c,t, BoxOn(b,t) ∧ TruckIn(t,c): V*(s) = 80, π*(s) = drive(t,c,Paris)
     else, ∃b,c,t, BoxIn(b,c) ∧ TruckIn(t,c): V*(s) = 72, π*(s) = load(b,t)
     else, ∃b,c1,c2,t, BoxIn(b,c1) ∧ TruckIn(t,c2): V*(s) = 65, π*(s) = drive(t,c2,c1)
     else: V*(s) = 0, π*(s) = noop
     What did we gain by going through all of this?

  13. Conclusion. Logic Programming: planning, situation calculus, GOLOG. MDPs: review, value iteration. MDP + Logic Programming: DT-GOLOG, symbolic dynamic programming.

  14. References:
     Levesque, H., Reiter, R., Lespérance, Y., Lin, F., and Scherl, R., "GOLOG: A Logic Programming Language for Dynamic Domains", Journal of Logic Programming, 31:59-84, 1997.
     Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", MIT Press, Cambridge, 1998.
     Craig Boutilier, Raymond Reiter, Mikhail Soutchanski, and Sebastian Thrun, "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus", AAAI/IAAI 2000: 355-362.
     S. Sanner and K. Kersting, "Symbolic Dynamic Programming", chapter to appear in C. Sammut (ed.), Encyclopedia of Machine Learning, Springer-Verlag, 2007.
