DT-GOLOG Execution

Given start S0, goal S', and program δ:
- Add rewards
- Formulate the problem as an MDP

[Figure: a state graph S0 → S1 → ... → S4 with rewards such as -1, -2, +3, -5 on its transitions]

(∃b) ¬onTable(b), b ∈ {b1, ..., bn}
  ≡ ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn)
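As a concrete illustration of the grounding step above, here is a minimal Python sketch (not from the slides; the names are made up): over a known finite set of blocks, the existential goal expands into a finite disjunction.

```python
# A minimal sketch of grounding an existential goal over a finite domain.
blocks = ["b1", "b2", "b3"]  # hypothetical finite domain of blocks

def ground_existential(template, domain):
    """Expand (∃x) phi(x) into phi(d1) ∨ ... ∨ phi(dn)."""
    return " ∨ ".join(template.format(x=d) for d in domain)

print(ground_existential("¬onTable({x})", blocks))
# ¬onTable(b1) ∨ ¬onTable(b2) ∨ ¬onTable(b3)
```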
First-Order Dynamic Programming [Sanner 07]

The resulting MDP can still be intractable. Idea: exploit the logical structure of the domain to build an abstract value function and avoid the curse of dimensionality.
Symbolic Dynamic Programming (Deterministic)

Tabular:  V(s) = max_a [ r + γ V(s') ]
Symbolic: ?

To do the backup symbolically we need:
- a representation of rewards and values,
- a way to add rewards and values,
- a max operator,
- a way to find s.
Reward and Value Representation

Case representation (for block b1):

rCase =  ∃b, b ≠ b1, on(b, b1) : 10
         ∄b, b ≠ b1, on(b, b1) : 0
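A minimal sketch of this case representation in Python, assuming formulas are kept as opaque strings (a real system would use a first-order reasoner over them):

```python
# A case statement: a list of (formula, value) pairs whose formulas
# partition the state space.  Formulas are opaque strings here.
Case = list  # alias for readability

rCase: Case = [
    ("∃b, b ≠ b1, on(b, b1)", 10.0),  # some other block sits on b1
    ("∄b, b ≠ b1, on(b, b1)", 0.0),   # nothing sits on b1
]
```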
Add Symbolically [Scott Sanner - ICAPS08 Tutorial]

   A : 10        B : 1         A ∧  B : 11
            ⊕            =     A ∧ ¬B : 12
  ¬A : 20       ¬B : 2        ¬A ∧  B : 21
                              ¬A ∧ ¬B : 22

⊖ and ⊗ are defined similarly.
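The following Python sketch (using the string-based cases from the previous sketch) implements ⊕ as the cross product of partitions:

```python
# Sketch of the ⊕ operator: the cross product of two case statements,
# conjoining partitions and adding their values.  ⊖ and ⊗ differ only
# in using - and * in place of +.  Inconsistent partitions (e.g.
# A ∧ ¬A) would be pruned by a theorem prover in a real system.
def case_add(c1, c2):
    return [(f"{p1} ∧ {p2}", v1 + v2)
            for p1, v1 in c1
            for p2, v2 in c2]

c1 = [("A", 10), ("¬A", 20)]
c2 = [("B", 1), ("¬B", 2)]
print(case_add(c1, c2))
# [('A ∧ B', 11), ('A ∧ ¬B', 12), ('¬A ∧ B', 21), ('¬A ∧ ¬B', 22)]
```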
max Operator [Scott Sanner - ICAPS08 Tutorial]

         Φ1 : 10  (a_1)        Φ1                   : 10
max_a    Φ2 : 5           =    ¬Φ1 ∧ Φ2             : 5
         Φ3 : 3   (a_2)        ¬Φ1 ∧ ¬Φ2 ∧ Φ3       : 3
         Φ4 : 0                ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4 : 0
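A sketch of this casemax construction under the same string-based representation; each partition is guarded by the negations of all higher-valued partitions, so exactly one row applies in any state:

```python
# Sketch of the casemax operator: sort all partitions by value and
# guard each with the negations of every higher-valued partition.
def case_max(cases):
    pairs = sorted((pv for case in cases for pv in case),
                   key=lambda pv: pv[1], reverse=True)
    result, negations = [], []
    for partition, value in pairs:
        guard = " ∧ ".join(negations + [partition])
        result.append((guard, value))
        negations.append(f"¬{partition}")
    return result

a1 = [("Φ1", 10), ("Φ2", 5)]   # case for action a_1
a2 = [("Φ3", 3), ("Φ4", 0)]    # case for action a_2
for partition, value in case_max([a1, a2]):
    print(partition, ":", value)
# Φ1 : 10
# ¬Φ1 ∧ Φ2 : 5
# ¬Φ1 ∧ ¬Φ2 ∧ Φ3 : 3
# ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4 : 0
```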
Find s? Isn't it obvious?

[Figure: s --a--> s']

Dynamic programming computes V(s) given V(s'). In an explicit MDP we have s enumerated; in the symbolic representation we only have it implicitly, so we have to build it.
Find s = Goal Regression

Φ1 = clear(b1)
regress(Φ1, put(A, B)) = (clear(b1) ∧ B ≠ b1) ∨ on(A, b1)

regress(Φ1, a) is the weakest condition that ensures Φ1 holds after taking a.
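A hand-coded Python sketch of this single regression example; a real system would derive the formula from the successor state axiom for clear, so the rule below is an illustrative assumption, not a general regression operator:

```python
# Regression for one fluent/action pair, written out by hand:
# clear(b1) holds after put(A, B) iff b1 was already clear and we did
# not put a block onto it, or A was the block sitting on b1.
def regress_clear(block, action):
    """Weakest condition ensuring clear(block) after put(A, B)."""
    act, A, B = action              # e.g. ("put", "A", "B")
    assert act == "put"
    return f"(clear({block}) ∧ {B} ≠ {block}) ∨ on({A}, {block})"

print(regress_clear("b1", ("put", "A", "B")))
# (clear(b1) ∧ B ≠ b1) ∨ on(A, b1)
```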
Symbolic Dynamic Programming (Deterministic)

Tabular:  V(s) = max_a [ r + γ V(s') ]
Symbolic: vCase = max_a [ rCase ⊕ γ × regr(vCase, a) ]
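Putting the pieces together, here is a sketch of one symbolic Bellman backup. It reuses case_add and case_max from the earlier sketches and assumes a user-supplied regress(partition, action) that returns the weakest precondition:

```python
# One symbolic Bellman backup over string-based cases.
GAMMA = 0.9

def symbolic_backup(r_case, v_case, actions, regress):
    per_action = []
    for a in actions:
        # γ × regr(vCase, a): regress each partition, discount its value
        regressed = [(regress(p, a), GAMMA * v) for p, v in v_case]
        # rCase ⊕ γ × regr(vCase, a)
        per_action.append(case_add(r_case, regressed))
    # symbolic max over actions
    return case_max(per_action)
```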
Classical Example

Objects: boxes, trucks, cities.
Goal: have a box in Paris.

rCase =  ∃b, BoxIn(b, Paris) : 10
         else                : 0
Actions: drive(t, c1, c2), load(b, t), unload(b, t), noop
  (load and unload have a 10% chance of failure)
Fluents: BoxIn(b, c), BoxOn(b, t), TruckIn(t, c)
Assumptions: all cities are connected; γ = 0.9
Example [Sanner 07]

s                                                V*(s)  π*(s)
∃b, BoxIn(b, Paris)                              100    noop
else, ∃b,t, TruckIn(t, Paris) ∧ BoxOn(b, t)      89     unload(b, t)
else, ∃b,c,t, BoxOn(b, t) ∧ TruckIn(t, c)        80     drive(t, c, Paris)
else, ∃b,c,t, BoxIn(b, c) ∧ TruckIn(t, c)        72     load(b, t)
else, ∃b,c1,c2,t, BoxIn(b, c1) ∧ TruckIn(t, c2)  65     drive(t, c2, c1)
else                                             0      noop

What did we gain by going through all of this?
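As a sanity check on the value column, here is a short numeric computation (not from the slides) of the fixed points behind the table, assuming a failed load/unload leaves the state unchanged. The results land close to the slide's rounded figures; the small gaps likely come from rounding or a slightly different failure model in the original.

```python
# Numeric check: γ = 0.9, reward 10 per step while a box is in Paris,
# load/unload succeed with probability 0.9 (state unchanged on failure).
gamma, p = 0.9, 0.9

v_goal   = 10 / (1 - gamma)                             # 100.0
# unload: v = γ(p·v_goal + (1-p)·v), solved for v
v_unload = gamma * p * v_goal / (1 - gamma * (1 - p))   # ≈ 89.0
v_drive1 = gamma * v_unload                             # ≈ 80.1
v_load   = gamma * p * v_drive1 / (1 - gamma * (1 - p)) # ≈ 71.3
v_drive2 = gamma * v_load                               # ≈ 64.2

print([round(v, 1) for v in (v_goal, v_unload, v_drive1, v_load, v_drive2)])
# [100.0, 89.0, 80.1, 71.3, 64.2] vs. the slide's 100, 89, 80, 72, 65
```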
Conclusion

Logic programming + planning led to the situation calculus and GOLOG.
MDPs: review and value iteration.
Combining MDPs with logic and programming gives DT-GOLOG and symbolic dynamic programming.
References

H. Levesque, R. Reiter, Y. Lespérance, F. Lin, and R. Scherl. "GOLOG: A Logic Programming Language for Dynamic Domains." Journal of Logic Programming, 31:59-84, 1997.
R. S. Sutton and A. G. Barto. "Reinforcement Learning: An Introduction." MIT Press, Cambridge, 1998.
C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus." AAAI/IAAI 2000: 355-362.
S. Sanner and K. Kersting. "Symbolic Dynamic Programming." Chapter to appear in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007.