DT-GOLOG Execution

Given start S0, goal S', and program δ:
- Add rewards
- Formulate the problem as an MDP

[Figure: a state graph S0 → S1 → ... → S4 with rewards such as -1, -2, +3, -5 on its transitions]

(∃b) ¬onTable(b), b ∈ {b1, ..., bn}
  ≡ ¬onTable(b1) ∨ ¬onTable(b2) ∨ ... ∨ ¬onTable(bn)
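As a concrete illustration of the grounding step above, here is a minimal Python sketch (not from the slides; the names are made up): over a known finite set of blocks, the existential goal expands into a finite disjunction.

```python
# A minimal sketch of grounding an existential goal over a finite domain.
blocks = ["b1", "b2", "b3"]  # hypothetical finite domain of blocks

def ground_existential(template, domain):
    """Expand (∃x) phi(x) into phi(d1) ∨ ... ∨ phi(dn)."""
    return " ∨ ".join(template.format(x=d) for d in domain)

print(ground_existential("¬onTable({x})", blocks))
# ¬onTable(b1) ∨ ¬onTable(b2) ∨ ¬onTable(b3)
```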
First-Order Dynamic Programming [Sanner 07]

The resulting MDP can still be intractable. Idea: exploit the logical structure of the domain to build an abstract value function and avoid the curse of dimensionality.
Symbolic Dynamic Programming (Deterministic)

Tabular:  V(s) = max_a [ r + γ V(s') ]
Symbolic: ?

To do the backup symbolically we need:
- a representation of rewards and values,
- a way to add rewards and values,
- a max operator,
- a way to find s.
Reward and Value Representation

Case representation (for block b1):

rCase =  ∃b, b ≠ b1, on(b, b1) : 10
         ∄b, b ≠ b1, on(b, b1) : 0
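A minimal sketch of this case representation in Python, assuming formulas are kept as opaque strings (a real system would use a first-order reasoner over them):

```python
# A case statement: a list of (formula, value) pairs whose formulas
# partition the state space.  Formulas are opaque strings here.
Case = list  # alias for readability

rCase: Case = [
    ("∃b, b ≠ b1, on(b, b1)", 10.0),  # some other block sits on b1
    ("∄b, b ≠ b1, on(b, b1)", 0.0),   # nothing sits on b1
]
```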
Add Symbolically [Scott Sanner - ICAPS08 Tutorial]

   A : 10        B : 1         A ∧  B : 11
            ⊕            =     A ∧ ¬B : 12
  ¬A : 20       ¬B : 2        ¬A ∧  B : 21
                              ¬A ∧ ¬B : 22

⊖ and ⊗ are defined similarly.
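The following Python sketch (using the string-based cases from the previous sketch) implements ⊕ as the cross product of partitions:

```python
# Sketch of the ⊕ operator: the cross product of two case statements,
# conjoining partitions and adding their values.  ⊖ and ⊗ differ only
# in using - and * in place of +.  Inconsistent partitions (e.g.
# A ∧ ¬A) would be pruned by a theorem prover in a real system.
def case_add(c1, c2):
    return [(f"{p1} ∧ {p2}", v1 + v2)
            for p1, v1 in c1
            for p2, v2 in c2]

c1 = [("A", 10), ("¬A", 20)]
c2 = [("B", 1), ("¬B", 2)]
print(case_add(c1, c2))
# [('A ∧ B', 11), ('A ∧ ¬B', 12), ('¬A ∧ B', 21), ('¬A ∧ ¬B', 22)]
```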
max Operator [Scott Sanner - ICAPS08 Tutorial]

         Φ1 : 10  (a_1)        Φ1                   : 10
max_a    Φ2 : 5           =    ¬Φ1 ∧ Φ2             : 5
         Φ3 : 3   (a_2)        ¬Φ1 ∧ ¬Φ2 ∧ Φ3       : 3
         Φ4 : 0                ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4 : 0
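A sketch of this casemax construction under the same string-based representation; each partition is guarded by the negations of all higher-valued partitions, so exactly one row applies in any state:

```python
# Sketch of the casemax operator: sort all partitions by value and
# guard each with the negations of every higher-valued partition.
def case_max(cases):
    pairs = sorted((pv for case in cases for pv in case),
                   key=lambda pv: pv[1], reverse=True)
    result, negations = [], []
    for partition, value in pairs:
        guard = " ∧ ".join(negations + [partition])
        result.append((guard, value))
        negations.append(f"¬{partition}")
    return result

a1 = [("Φ1", 10), ("Φ2", 5)]   # case for action a_1
a2 = [("Φ3", 3), ("Φ4", 0)]    # case for action a_2
for partition, value in case_max([a1, a2]):
    print(partition, ":", value)
# Φ1 : 10
# ¬Φ1 ∧ Φ2 : 5
# ¬Φ1 ∧ ¬Φ2 ∧ Φ3 : 3
# ¬Φ1 ∧ ¬Φ2 ∧ ¬Φ3 ∧ Φ4 : 0
```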
Find s? Isn't it obvious?

[Figure: s --a--> s']

Dynamic programming computes V(s) given V(s'). In an explicit MDP we have s enumerated; in the symbolic representation we only have it implicitly, so we have to build it.
Find s = Goal Regression

Φ1 = clear(b1)
regress(Φ1, put(A, B)) = (clear(b1) ∧ B ≠ b1) ∨ on(A, b1)

regress(Φ1, a) is the weakest condition that ensures Φ1 holds after taking a.
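A hand-coded Python sketch of this single regression example; a real system would derive the formula from the successor state axiom for clear, so the rule below is an illustrative assumption, not a general regression operator:

```python
# Regression for one fluent/action pair, written out by hand:
# clear(b1) holds after put(A, B) iff b1 was already clear and we did
# not put a block onto it, or A was the block sitting on b1.
def regress_clear(block, action):
    """Weakest condition ensuring clear(block) after put(A, B)."""
    act, A, B = action              # e.g. ("put", "A", "B")
    assert act == "put"
    return f"(clear({block}) ∧ {B} ≠ {block}) ∨ on({A}, {block})"

print(regress_clear("b1", ("put", "A", "B")))
# (clear(b1) ∧ B ≠ b1) ∨ on(A, b1)
```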
Symbolic Dynamic Programming (Deterministic)

Tabular:  V(s) = max_a [ r + γ V(s') ]
Symbolic: vCase = max_a [ rCase ⊕ γ × regr(vCase, a) ]
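Putting the pieces together, here is a sketch of one symbolic Bellman backup. It reuses case_add and case_max from the earlier sketches and assumes a user-supplied regress(partition, action) that returns the weakest precondition:

```python
# One symbolic Bellman backup over string-based cases.
GAMMA = 0.9

def symbolic_backup(r_case, v_case, actions, regress):
    per_action = []
    for a in actions:
        # γ × regr(vCase, a): regress each partition, discount its value
        regressed = [(regress(p, a), GAMMA * v) for p, v in v_case]
        # rCase ⊕ γ × regr(vCase, a)
        per_action.append(case_add(r_case, regressed))
    # symbolic max over actions
    return case_max(per_action)
```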
Classical Example

Objects: boxes, trucks, cities.
Goal: have a box in Paris.

rCase =  ∃b, BoxIn(b, Paris) : 10
         else                : 0
Actions: drive(t, c1, c2), load(b, t), unload(b, t), noop
  (load and unload have a 10% chance of failure)
Fluents: BoxIn(b, c), BoxOn(b, t), TruckIn(t, c)
Assumptions: all cities are connected; γ = 0.9
Example [Sanner 07]

s                                                V*(s)  π*(s)
∃b, BoxIn(b, Paris)                              100    noop
else, ∃b,t, TruckIn(t, Paris) ∧ BoxOn(b, t)      89     unload(b, t)
else, ∃b,c,t, BoxOn(b, t) ∧ TruckIn(t, c)        80     drive(t, c, Paris)
else, ∃b,c,t, BoxIn(b, c) ∧ TruckIn(t, c)        72     load(b, t)
else, ∃b,c1,c2,t, BoxIn(b, c1) ∧ TruckIn(t, c2)  65     drive(t, c2, c1)
else                                             0      noop

What did we gain by going through all of this?
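As a sanity check on the value column, here is a short numeric computation (not from the slides) of the fixed points behind the table, assuming a failed load/unload leaves the state unchanged. The results land close to the slide's rounded figures; the small gaps likely come from rounding or a slightly different failure model in the original.

```python
# Numeric check: γ = 0.9, reward 10 per step while a box is in Paris,
# load/unload succeed with probability 0.9 (state unchanged on failure).
gamma, p = 0.9, 0.9

v_goal   = 10 / (1 - gamma)                             # 100.0
# unload: v = γ(p·v_goal + (1-p)·v), solved for v
v_unload = gamma * p * v_goal / (1 - gamma * (1 - p))   # ≈ 89.0
v_drive1 = gamma * v_unload                             # ≈ 80.1
v_load   = gamma * p * v_drive1 / (1 - gamma * (1 - p)) # ≈ 71.3
v_drive2 = gamma * v_load                               # ≈ 64.2

print([round(v, 1) for v in (v_goal, v_unload, v_drive1, v_load, v_drive2)])
# [100.0, 89.0, 80.1, 71.3, 64.2] vs. the slide's 100, 89, 80, 72, 65
```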
Conclusion

Logic programming + planning led to the situation calculus and GOLOG.
MDPs: review and value iteration.
Combining MDPs with logic and programming gives DT-GOLOG and symbolic dynamic programming.
References

H. Levesque, R. Reiter, Y. Lespérance, F. Lin, and R. Scherl. "GOLOG: A Logic Programming Language for Dynamic Domains." Journal of Logic Programming, 31:59-84, 1997.
R. S. Sutton and A. G. Barto. "Reinforcement Learning: An Introduction." MIT Press, Cambridge, 1998.
C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. "Decision-Theoretic, High-Level Agent Programming in the Situation Calculus." AAAI/IAAI 2000: 355-362.
S. Sanner and K. Kersting. "Symbolic Dynamic Programming." Chapter to appear in C. Sammut, editor, Encyclopedia of Machine Learning, Springer-Verlag, 2007.