Integrating decision-theoretic planning and programming for robot control in highly dynamic domains
Christian Fritz
Thesis, Final Presentation
Introduction
Goals:
◮ combine:
  ◦ programming
  ◦ decision-theoretic planning
  ◦ on-line!
◮ extend planning with options
◮ evaluate in three diverse example domains:
  ◦ grid world
  ◦ RoboCup Simulation
  ◦ RoboCup Mid-Size
Programming
ICPGOLOG
◮ based on situation calculus
◮ extends basic GOLOG:
  + on-line: incremental, sensing (active and passive)
  + continuous change
  + concurrency
  + progression
  + probabilistic projection
  - nondeterminism
◮ problems:
  ◦ decision making: explicit, missing utility theory
  ◦ projection comparatively slow
Decision-Theoretic Planning
Markov Decision Processes (MDPs): standard model for decision-theoretic planning problems
◮ Formally: M = ⟨S, A, T, R⟩, with
  ◦ S a set of states
  ◦ A a set of actions
  ◦ T : S × A × S → [0, 1] a transition function
  ◦ R : S → ℝ a reward function
◮ Here: fully observable MDPs
◮ Planning task: find an optimal policy, maximizing expected reward
◮ Note: S and A are usually finite!
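A minimal sketch of the definition above, assuming a hypothetical two-state MDP (the state names, actions, probabilities, and rewards are invented for illustration). It computes an optimal policy for a finite horizon by backward induction, with a state-based reward R : S → ℝ as on the slide.

```python
from typing import Dict, List, Tuple

State = str
Action = str

# T[s][a] lists (next_state, probability) pairs; each list sums to 1.
# All entries are invented for illustration.
T: Dict[State, Dict[Action, List[Tuple[State, float]]]] = {
    "far":  {"move": [("near", 0.8), ("far", 0.2)], "wait": [("far", 1.0)]},
    "near": {"move": [("near", 1.0)],               "wait": [("near", 1.0)]},
}
R: Dict[State, float] = {"far": 0.0, "near": 1.0}


def value_iteration(horizon: int):
    """Backward induction:
    V_0(s) = R(s),  V_k(s) = R(s) + max_a sum_s' T(s, a, s') * V_{k-1}(s').
    Returns the values V_horizon and the best first action per state.
    """
    values = dict(R)
    policy: Dict[State, Action] = {}
    for _ in range(horizon):
        new_values = {}
        for s, actions in T.items():
            q = {a: sum(p * values[s2] for s2, p in outs)
                 for a, outs in actions.items()}
            best = max(q, key=q.get)
            policy[s] = best
            new_values[s] = R[s] + q[best]
        values = new_values
    return values, policy
```

Calling `value_iteration(2)` returns the two-step values together with the action to take first in each state. The tabular loop only works because S and A are finite here, which is exactly the caveat in the last bullet.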
Programming & Planning: DTGolog
◮ New Golog derivative DTGolog [Boutilier et al.]
◮ Combines explicit agent programming with planning
◮ Uses MDPs to model the planning problem:
  ◦ S = situations
  ◦ A = primitive actions
  ◦ T = for each action a ∈ A, a list of outcomes and their respective probabilities
  ◦ R : situations → ℝ
◮ Applies decision-tree search to solve the MDP up to a given horizon
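The decision-tree search in the last bullet can be sketched as a depth-limited forward search. This is a simplification, not DTGolog's BestDo: there is no program restricting the choices, and the corridor domain, the outcome probabilities, and the reward are assumptions made up for the example.

```python
from typing import List, Tuple

GOAL = 3  # hypothetical 1-D corridor: the robot wants to reach position 3


def outcomes(pos: int, action: str) -> List[Tuple[int, float]]:
    """Outcome list of a stochastic action: (next position, probability)."""
    step = 1 if action == "right" else -1
    return [(pos + step, 0.9), (pos, 0.1)]  # assumed: a move succeeds with 0.9


def reward(pos: int) -> float:
    return 10.0 if pos == GOAL else 0.0


def best_do(pos: int, horizon: int) -> Tuple[float, dict]:
    """Return (expected value, policy tree) for acting at most `horizon` steps."""
    if horizon == 0 or pos == GOAL:
        return reward(pos), {}
    best_value, best_policy = float("-inf"), {}
    for action in ("left", "right"):            # agent choice: take the maximum
        value, branches = 0.0, {}
        for nxt, prob in outcomes(pos, action):  # nature's choice: take the expectation
            sub_value, sub_policy = best_do(nxt, horizon - 1)
            value += prob * sub_value
            branches[nxt] = sub_policy
        if value > best_value:
            best_value = value
            best_policy = {"action": action, "branches": branches}
    return reward(pos) + best_value, best_policy
```

The returned policy is a tree with one branch per outcome, so its size grows exponentially with the horizon. That growth is the cost options are meant to reduce.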
Programming & Planning: DTGolog
Disadvantages:
◮ offline
◮ situations = states
  ◦ infinite state space
  ◦ inefficient
ReadyLog
Contributions:
◮ re-added nondeterminism with decision-theoretic semantics → on-line decision-theoretic Golog
◮ added options to speed up MDP solution
◮ preprocessor to minimize on-line interpretation
Part I
Extending DTGolog with Options
Options? What's that?
Options
Idea:
◮ construct complex actions from primitive ones
◮ options: solutions to sub-MDPs
◮ generate models of them:
  ◦ when is execution possible?
  ◦ which outcomes can occur?
  ◦ what probabilities do the outcomes have?
  ◦ expected rewards and costs? (expected value)
◮ these models can then be used in planning
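One way to obtain such a model is to follow the option's fixed policy and track where the probability mass ends up. The sketch below assumes a hypothetical three-cell corridor with two exit states and a unit cost per step. The numbers are invented and the method is a plain forward propagation, not necessarily the one used in the thesis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical corridor "room": cells 0, 1, 2 are inside, "W" and "E" are exit states.
# The option's policy always tries to go east; each step costs 1.
# step[s] lists (next_state, probability) under that policy; numbers are invented.
step: Dict[object, List[Tuple[object, float]]] = {
    0: [(1, 0.9), ("W", 0.1)],
    1: [(2, 0.9), (0, 0.1)],
    2: [("E", 0.9), (1, 0.1)],
}


def option_model(start, max_steps: int = 1000, eps: float = 1e-12):
    """Return (exit-state probabilities, expected cost) when starting in `start`."""
    inside = {start: 1.0}          # probability mass still inside the room
    exits = defaultdict(float)     # probability mass that has left, per exit state
    expected_cost = 0.0
    for _ in range(max_steps):
        mass = sum(inside.values())
        if mass < eps:
            break
        expected_cost += mass      # every trajectory still running pays one unit
        nxt = defaultdict(float)
        for s, p in inside.items():
            for s2, q in step[s]:
                (nxt if s2 in step else exits)[s2] += p * q
        inside = nxt
    return dict(exits), expected_cost
```

`option_model(0)` yields one list of outcome probabilities and one expected cost for the start state 0. A planner can then treat the whole option like a single stochastic action.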
Integrating Options into Golog
How do we integrate options into DTGolog/ReadyLog?
◮ avoid the inconvenience "situations = states"
◮ instead use mappings:
  ◦ situations → states (when 'entering' an option)
  ◦ states → situations (when 'leaving' an option)
◮ options ..
  ◦ .. are solutions to local MDPs ..
  ◦ .. encapsulated into a stochastic procedure.
◮ stochastic procedures ..
  ◦ .. are procedures with an explicit model (preconditions/effects/costs);
  ◦ .. replace stochastic actions;
  ◦ .. can model options.
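A small sketch of the two mappings, assuming that a situation is an action history and that the local MDP only looks at the grid position. The coordinate convention and the `set_pos` summary action are hypothetical, introduced here only to make the idea concrete.

```python
from typing import Tuple

State = Tuple[int, int]           # the only fluent the local MDP looks at: the position
Situation = Tuple[object, ...]    # history of executed actions, oldest first; () stands for S0

# Assumed effect of each move on the position (hypothetical coordinate convention).
MOVES = {"go_right": (1, 0), "go_left": (-1, 0), "go_up": (0, 1), "go_down": (0, -1)}


def to_state(situation: Situation, start: State = (0, 0)) -> State:
    """situations -> states: evaluate the position fluent by replaying the history."""
    pos = start
    for action in situation:
        if isinstance(action, tuple) and action[0] == "set_pos":
            pos = action[1]                    # summary effect written when leaving an option
        elif action in MOVES:
            dx, dy = MOVES[action]
            pos = (pos[0] + dx, pos[1] + dy)
    return pos


def to_situation(exit_state: State, entry: Situation) -> Situation:
    """states -> situations: extend the entry situation by one summary action."""
    return entry + (("set_pos", exit_state),)
```

Different histories such as `("go_right", "go_up")` and `("go_up", "go_right")` map to the same state, so the local MDP sees a finite state space even though the set of situations is infinite.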
Generating Options
How do we generate options?
◮ define:
  ◦ φ: precondition (think: states where the option is applicable)
  ◦ β : exit states → value: pseudo-rewards for the local MDP
  ◦ θ: option skeleton, the one-step program to take in each step ..
    • .. usually something like nondet([left, right, down, up]);
    • .. can contain ifs;
    • .. can build on options/stochastic procedures
  ◦ and two mappings:
    • Φ : situations → states
    • Σ : states → situations
    • option_mapping(o, σ, Γ, ϕ)
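The ingredients above can be put together in a small local MDP. The sketch assumes a 3×3 room with two exit states, a success probability of 0.9 per move, and a step cost of 1. The layout, the numbers, and the use of in-place value iteration are illustrative assumptions, not taken from the thesis.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]

# theta: the skeleton offers these four moves in every step (nondet over them).
MOVES = {"left": (-1, 0), "right": (1, 0), "down": (0, -1), "up": (0, 1)}
# phi: the option is applicable in the nine cells of the room.
INSIDE = {(x, y) for x in range(3) for y in range(3)}
# beta: pseudo-rewards on the exit states (hypothetical values).
BETA: Dict[Cell, float] = {(3, 1): 10.0, (1, 3): 0.0}
STEP_COST = 1.0
P_SUCCESS = 0.9   # assumed: a move succeeds with 0.9, otherwise the robot stays put


def target(cell: Cell, move: str) -> Cell:
    dx, dy = MOVES[move]
    nxt = (cell[0] + dx, cell[1] + dy)
    return nxt if nxt in INSIDE or nxt in BETA else cell   # walls block the move


def solve_option(sweeps: int = 100):
    """In-place value iteration on the local MDP; returns (policy, value)."""
    value = {s: 0.0 for s in INSIDE}
    value.update(BETA)            # exit states are terminal and keep their pseudo-reward
    policy = {}
    for _ in range(sweeps):
        for s in sorted(INSIDE):
            q = {m: -STEP_COST
                    + P_SUCCESS * value[target(s, m)]
                    + (1 - P_SUCCESS) * value[s]
                 for m in MOVES}
            policy[s] = max(q, key=q.get)
            value[s] = q[policy[s]]
    return policy, value
```

The resulting `policy` maps each cell to one move. Written out as nested ifs over the position, it has the shape of a stochastic procedure whose body tests `pos` and picks one action per cell.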
Examples
◮ example policy:
    proc(room1_2,
         [exogf_Update,
          while(is_possible(room1_2),
                [if(pos=[0, 0], go_right,
                 if(pos=[0, 1], go_right,
                 if(pos=[0, 2], go_up,
                 if(pos=[1, 0], go_right,
                 if(pos=[1, 1], go_right,
                 if(pos=[1, 2], go_right,
                 if(pos=[2, 0], go_down,
                 if(pos=[2, 1], go_right,
                 if(pos=[2, 2], go_up, []))))))))),
                 exogf_Update])]).
◮ example model (for state 'position=(0,0)'):
    opt_costs(room1_2, [(pos, [0, 0])], 4.51650594972207).
    opt_probability_list(room1_2, [(pos, [0, 0])],
                         [([(pos, [1, 3])], 0.00012),
                          ([(pos, [3, 1])], 0.99987)]).
Test Setting
[Figure: grid-world map with labels 1 to 7, start S, and goals G3, G4, G11]
Experimental Results
[Figure: three panels: (a) full MDP planning (A), (b) heuristics (B), (c) options (C)]
Experimental Results
[Figure: time in seconds against Manhattan distance from start to goal (3 to 11) for curves A, A', B, B', and C; values shown: 9555.86, 2507.56, 702.33, 302.01, 53.6, 38.63, 11.23, 6.81, 3.66, 1.04, 0.55, 0.1, 0.048, 0.03]
Part II
On-line Decision-Theoretic Golog for Unpredictable Domains
ReadyLog: on-line DT planning
on-line:
◮ incremental
  ◦ solve(plan-skeleton, horizon)
  ◦ execute returned policy
◮ sensing / exogenous events
  ◦ problem:
    • dynamic environment (changes while thinking)
    • imperfect models
    → policy can become invalid
  ⇒ execution monitoring:
    • program and policy coexistence
    • markers
Execution Monitoring
Semantics

Trans(solve(p, h), s, δ′, s′) ≡
    ∃π, v, pr. BestDo(p, s, h, π, v, pr) ∧ δ′ = applyPol(π) ∧ s′ = s

BestDo(if(ϕ, p₁, p₂); p, s, h, π, v, pr) ≐
      ϕ[s] ∧ ∃π₁. BestDo(p₁; p, s, h, π₁, v, pr) ∧ π = M(ϕ, true); π₁
    ∨ ¬ϕ[s] ∧ ∃π₂. BestDo(p₂; p, s, h, π₂, v, pr) ∧ π = M(ϕ, false); π₂

Trans(applyPol(M(ϕ, v); π), s, δ′, s′) ≡ s = s′ ∧
    (   v = true ∧ ϕ[s] ∧ δ′ = applyPol(π)
      ∨ v = false ∧ ¬ϕ[s] ∧ δ′ = applyPol(π)
      ∨ v = true ∧ ¬ϕ[s] ∧ δ′ = nil
      ∨ v = false ∧ ϕ[s] ∧ δ′ = nil )
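The marker rule for applyPol can be read operationally. The following sketch assumes a policy is a flat list of action names and markers, and that conditions are Python predicates over a state dictionary. Both representations are assumptions for illustration, not ReadyLog's actual data structures.

```python
from typing import Callable, Dict, List, Optional, Tuple, Union

State = Dict[str, object]
Marker = Tuple[str, Callable[[State], bool], bool]   # ("M", condition phi, value v seen while planning)
Step = Union[str, Marker]                            # a primitive action name or a marker


def trans_apply_pol(policy: List[Step], state: State) -> Tuple[Optional[str], List[Step]]:
    """One transition of applyPol: returns (action to execute or None, remaining policy).

    A marker does not change the situation (s = s'). If its recorded value still
    matches the current state, execution continues with the rest of the policy;
    otherwise the remaining policy becomes empty (nil) and control returns to the program.
    """
    if not policy:
        return None, []
    head, rest = policy[0], policy[1:]
    if isinstance(head, tuple):                # marker M(phi, v)
        _, condition, planned_value = head
        if condition(state) == planned_value:
            return None, rest                  # assumption still holds
        return None, []                        # assumption violated: policy is dropped
    return head, rest                          # ordinary action
```

For example, with a hypothetical predicate `ball_free` and the policy `[("M", ball_free, True), "kick"]`, the kick is only reached while the ball is still free. If the world has changed in the meantime, the policy is abandoned instead of being executed blindly.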
ReadyLog II
◮ options (..)
◮ preprocessor:
  ◦ translates ReadyLog functions, conditions, definitions .. to Prolog code
  ◦ creates successor state axioms from effect axioms
  ◦ speed-up by a factor of about 16
[Figure: time in seconds (axis ticks 1 to 1024) against length of situation term (200 to 2000), comparing effect axioms (uncompiled) with successor state axioms (compiled)]
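The step from effect axioms to successor state axioms can be illustrated by grouping effects per fluent. This is only a sketch of the idea in Python: the fluents, actions, and table layout are hypothetical, and the actual preprocessor emits Prolog clauses.

```python
from typing import Callable, Dict, List, Tuple

# Effect axioms as (action, fluent, function from old value to new value).
# All names are hypothetical.
EFFECT_AXIOMS: List[Tuple[str, str, Callable[[int], int]]] = [
    ("go_right", "x", lambda v: v + 1),
    ("go_left",  "x", lambda v: v - 1),
    ("go_up",    "y", lambda v: v + 1),
    ("go_down",  "y", lambda v: v - 1),
]


def compile_ssa(axioms) -> Dict[str, Dict[str, Callable[[int], int]]]:
    """Group effect axioms by fluent: one rule table per fluent."""
    table: Dict[str, Dict[str, Callable[[int], int]]] = {}
    for action, fluent, effect in axioms:
        table.setdefault(fluent, {})[action] = effect
    return table


SSA = compile_ssa(EFFECT_AXIOMS)


def fluent_value(fluent: str, situation: List[str], initial: int = 0) -> int:
    """F(do(a, s)) is the effect of a on F if a affects F, and F(s) otherwise."""
    value = initial
    rules = SSA.get(fluent, {})
    for action in situation:            # oldest action first
        effect = rules.get(action)
        if effect is not None:
            value = effect(value)
    return value
```

Evaluating a fluent still walks the whole situation term, so the time grows with its length, but each step is a single table lookup instead of a scan over all effect axioms.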
Experimental Results
Experimental Results: SimLeague
◮ compared with ICPGOLOG (Norman's results)

  planning time in seconds
                 ICPGOLOG   ReadyLog
  goal shot      0.35       0.01
  direct pass    0.25       0.01

◮ speed-up due to preprocessor
Example where these are combined (demo):
    solve(nondet([goalKick(OwnNumber),
                  [pickBest(bestP, [2..11],
                            [directPass(OwnNumber, bestP, pass_NORMAL),
                             goalKick(bestP)])]]),
          Horizon)
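How the demo program chooses between its alternatives can be sketched as follows. All probabilities and the reward are invented for illustration and are not measurements. The horizon and the stochastic outcome lists are left out, so this is only the shape of the decision, not ReadyLog's interpreter.

```python
from typing import Callable, Dict, Iterable, Tuple

# Invented numbers, for illustration only.
P_GOAL: Dict[int, float] = {7: 0.2, 9: 0.6, 10: 0.5}   # chance that goalKick(n) scores
P_PASS: Dict[int, float] = {9: 0.7, 10: 0.9}           # chance that directPass(own, n) arrives
GOAL_REWARD = 100.0


def goal_kick_value(player: int) -> float:
    return P_GOAL.get(player, 0.0) * GOAL_REWARD


def pass_then_kick_value(own: int, mate: int) -> float:
    return P_PASS.get(mate, 0.0) * goal_kick_value(mate)


def pick_best(domain: Iterable[int], value: Callable[[int], float]) -> Tuple[int, float]:
    """pickBest: bind the variable to the domain element with the highest expected value."""
    best = max(domain, key=value)
    return best, value(best)


def solve(own: int):
    """Choose between goalKick(own) and the best directPass followed by goalKick."""
    mate, pass_value = pick_best(range(2, 12), lambda p: pass_then_kick_value(own, p))
    alternatives = {
        ("goalKick", own): goal_kick_value(own),
        ("directPass", own, mate): pass_value,
    }
    return max(alternatives.items(), key=lambda kv: kv[1])
```

With these invented numbers, `solve(7)` prefers passing to player 10 over shooting directly, because the pass followed by a kick has the higher expected value.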
Experimental Results: MidSize