
Practical Linear-value Approximation Techniques for First-order MDPs

Scott Sanner & Craig Boutilier, University of Toronto


  1. Practical Linear-value Approximation Techniques for First-order MDPs. Scott Sanner & Craig Boutilier, University of Toronto. UAI 2006.

  2. Why Solve First-order MDPs?
     - Relational description of a (probabilistic) planning domain in (P)PDDL:
       Box World (map figure: Paris, Moscow, London, Berlin, Rome):
         (:action load-box-on-truck-in-city
           :parameters (?b - box ?t - truck ?c - city)
           :precondition (and (BIn ?b ?c) (TIn ?t ?c))
           :effect (and (On ?b ?t) (not (BIn ?b ?c))))
     - Can solve a ground MDP for each domain instantiation: 3 trucks, 2 planes, 4 boxes
     - Or solve a first-order MDP for all domain instantiations at once!
       - Lift PPDDL MDP specification to first-order (FOMDP)
       - Solution makes value distinctions for all domain instantiations!

  3. Background / Talk Outline
     1) Symbolic DP for first-order MDPs (BRP, 2001)
        - Defines FOMDP / operators / value iteration
        - Requires FO simplification for compactness
     2) First-order approx. linear prog. (SB, 2005)
        - Approximate value with linear comb. of basis funs.
        - ☺ No simplification → project onto weight space
     3) Many practical questions remaining (SB, 2006)
        - Other algorithms: first-order API?
        - Where do basis functions come from?
        - How to efficiently handle universal rewards?
        - Optimizations for scalability?

  4. FOMDP Foundation: SitCalc
     - Deterministic actions: loadS(b,t), unloadS(b,t), …
     - Situations: S_0, do(loadS(b,t), S_0), …
     - Fluents: BIn(b,c,s), TIn(t,c,s), On(b,t,s)
     - Successor-state axioms (SSAs) for each fluent F:
       - Describe how action affects fluent (like det. FO-DBN)
       - Ex: BIn(b,c,do(a,s)) ≡
             (1) BIn(b,c,s) AND a ≠ loadS(b,t), OR
             (2) for some t: a = unloadS(b,t) AND TIn(t,c,s)
     - Regression operator: Regr(ϕ) = ϕ'
       - Takes a formula ϕ describing a post-action state
       - Uses SSAs to build ϕ' describing the pre-action state
       - Crucial for backing up value fun to produce Q-fun!
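The successor-state axiom for BIn can be illustrated on a single ground state. The following is only a minimal sketch: the `State` tuple, the `(name, box, truck)` action encoding, and the function name `bin_after` are assumptions made for illustration and do not come from the slides, which work with first-order formulas rather than ground states.

```python
from typing import FrozenSet, NamedTuple, Tuple

class State(NamedTuple):
    # Hypothetical ground encoding of two fluents from the slide.
    box_in: FrozenSet[Tuple[str, str]]    # (box, city) pairs where BIn holds
    truck_in: FrozenSet[Tuple[str, str]]  # (truck, city) pairs where TIn holds

def bin_after(b: str, c: str, action: Tuple[str, str, str], s: State) -> bool:
    """Truth of BIn(b, c) after doing `action` in state s, following the SSA."""
    name, box, truck = action
    # Disjunct (1): the box was already in the city and is not being loaded.
    stays = (b, c) in s.box_in and not (name == "loadS" and box == b)
    # Disjunct (2): the box is unloaded from a truck that is in the city.
    unloaded_here = name == "unloadS" and box == b and (truck, c) in s.truck_in
    return stays or unloaded_here
```

Evaluating the post-action fluent purely from the pre-action state is the ground counterpart of what Regr does symbolically when it rewrites ϕ into ϕ'.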

  5. FOMDP Case Representation
     - Case: Assign value to first-order state abstraction
       - E.g., can express reward in BoxWorld FOMDP as:
           rCase(s) =
             ∀b,c. Dest(b,c) ⇒ BIn(b,c,s)    : 1
             ¬∀b,c. Dest(b,c) ⇒ BIn(b,c,s)   : 0
     - Operators: Define unary, binary case operations
       - E.g., can take "cross-sum" ⊕ (or ⊗, ⊖) of two cases:
           [ ∃x.A(x) : 10 ;  ¬∃x.A(x) : 20 ]  ⊕  [ ∃y.A(y) ∧ B(y) : 3 ;  ¬∃y.A(y) ∧ B(y) : 4 ]  =
             ∃x.A(x) ∧ ∃y.A(y) ∧ B(y)      : 13
             ∃x.A(x) ∧ ¬∃y.A(y) ∧ B(y)     : 14
             ¬∃x.A(x) ∧ ∃y.A(y) ∧ B(y)     : 23
             ¬∃x.A(x) ∧ ¬∃y.A(y) ∧ B(y)    : 24
       - Must remove inconsistent elements (i.e., red bar)
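The binary case operations can be sketched by treating a case as a list of (formula, value) pairs. This is an illustrative assumption, not the authors' implementation: formulas are kept as plain strings, the name `case_op` is hypothetical, and no consistency check is performed, since pruning inconsistent elements would require a first-order reasoner.

```python
import operator
from itertools import product
from typing import Callable, List, Tuple

Case = List[Tuple[str, float]]  # (formula as a string, value) pairs

def case_op(case1: Case, case2: Case,
            op: Callable[[float, float], float] = operator.add) -> Case:
    """Cross-combine two cases: conjoin every pair of partitions and
    apply `op` to their values (add gives the cross-sum)."""
    return [(f"({f1}) ∧ ({f2})", op(v1, v2))
            for (f1, v1), (f2, v2) in product(case1, case2)]

left = [("∃x.A(x)", 10), ("¬∃x.A(x)", 20)]
right = [("∃y.A(y) ∧ B(y)", 3), ("¬∃y.A(y) ∧ B(y)", 4)]
cross_sum = case_op(left, right)  # four partitions, values 13, 14, 23, 24
```

Passing `operator.mul` or `operator.sub` in place of the default gives the corresponding ⊗ and ⊖ operations under the same encoding.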

  6. FOMDP Actions and FODTR
     - SitCalc is deterministic, how to handle probabilities?
       - User's stochastic actions: load(b,t)
       - Nature's deterministic choice: loadS(b,t), loadF(b,t)
       - Probability distribution over Nature's choice:
           P(loadS(b,t) | load(b,t)) =
             snow(s)    : .1
             ¬snow(s)   : .5
           P(loadF(b,t) | load(b,t)) = 1 ⊖ P(loadS(b,t) | load(b,t))
     - First-order decision-theoretic regression (FODTR):
       - Given value fun vCase(s) and user action, produces first-order description of "Q-fun" (modulo reward)
           "Q-fun" = FODTR[ vCase(s), load(b,t) ] =
             ( Regr[ vCase(after loadS…) ] ⊗ P(loadS… | load…) )
             ⊕ ( Regr[ vCase(after loadF…) ] ⊗ P(loadF… | load…) )
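For one ground state, the FODTR expression reduces to an expectation over Nature's choices. The sketch below shows only that ground analogue; the function names and the `value_after` dictionary are hypothetical, and the real operator manipulates case statements rather than numbers.

```python
from typing import Dict

def load_choice_probs(snow: bool) -> Dict[str, float]:
    """Nature's distribution for load(b,t): success prob .1 if snow, else .5."""
    p_success = 0.1 if snow else 0.5
    return {"loadS": p_success, "loadF": 1.0 - p_success}

def q_modulo_reward(value_after: Dict[str, float],
                    choice_probs: Dict[str, float]) -> float:
    """Sum over Nature's choices of P(choice | action) * V(state after choice)."""
    return sum(p * value_after[choice] for choice, p in choice_probs.items())

# Hypothetical values of the states reached after each of Nature's choices.
value_after = {"loadS": 10.0, "loadF": 0.0}
q_clear = q_modulo_reward(value_after, load_choice_probs(snow=False))
q_snow = q_modulo_reward(value_after, load_choice_probs(snow=True))
```

Under these assumed values the expected next-state value is lower when it snows, because the successful outcome loadS is less likely.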

  7. FOMDP Backup Operators
     In fact, there are 3 types of "Q-funs"/backup operators:
     1) B^{A(x)}[vCase(s)] = rCase(s) ⊕ γ⋅FODTR[vCase(s), A(x)]
        Let B^{load(b,t)}[vCase(s)] =
          ϕ(b,t)    : .9
          ¬ϕ(b,t)   : 0
        Think of as Q(A(x),s); note the free vars!
     2) B^{A}[vCase(s)] = ∃x. B^{A(x)}[vCase(s)]  (action abstraction!)
        B^{load}[vCase(s)] =
          ∃b,t. ϕ(b,t)    : .9
          ∃b,t. ¬ϕ(b,t)   : 0
        Think of as ~Q(A,s); no free vars but now overlap!
     3) B^{A}_{max}[vCase(s)] = max( B^{A}[vCase(s)] )
        B^{load}_{max}[vCase(s)] =
          ∃b,t. ϕ(b,t)                        : .9
          ¬(∃b,t. ϕ(b,t)) ∧ ∃b,t. ¬ϕ(b,t)     : 0
        Think of as Q(A,s); no free vars and no overlap!
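The step from the second to the third operator removes overlap by giving each state the highest value among the partitions it satisfies. One way to sketch this, assuming cases are lists of (formula, value) pairs with formulas as plain strings and a hypothetical function name `case_max`, is to sort partitions by value and guard each with the negation of every higher-valued partition. No logical simplification is attempted here.

```python
from typing import List, Tuple

Case = List[Tuple[str, float]]  # (formula as a string, value) pairs

def case_max(case: Case) -> Case:
    """Make overlapping partitions disjoint, keeping the maximal value."""
    ordered = sorted(case, key=lambda fv: fv[1], reverse=True)
    result: Case = []
    higher: List[str] = []  # formulas of partitions with higher (or equal) value
    for formula, value in ordered:
        guard = " ∧ ".join(f"¬({h})" for h in higher)
        result.append((f"{guard} ∧ ({formula})" if guard else formula, value))
        higher.append(formula)
    return result

b_load = [("∃b,t. ϕ(b,t)", 0.9), ("∃b,t. ¬ϕ(b,t)", 0.0)]
b_load_max = case_max(b_load)
```

Applied to the B^{load} case from the slide, this yields the same two partitions as B^{load}_{max}, up to extra parentheses.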

  8. First-order Approx. Linear Prog. (FOALP)
     - Represent value fn as linear comb. of k basis fns:
         vCase(s) = w_1 ⋅ [ ∃b,c. BIn(b,c,s) : 1 ;  ¬∃b,c. BIn(b,c,s) : 0 ]
                    ⊕ … ⊕
                    w_k ⋅ [ ∃t,c. TIn(t,c,s) : 1 ;  ¬∃t,c. TIn(t,c,s) : 0 ]
     - Reduces MDP solution to finding good weights; generalizes the approx. LP used by (van Roy, GKP, SP):
         Vars:        w_i ; i ≤ k
         Minimize:    Σ_s Σ_{i=1..k} w_i ⋅ bCase_i(s)
         Subject to:  0 ≥ B^{a}_{max}[ ⊕_{i=1..k} w_i ⋅ bCase_i(s) ] ⊖ ⊕_{i=1..k} w_i ⋅ bCase_i(s) ; ∀ a ∈ A, s
     - FOALP issues resolved in (SB, 2005):
       - ∞ sum in objective: We give principled approximation
       - ∞ constraints: Only finite set of distinct constraints, solve exactly & efficiently w/ constraint gen. (SP)
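The LP constraints say that the backed-up value may never exceed the current linear value estimate. As a rough ground analogue only (FOALP itself operates on case statements, not enumerated states), the sketch below checks these constraints for candidate weights on a tiny hypothetical MDP. All names, the two-state chain, and the basis functions are assumptions for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence

def value(weights: Sequence[float],
          basis: Sequence[Callable[[Hashable], float]], s: Hashable) -> float:
    """Linear value estimate: sum_i w_i * basis_i(s)."""
    return sum(w * b(s) for w, b in zip(weights, basis))

def bellman_residuals(weights, basis, states, actions, reward, trans, gamma):
    """For each state, max_a backup of the linear value minus the value itself.
    The weights satisfy the approximate-LP constraints iff every entry is <= 0."""
    residuals: Dict[Hashable, float] = {}
    for s in states:
        backups = [reward(s) + gamma * sum(p * value(weights, basis, s2)
                                           for s2, p in trans(s, a).items())
                   for a in actions]
        residuals[s] = max(backups) - value(weights, basis, s)
    return residuals

# Hypothetical two-state chain: s0 moves to s1, s1 is absorbing with reward 1.
states, actions = ["s0", "s1"], ["go"]
reward = lambda s: 1.0 if s == "s1" else 0.0
trans = lambda s, a: {"s1": 1.0}
basis = [lambda s: 1.0, lambda s: 1.0 if s == "s1" else 0.0]
res = bellman_residuals([9.0, 1.0], basis, states, actions, reward, trans, 0.9)
```

With weights (9, 1) the estimate coincides with the true value function of this toy chain, so both residuals are zero and the constraints hold with equality. Weights (0, 0) would violate the constraint at s1.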
