2 Planner + Learner: Existing Gap. Overly Restrictive [Heger 1994]; Lack of Analytical Convergence [Geibel et al. 2005]; No Robustness Guarantees [Abbeel et al. 2005]
2 Planner + Learner: Approach. [Figure: block diagram; labels not recoverable from the extraction.] [Redding, Geramifard, How, ACC 2010]
2 Planner + Learner: Approach. [Figure: block diagram; labels not recoverable from the extraction.] Stochastic Risk Model, Learners with Implicit Policy Formulation [Geramifard, et al. ACC 2011]
2 Planner + Learner: Grid World Example. 30% Uniform Noise for Movement (Not known to the agent). Rewards {+1, -1, -.001}
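One way to read the movement noise above is sketched here. This is an assumption, not something the slide spells out: with probability 0.3 the intended move is replaced by an action drawn uniformly at random. The action names and the function `executed_action` are hypothetical.

```python
import random

# Hypothetical action set; the slide does not list the actions.
ACTIONS = ("up", "down", "left", "right")

def executed_action(intended, noise=0.3, rng=random):
    """Return the action actually executed under uniform movement noise.

    Assumed noise model: with probability `noise`, the intended action is
    replaced by one drawn uniformly from ACTIONS (possibly the same one).
    """
    if rng.random() < noise:
        return rng.choice(ACTIONS)
    return intended
```

Because the agent does not know the noise level, a learner would only ever observe the resulting transitions, never the `noise` parameter itself.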
2 Planner + Learner: Grid World. [Figure: two learning-curve panels with curves labeled Optimal, Planner, Sarsa, CSarsa, NAC, and CNAC; horizontal axis from 0 to 10000; axis labels not recoverable.] [Geramifard, et al. ACC 2011]
2 Planner + Learner: UAV Mission. [Figure: mission graph with numbered nodes, rewards of +100, +200, and +300, bracketed intervals such as [2,3] and [3,4], and probabilities .5, .6, and .7.] 5% Movement Failure (Not known to the agent)
2 Planner + Learner: UAV Mission Results. [Figure: bar charts of P(Crash) and Optimality for Learner, Planner + Learner, and Planner.] [Geramifard, et al. ACC 2011]
Outline: 1 Learner, 2 Planner + Learner
Contributions
1 Learner: Introduced incremental Feature Dependency Discovery (iFDD). Scaled existing online RL methods to large domains using iFDD.
2 Planner + Learner: Combined online learning methods with cooperative planners.
Backup Slides
iFDD
Algorithm 1: Discover
Input: φ(s), δ_t, ξ, F, ψ
Output: F, ψ
foreach (g, h) ∈ { (i, j) | φ_i(s) φ_j(s) = 1 } do
    f ← g ∧ h
    if f ∉ F then
        ψ_f ← ψ_f + |δ_t|
        if ψ_f > ξ then
            F ← F ∪ f
        end
    end
end
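The discovery step can be sketched in Python as follows. This is a minimal illustration under assumptions not stated on the slide: each feature is represented as a frozenset of initial-feature indices, the conjunction g ∧ h is the set union, and F and ψ are a plain set and dict. The name `discover` and its argument names are hypothetical.

```python
from itertools import combinations

def discover(active, td_error, xi, features, relevance):
    """One discovery step over the features active in the current state.

    active    -- features with phi_f(s) = 1, each a frozenset of indices
    td_error  -- temporal-difference error delta_t
    xi        -- discovery threshold
    features  -- set F of known features (mutated in place)
    relevance -- dict psi of accumulated |delta_t| per candidate (mutated)
    """
    for g, h in combinations(list(active), 2):
        f = g | h  # conjunction of g and h
        if f not in features:
            relevance[f] = relevance.get(f, 0.0) + abs(td_error)
            if relevance[f] > xi:
                features.add(f)  # candidate promoted to a real feature
    return features, relevance
```

A candidate pair is promoted only after its accumulated absolute TD error exceeds ξ, so conjunctions are added where the current representation keeps producing error.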
Algorithm 2: Activate Features
Input: φ⁰(s), F
Output: φ(s)
φ(s) ← 0̄
activeInitialFeatures ← { i | φ⁰_i(s) = 1 }
Candidates ← ℘(activeInitialFeatures) (*sorted by set size)
while activeInitialFeatures ≠ ∅ do
    f ← Candidates.next()
    if f ∈ F then
        activeInitialFeatures ← activeInitialFeatures − f
        φ_f(s) ← 1
    end
end
return φ(s)
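A literal Python sketch of the activation step is given below, under stated assumptions: features are frozensets of initial-feature indices, F contains the initial features as singleton sets, and candidates are visited from largest to smallest. The sketch also skips candidates that are no longer fully covered by the remaining initial features, a check the pseudocode leaves implicit. The name `activate` is hypothetical.

```python
from itertools import combinations

def activate(initial_active, features):
    """Return the set of features to switch on for the current state.

    initial_active -- indices i with phi0_i(s) = 1
    features       -- set F of frozensets (assumed to include singletons)
    """
    remaining = set(initial_active)
    ordered = sorted(remaining)
    active = set()
    # Enumerate the power set by decreasing size (largest conjunctions first).
    for size in range(len(ordered), 0, -1):
        for combo in combinations(ordered, size):
            if not remaining:
                return active
            f = frozenset(combo)
            # Assumption: only use candidates still covered by `remaining`.
            if f <= remaining and f in features:
                remaining -= f
                active.add(f)
    return active
```

Enumerating the full power set is exponential in the number of active initial features, so this form is only reasonable when few initial features are active at once.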
[Figure: balancing steps (0 to 3000) vs. steps (×10^4, 0 to 10) for Initial+iFDD, ATC, Gaussian, Tabular, and Initial.]
[Figure: learning curves for Initial+iFDD, Tabular, Initial, and ATC; axis labels not recoverable.]
SDM
Sparse Distributed Memories (SDM) [Ratitch et al. 2004]
iCCA
[Figure: iCCA block diagram; labels not recoverable from the extraction.] Stochastic Domain, Known Deterministic Risk Model [ACC 2010, GNC 2010]