
Online Learning within Cooperative Planning (Alborz Geramifard)

Online Learning within Cooperative Planning. Alborz Geramifard, September 2010, agf@mit.edu. Joint work: Finale Doshi, Josh Redding, Nicholas Roy, Jonathan How. Supported by: AFOSR. Problem: Waypoint, Obstacle, Base. Why is this a hard


  1. (Part 2: Planner + Learner) Existing Gap: Overly Restrictive [Heger 1994]; Lack of Analytical Convergence [Geibel et al. 2005]; No Robustness Guarantees [Abbeel et al. 2005]

  2. Approach [Figure: block diagram with Cooperative Planner, Learning Algorithm, Performance Analysis, Agent/Vehicle, and World blocks; signals labeled disturbances, observations, and noise] [Redding, Geramifard, How, ACC 2010]

  3. Approach [Figure: block diagram with Learner, Consensus-Based Bundle Algorithm (CBBA), Risk Analysis, Agent/Vehicle, and World blocks]. Stochastic Risk Model, Learners with Implicit Policy Formulation [Geramifard et al., ACC 2011]

  4. Grid World Example: 30% Uniform Noise for Movement (not known to the agent). Rewards: {+1, -1, -0.001}
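A minimal sketch of the movement noise on this slide, assuming that with probability 0.30 the intended move is replaced by one drawn uniformly from the four moves, and that moves into the boundary leave the agent in place. The action encoding, the boundary handling, and the names `ACTIONS` and `noisy_step` are hypothetical; the slide states only the noise level and the reward values.

```python
import random

# Hypothetical action encoding: (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
NOISE = 0.30  # noise level from the slide; the agent does not know it


def noisy_step(state, action, rows, cols, rng=random):
    """Return the next (row, col) after one noisy move on a rows x cols grid."""
    if rng.random() < NOISE:
        # Assumption: the noisy move is uniform over all four actions.
        action = rng.choice(sorted(ACTIONS))
    dr, dc = ACTIONS[action]
    # Assumption: the grid boundary clamps the move.
    r = min(max(state[0] + dr, 0), rows - 1)
    c = min(max(state[1] + dc, 0), cols - 1)
    return (r, c)
```

Under these assumptions the intended move succeeds with probability 0.7 + 0.3/4 = 0.775 when no boundary interferes.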

  5. Grid World [Figure: two panels of Return versus Steps (0 to 10000); one compares Optimal, Planner, Sarsa, and CSarsa, the other compares Optimal, Planner, NAC, and CNAC] [Geramifard et al., ACC 2011]

  6. UAV Mission [Figure: mission graph; node labels include rewards (+100, +200, +300), intervals ([2,3], [3,4]), and probabilities (.5, .6, .7)]. 5% Movement Failure (not known to the agent)

  7. UAV Mission Results [Figure: two bar charts, P(Crash) and Optimality, each comparing Learner, Planner + Learner, and Planner] [Geramifard et al., ACC 2011]

  8. Outline: 1. Learner; 2. Planner + Learner

  9. Contributions. 1 (Learner): Introduced incremental Feature Dependency Discovery (iFDD); scaled existing online RL methods to large domains using iFDD. 2 (Planner + Learner): Combined online learning methods with cooperative planners

  10. Backup Slides

  11. iFDD

  12. Algorithm 1: Discover
      Input: φ(s), δ_t, ξ, F, ψ
      Output: F, ψ
      foreach (g, h) ∈ {(i, j) | φ_i(s) φ_j(s) = 1} do
          f ← g ∧ h
          if f ∉ F then
              ψ_f ← ψ_f + |δ_t|
              if ψ_f > ξ then
                  F ← F ∪ f
              end
          end
      end
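The Discover step can be sketched in Python as follows. This is an illustrative reading of the slide's pseudocode, not the author's implementation: it assumes each feature is stored as a `frozenset` of initial-feature indices, so that the conjunction g ∧ h becomes a set union, and the names `discover`, `active`, and `psi` are hypothetical.

```python
from itertools import combinations


def discover(active, delta, xi, features, psi):
    """One discovery step over the currently active features.

    active   -- iterable of active features (frozensets of initial indices)
    delta    -- TD error delta_t at this step
    xi       -- discovery threshold
    features -- set F of known features (mutated in place)
    psi      -- dict of accumulated relevance per candidate (mutated in place)
    """
    for g, h in combinations(list(active), 2):
        f = g | h  # conjunction of g and h under the frozenset encoding
        if f not in features:
            # Accumulate the absolute TD error for this candidate.
            psi[f] = psi.get(f, 0.0) + abs(delta)
            if psi[f] > xi:
                features.add(f)  # promote the candidate to a real feature
    return features, psi
```

For example, with two active initial features, a threshold `xi = 1.0`, and TD errors of 0.6 and -0.6 on consecutive steps, the pair is added to `features` on the second call, once its accumulated relevance reaches 1.2.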

  13. Algorithm 2: Activate Features
      Input: φ^0(s), F
      Output: φ(s)
      φ(s) ← 0̄
      activeInitialFeatures ← {i | φ^0_i(s) = 1}
      Candidates ← ℘(activeInitialFeatures) (*sorted by set size)
      while activeInitialFeatures ≠ ∅ do
          f ← Candidates.next()
          if f ∈ F then
              activeInitialFeatures ← activeInitialFeatures − f
              φ_f(s) ← 1
          end
      end
      return φ(s)
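A hedged Python sketch of the activation step, using the same hypothetical `frozenset` encoding of features. Two points are assumptions rather than statements from the slide: candidates are visited largest set first, and a candidate is used only if it lies entirely within the initial features not yet covered. The slide says only "sorted by set size" and does not state the containment check.

```python
from itertools import combinations


def activate(initial_active, features):
    """Return the set of active features for one state.

    initial_active -- indices i of initial features with value 1
    features       -- set F of known features (frozensets of initial indices)
    """
    remaining = set(initial_active)  # initial features not yet covered
    ordered = sorted(remaining)
    active = set()
    # Enumerate non-empty subsets, largest first (assumed sort order).
    for size in range(len(ordered), 0, -1):
        for combo in combinations(ordered, size):
            if not remaining:
                return active
            f = frozenset(combo)
            # Assumed extra check: f must fit inside the uncovered features.
            if f in features and f <= remaining:
                remaining -= f
                active.add(f)
    return active
```

Enumerating the power set is exponential in the number of active initial features, so this sketch is practical only when few initial features are active at once.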

  14. [Figure: Balancing Steps (0 to 3000) versus Steps (0 to 10 × 10^4) for Initial+iFDD, ATC, Gaussian, Tabular, and Initial]

  15. [Figure: Return versus Steps (× 10^4) for Initial+iFDD, Tabular, Initial, and ATC]

  16. SDM

  17. Sparse Distributed Memories (SDM) [Ratitch et al. 2004]

  35. iCCA

  36. [Figure: block diagram with Actor-Critic RL, CBBA, Controller, Planner, Risk Analysis, MDP, and World blocks; signals labeled disturbances, observations, and noise]. Stochastic Domain, Known Deterministic Risk Model [ACC 2010, GNC 2010]
