
Distributed Path Planning for Mobile Robots using a Swarm of Interacting Reinforcement Learners

Chris Vigorito, Department of Computer Science, University of Massachusetts - Amherst, vigorito@cs.umass.edu. May 17th, 2007, AAMAS ’07 - Honolulu, HI


  1. Distributed Path Planning for Mobile Robots using a Swarm of Interacting Reinforcement Learners
     - Chris Vigorito, Department of Computer Science, University of Massachusetts - Amherst
     - vigorito@cs.umass.edu
     - May 17th, 2007, AAMAS ’07 - Honolulu, HI
     - Slide footer: Chris Vigorito (UMass Amherst), Physical Path Planning with SWIRLs, AAMAS ’07

  2. Local Robot Navigation - Obstacle Avoidance
     - Local navigation (obstacle avoidance)
     - Goal observable or only heading given
     - Head in desired direction while avoiding obstacles
     - Reasonably good approaches exist for solving this problem

  3. Global Robot Navigation - Path Planning
     - Global navigation (path planning)
     - Goal unobservable/heading unknown (need a model)
     - Want least-cost path to goal
     - Lots of uncertainty/decision points
     - Some egocentric approaches exist, with restrictive assumptions and high complexity

  4. Physical Path Planning
     - Places computational burden on a distributed sensor network rather than on the robot
     - Network of unsophisticated sensor nodes with local communication capabilities
     - Nodes communicate path information locally to produce a globally optimal solution
     - Low-complexity computation at each node
     - Robots query nodes for the least-cost path to a desired goal

  5. Previous Work
     - Perform distance vector routing over the topological map formed by the sensor network
     - Cost metrics used are limited to hop count [Batalin et al. (2004); Li et al. (2003); O’Hara et al. (2006)]
     - Nodes must be able to sense relevant information
     - No information from robot experience is used (no learning)
     - Only tested on uniform terrain with highly structured (e.g., grid-like) network deployments
     - Contribution: incorporate reinforcement learning to improve solution quality and versatility

  6. Distance Vector Routing
     - Route incoming packet/robot to the next-hop router so as to minimize a cost function given a destination
     - Each node stores a distance vector estimate (estimated cost from itself to all destinations):
       $D(x, z) = \min_{y \in N(x)} \left[ d(x, y) + D(y, z) \right]$
     - Distributed form of the Bellman-Ford algorithm (dynamic programming)
     - Widely used in networking applications
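The recurrence on this slide can be illustrated with a minimal single-process sketch. It is not the author's implementation: the function name `distance_vector`, the `links` dictionary, and the four-node example graph with its link costs are all assumptions made for illustration, and a real deployment would have each node exchange its vector with neighbors by message passing rather than share one table.

```python
import math


def distance_vector(links, max_rounds=100):
    """Iterate D(x, z) = min over neighbors y of d(x, y) + D(y, z).

    links maps a directed pair (x, y) to the link cost d(x, y).
    Returns a nested dict D with D[x][z] = estimated cost from x to z.
    """
    nodes = sorted({n for edge in links for n in edge})
    neighbors = {x: [y for (a, y) in links if a == x] for x in nodes}
    # Each node starts knowing only the zero cost to itself.
    D = {x: {z: (0.0 if x == z else math.inf) for z in nodes} for x in nodes}

    for _ in range(max_rounds):
        changed = False
        for x in nodes:
            for z in nodes:
                if x == z:
                    continue
                best = min(
                    (links[(x, y)] + D[y][z] for y in neighbors[x]),
                    default=math.inf,
                )
                if best < D[x][z]:
                    D[x][z] = best
                    changed = True
        if not changed:  # no estimate improved, so the vectors have converged
            break
    return D


# Hypothetical undirected network; the costs are illustrative only.
undirected = [("A", "B", 1), ("A", "C", 4), ("B", "C", 1), ("B", "D", 5), ("C", "D", 1)]
links = {}
for x, y, cost in undirected:
    links[(x, y)] = cost
    links[(y, x)] = cost

D = distance_vector(links)
print(D["A"]["D"])  # 3.0, via A -> B -> C -> D
```

The loop stops as soon as a full sweep changes no estimate, which mirrors the way distance vector routing settles once neighbors stop advertising improved costs.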

  7. Reinforcement Learning
     - Well-developed framework for learning from experience how to interact with an environment
     - Goal is to maximize reward (here, to minimize a cost function)
     - Common formalism: Markov Decision Process (MDP) $= \langle S, A, T, R \rangle$
     - Agents learn a policy $\pi$ that maps states to actions so as to minimize the cost function
     - Can be solved by learning an action-value function $Q : S \times A \to \mathbb{R}$
     - Network routing problem formulated as an MDP in Boyan and Littman (1993)
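For reference, the standard cost-minimizing form of the action-value formulation can be written out as follows. The slide does not state these equations; the per-step cost $c$, discount factor $\gamma$, and step size $\alpha$ are notation assumed here, following the usual Q-learning conventions with $\max$ replaced by $\min$.

```latex
% Bellman optimality equation for expected cost (assumed notation)
Q^{*}(s, a) = \mathbb{E}\left[\, c(s, a, s') + \gamma \min_{a'} Q^{*}(s', a') \,\right]

% Greedy policy with respect to Q^{*}
\pi^{*}(s) = \arg\min_{a} Q^{*}(s, a)

% Sample-based update after observing cost c and next state s'
Q(s, a) \leftarrow Q(s, a) + \alpha \left[\, c + \gamma \min_{a'} Q(s', a') - Q(s, a) \,\right]
```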

  8. Model and Assumptions
     - [Figure: network with nodes A, B, C, D]
     - All nodes/robots have some means of local communication
     - All robots are equipped with local navigation abilities
     - All robots can obtain distance and heading to a nearby node

  9. Swarm of Interacting Reinforcement Learners (SWIRL)
     - [Figure: network with nodes A, B, C, D, annotated with $Q_A(D, B) = 10$ and $Q_A(D, C) = 15$]
     - States represented as node/destination pairs
     - Actions are next-hop choices
     - Transition function defined by network topology
     - "Reward" = time, energy, danger, etc.
     - Value function distributed across the network
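A per-node value table of this kind might be sketched as below. The slides do not give the update rule, so the rule here is an assumption in the style of Boyan and Littman's Q-routing; the `Node` class, its method names, the step size `alpha`, and the topology (A adjacent to B and C, each of which is adjacent to D) are likewise hypothetical.

```python
from collections import defaultdict


class Node:
    """One sensor node holding its share of the distributed value function."""

    def __init__(self, name, neighbors):
        self.name = name
        self.neighbors = list(neighbors)
        # q[(dest, next_hop)] = estimated cost to reach dest via next_hop.
        # Entries start at 0.0, an optimistic guess (an assumption here).
        self.q = defaultdict(float)

    def best_cost(self, dest):
        """Lowest estimated cost from this node to dest."""
        if dest == self.name:
            return 0.0
        return min(self.q[(dest, y)] for y in self.neighbors)

    def next_hop(self, dest):
        """Greedy action: the neighbor with the lowest estimated cost."""
        return min(self.neighbors, key=lambda y: self.q[(dest, y)])

    def update(self, dest, hop, observed_cost, downstream_estimate, alpha=1.0):
        """Move Q(dest, hop) toward observed cost plus the neighbor's estimate."""
        old = self.q[(dest, hop)]
        target = observed_cost + downstream_estimate
        self.q[(dest, hop)] = old + alpha * (target - old)


# Assumed topology: A - B - D and A - C - D.
a = Node("A", ["B", "C"])
c = Node("C", ["A", "D"])

# A robot heading for D travels A -> C and reports that the link took 10 s.
a.update("D", "C", observed_cost=10.0, downstream_estimate=c.best_cost("D"))

print(a.q[("D", "C")])  # 10.0
print(a.next_hop("D"))  # B (its estimate is still the optimistic 0.0)
```

With `alpha=1.0` and zero-initialized estimates, a single 10-second traversal of the A to C link yields $Q_A(D, C) = 10$ while $Q_A(D, B)$ stays at 0. Zero initialization makes untried links look attractive, so the greedy choice doubles as exploration in this sketch.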

  10-11. Algorithm
     - [Animation frames: network with nodes A, B, C, D]

  12-16. Algorithm
     - [Animation frames: nodes A, B, C, D, with $Q_A(D, B) = 0$ and $Q_A(D, C) = 0$]

  17-18. Algorithm
     - [Animation frames: nodes A, B, C, D, with $Q_A(D, B) = 0$ and $Q_A(D, C) = 10$]

  19. Simulation Environment

  20. Grid Network Deployment

  21. Random Network Deployment

  22. Grid Deployment - Single Start/Goal Pair
     - [Plot: Time to Goal (s) vs. Trajectories; series: Hop Count, SWIRL, Optimal]

  23. Grid Deployment - Random Start/Goal Pairs
     - [Plot: Average Time per Hop (s) vs. Trajectories; series: Hop Count, SWIRL]

  24. Random Deployment - Single Start/Goal Pair
     - [Plot: Time to Goal (s) vs. Trajectories; series: Hop Count, SWIRL]

  25. Grid Deployment - Single Start/Goal Pair
     - [Plot: Time to Goal (s) vs. Seconds; series: 1 Robot, 2 Robots, 5 Robots, 10 Robots, 15 Robots]

  26. Summary
     - Extension of existing methods for physical path planning
     - Incorporated reinforcement learning to improve solution quality in the face of unobservability/uncertainty
     - Performs well in a wider class of environments
     - Allows for less structured types of network deployments

  27. Limitations and Future Work
     - Approach doesn’t currently address situations in which links are not traversable
     - Add ability for robots to sense impasses and send infinite edge weights to nodes
     - Mobility of sensor nodes: reconfiguration for better coverage
     - Have robots use "shortcuts" by interpolating between nodes
