Cognitive Mapping and Planning for Visual Navigation


  1. Cognitive Mapping and Planning for Visual Navigation. Saurabh Gupta 1,2, James Davidson 2, Sergey Levine 1,2, Rahul Sukthankar 2, Jitendra Malik 1,2 (1 UC Berkeley, 2 Google). Presented by Kent Sommer, Korea Advanced Institute of Science and Technology

  2. Table of contents 1. Problem Statement 2. Related Work 3. Contribution 4. Results 5. Video Demo 6. Summary

  3. Problem Statement

  4. Problem Statement: robot navigation in novel environments. A robot equipped with a first-person camera is dropped into a novel environment and must navigate in that environment.

  5. Motivation: Intelligent Navigation What does it mean to navigate intelligently? • Navigate through novel environments • Draw on prior experience or similar conditions • Reason about free-space, obstacle-space, topology

  6. Motivation: Why Are Humans So Good? Humans can often reason about their environment, while classical agents can at best do uninformed exploration • Know where we are likely to find a chair • Know that hallways often lead to other hallways • Know common building patterns

  7. Related Work

  8. Classical Work (figures: LSD-SLAM, RRT) • Over-complete: precise reconstruction of everything is not necessary • Incomplete: nothing is known until it is explicitly observed, failing to exploit the structure of the world • Only geometry, no semantics • Unnecessarily fragile due to the separation between mapping and planning

  9. Contemporary Work • Human-level control through deep reinforcement learning, Mnih et al., Nature 2015 • Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning, Zhu et al., ICRA 2017 • Control of Memory, Active Perception, and Action in Minecraft, Oh et al., ICML 2016 • End-to-End Training of Deep Visuomotor Policies, Levine et al., JMLR 2016 [Figure: memory-based architectures DQN, DRQN, MQN, MRQN, FRMQN]

  10. Contemporary Work: feed-forward architecture without memory • Agent can't systematically explore a new environment or backtrack • Agent needs experience with a new environment before it can start navigating successfully [Figure: siamese ResNet-50 embeddings of observation and target, fused through generic siamese layers into scene-specific policy and value layers]

  11. Contribution

  12. Contribution Neural network policy for visual navigation • Joint architecture for mapping and planning • Spatial memory with the ability to plan given partial observations • Is end-to-end trainable

  13. Cognitive Mapping and Planning: System Overview [Figure: at each time step, the current first-person view and the egomotion feed a Differentiable Mapper, which updates a multiscale belief about the world in an egocentric coordinate frame; a Differentiable Hierarchical Planner uses this belief and the goal to produce the next action]
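To make the data flow concrete, a minimal sketch of one control step is given below; `mapper` and `planner` with these methods are hypothetical names standing in for the two modules in the figure, not the authors' actual API.

```python
def cmp_step(image, egomotion, goal, belief, mapper, planner):
    # Warp the previous egocentric belief by the known egomotion and fuse
    # it with what the mapper predicts from the current first-person view.
    belief = mapper.update(belief, image, egomotion)
    # Plan on the (multiscale) belief with the hierarchical planner and
    # read out one of the four macro-actions.
    action = planner.act(belief, goal)
    return action, belief
```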

  14. Differentiable Mapper [Figure: the confidence and belief about the world from the previous time step are warped using the egomotion (differentiable warping); the current frame is passed through an encoder network (ResNet-50), a decoder network with residual connections, and fully connected layers with ReLUs; the warped and predicted estimates are combined into an updated confidence and belief about the world]
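To make the mapper update concrete, below is a minimal PyTorch sketch of the two differentiable pieces described above: warping the previous belief/confidence maps by the egomotion with bilinear sampling, and fusing them with the estimate decoded from the current frame. The function names, tensor shapes, and the confidence-weighted fusion rule are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def warp_by_egomotion(belief, conf, theta):
    # belief, conf: (N, C, H, W) egocentric maps from the previous step.
    # theta: (N, 2, 3) affine transform (rotation + translation in
    # normalized map coordinates) derived from the known egomotion.
    grid = F.affine_grid(theta, belief.shape, align_corners=False)
    belief_w = F.grid_sample(belief, grid, align_corners=False)  # differentiable warping
    conf_w = F.grid_sample(conf, grid, align_corners=False)
    return belief_w, conf_w

def fuse(belief_w, conf_w, belief_new, conf_new, eps=1e-6):
    # Combine the warped past estimate with the estimate predicted from
    # the current frame, weighting each by its confidence (an assumed,
    # simple choice of update rule).
    conf = conf_w + conf_new
    belief = (belief_w * conf_w + belief_new * conf_new) / (conf + eps)
    return belief, conf
```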

  15. Differentiable Planner: Value Iteration Network¹ • $Q_n(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s')$, computed as convolutions • $V_{n+1}(s) = \max_a Q_n(s, a)\ \forall s$, computed as max pooling over channels. ¹ Aviv Tamar et al. "Value Iteration Networks". In: Advances in Neural Information Processing Systems. 2016, pp. 2146–2154.

  16. Differentiable Planner: Value Iteration Network • $Q_n(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s')$, computed as convolutions • $V_{n+1}(s) = \max_a Q_n(s, a)\ \forall s$, computed as max pooling over channels • Trainable using simulated data
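As a concrete illustration of these two operations, here is a minimal PyTorch sketch of a value-iteration block in the style of Tamar et al.: a convolution produces one Q channel per action from the reward and current value maps, and a max over channels gives the updated value map. The hyperparameters and the way the reward map is produced upstream are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ValueIterationBlock(nn.Module):
    def __init__(self, n_actions=4, n_iters=10):
        super().__init__()
        self.n_iters = n_iters
        # Q_n(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V_n(s'),
        # approximated by a 3x3 convolution over [reward, value].
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)

    def forward(self, reward):
        # reward: (N, 1, H, W) map derived from the belief and the goal.
        value = torch.zeros_like(reward)
        for _ in range(self.n_iters):
            q = self.q_conv(torch.cat([reward, value], dim=1))  # computed as convolutions
            value = q.max(dim=1, keepdim=True).values           # max pooling over channels
        return value, q
```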

  17. Experimental Setup: Overview • Trained and tested in static simulated real-world environments • Testing environments are different from the training environments • Robot: • Lives in a grid world, and motion is discrete • Has 4 macro-actions: Go Forward, Turn Left, Turn Right, Stay in Place • Has access to precise egomotion • Has RGB and/or depth cameras • All models are trained using DAGGER • Geometric Task: goal is sampled to be at most 32 time steps away; agent is run for 39 time steps • Semantic Task: "Go to a chair"; agent is run for 39 time steps

  18. Experimental Setup: Dataset Stanford Building Parser Dataset

  19. Experimental Setup: Policy Training. Use DAGGER²,³ ² Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning." In: AISTATS. vol. 1. 2. 2011, p. 6. ³ Image from John Schulman's lecture on Reinforcement Learning
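A minimal sketch of the DAGGER loop used for policy training, assuming a shortest-path expert available in simulation; `expert`, `rollout`, and `train` are placeholder callables, and the exponentially decaying mixing schedule is an illustrative choice rather than necessarily the paper's.

```python
import random

def dagger(expert, init_policy, rollout, train, n_iters=10, beta0=1.0):
    dataset = []
    policy = init_policy
    for i in range(n_iters):
        beta = beta0 * (0.5 ** i)  # probability of executing the expert action
        def mixed(obs):
            return expert(obs) if random.random() < beta else policy(obs)
        # Roll out the mixed policy, but label every visited observation
        # with the expert's action so the learner sees states reached by
        # its own mistakes and learns to recover (e.g. by backtracking).
        visited = rollout(mixed)
        dataset += [(obs, expert(obs)) for obs in visited]
        policy = train(dataset)  # supervised fit on the aggregated dataset
    return policy
```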

  20. Results

  21. Mapper Unit Test [Figure panels: Ground Truth, Analytic Projection, Prediction from RGB, Prediction from Depth]

  22. Navigation Results: Geometric Task

      Method                 Mean (RGB / Depth)   75th %ile (RGB / Depth)   Success % (RGB / Depth)
      Initial                25.3 / 25.3          30 / 30                    0.7 /  0.7
      No Image LSTM          20.8 / 20.8          28 / 28                    6.2 /  6.2
      Reactive (1 frame)     20.9 / 17.0          28 / 26                    8.2 / 21.9
      Reactive (4 frames)    14.4 /  8.8          25 / 18                   31.4 / 56.9
      LSTM                   10.3 /  5.9          21 /  5                   53.0 / 71.8
      Our (CMP)               7.7 /  4.8          14 /  1                   62.5 / 78.3

      Geometric Results: mean distance to goal location, 75th percentile distance to goal, and success rate after executing the policy for 39 time steps.

  23. Navigation Results: Semantic Task

      Method                 Mean (RGB / Depth)   75th %ile (RGB / Depth)   Success % (RGB / Depth)
      Initial                16.2 / 16.2          25 / 25                   11.3 / 11.3
      Reactive               14.2 / 14.2          22 / 23                   23.4 / 22.3
      LSTM                   13.5 / 13.4          20 / 23                   23.5 / 27.2
      Our (CMP)              11.3 / 11.0          18 / 19                   34.2 / 40.0

      Semantic Results (aggregate): mean distance to goal location, 75th percentile distance to goal, and success rate after executing the policy for 39 time steps.

  24. Successful Navigations Agents exhibit backtracking behavior!

  25. Failure Cases [Figure panels: Missed, Thrashing, Tight]

  26. Video Demo

  27. Demo: Video Demonstration

  28. Summary

  29. Summary • Joint fully end-to-end neural network policy for mapping and planning • Uses mapping module to map from RGB and/or Depth images to a top-down ego-centric belief map • Uses a Value Iteration Network to plan in the belief map generated by the mapper • Trains the end-to-end policy using DAGGER

  30. Questions?

  31. Quiz • Why was DAGGER used to train the models? 1. Other training methods were not possible 2. To allow the agent to recover from bad decisions (backtracking) 3. To minimize crashes in simulation 4. Because it has a cool name • True or False: the model was trained end-to-end, allowing the mapping module to encode whatever is most useful to the planning module. 1. True 2. False
