Cognitive Mapping and Planning for Visual Navigation


  1. Cognitive Mapping and Planning for Visual Navigation. Saurabh Gupta 1,2, James Davidson 2, Sergey Levine 1,2, Rahul Sukthankar 2, Jitendra Malik 1,2 (1 UC Berkeley, 2 Google). Presented by Kent Sommer, Korea Advanced Institute of Science and Technology

  2. Table of contents 1. Problem Statement 2. Related Work 3. Contribution 4. Results 5. Video Demo 6. Summary

  3. Problem Statement

  4. Problem Statement: robot navigation in novel environments. A robot equipped with a first-person camera is dropped into a novel environment and must navigate in that environment.

  5. Motivation: Intelligent Navigation What does it mean to navigate intelligently? • Navigate through novel environments • Draw on prior experience or similar conditions • Reason about free-space, obstacle-space, topology

  6. Motivation: Why Are Humans So Good? Humans can often reason about their environment, while classical agents can at best do uninformed exploration • Know where we are likely to find a chair • Know that hallways often lead to other hallways • Know common building patterns

  7. Related Work

  8. Classical Work (figures: LSD-SLAM, RRT) • Over-complete: precise reconstruction of everything is not necessary • Incomplete: nothing is known until it is explicitly observed, failing to exploit the structure of the world • Only geometry, no semantics • Unnecessarily fragile due to the separation between mapping and planning

  9. Contemporary Work • Human-level control through deep reinforcement learning, Mnih et al., Nature 2015 • Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning, Zhu et al., ICRA 2017 • Control of Memory, Active Perception, and Action in Minecraft, Oh et al., ICML 2016 • End-to-End Training of Deep Visuomotor Policies, Levine et al., JMLR 2016 [Figure: memory-based architectures DQN, DRQN, MQN, MRQN, FRMQN]

  10. Contemporary Work: feed-forward architecture without memory • Agent can't systematically explore a new environment or backtrack • Agent needs experience with a new environment before it can start navigating successfully [Figure: siamese ResNet-50 embeddings of observation and target, fused through generic siamese layers into scene-specific policy and value layers]

  11. Contribution

  12. Contribution Neural network policy for visual navigation • Joint architecture for mapping and planning • Spatial memory with the ability to plan given partial observations • Is end-to-end trainable

  13. Cognitive Mapping and Planning: System Overview [Figure: at each time step, the current first-person view and the egomotion feed a Differentiable Mapper, which updates a multiscale belief about the world in an egocentric coordinate frame; a Differentiable Hierarchical Planner uses this belief and the goal to produce the next action]
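To make the data flow concrete, a minimal sketch of one control step is given below; `mapper` and `planner` with these methods are hypothetical names standing in for the two modules in the figure, not the authors' actual API.

```python
def cmp_step(image, egomotion, goal, belief, mapper, planner):
    # Warp the previous egocentric belief by the known egomotion and fuse
    # it with what the mapper predicts from the current first-person view.
    belief = mapper.update(belief, image, egomotion)
    # Plan on the (multiscale) belief with the hierarchical planner and
    # read out one of the four macro-actions.
    action = planner.act(belief, goal)
    return action, belief
```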

  14. Differentiable Mapper [Figure: the confidence and belief about the world from the previous time step are warped using the egomotion (differentiable warping); the current frame is passed through an encoder network (ResNet-50), a decoder network with residual connections, and fully connected layers with ReLUs; the warped and predicted estimates are combined into an updated confidence and belief about the world]
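To make the mapper update concrete, below is a minimal PyTorch sketch of the two differentiable pieces described above: warping the previous belief/confidence maps by the egomotion with bilinear sampling, and fusing them with the estimate decoded from the current frame. The function names, tensor shapes, and the confidence-weighted fusion rule are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def warp_by_egomotion(belief, conf, theta):
    # belief, conf: (N, C, H, W) egocentric maps from the previous step.
    # theta: (N, 2, 3) affine transform (rotation + translation in
    # normalized map coordinates) derived from the known egomotion.
    grid = F.affine_grid(theta, belief.shape, align_corners=False)
    belief_w = F.grid_sample(belief, grid, align_corners=False)  # differentiable warping
    conf_w = F.grid_sample(conf, grid, align_corners=False)
    return belief_w, conf_w

def fuse(belief_w, conf_w, belief_new, conf_new, eps=1e-6):
    # Combine the warped past estimate with the estimate predicted from
    # the current frame, weighting each by its confidence (an assumed,
    # simple choice of update rule).
    conf = conf_w + conf_new
    belief = (belief_w * conf_w + belief_new * conf_new) / (conf + eps)
    return belief, conf
```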

  15. Differentiable Planner: Value Iteration Network¹ • $Q_n(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s')$, computed as convolutions • $V_{n+1}(s) = \max_a Q_n(s, a)\ \forall s$, computed as max pooling over channels. ¹ Aviv Tamar et al. "Value Iteration Networks". In: Advances in Neural Information Processing Systems. 2016, pp. 2146–2154.

  16. Differentiable Planner: Value Iteration Network • $Q_n(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s')$, computed as convolutions • $V_{n+1}(s) = \max_a Q_n(s, a)\ \forall s$, computed as max pooling over channels • Trainable using simulated data
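As a concrete illustration of these two operations, here is a minimal PyTorch sketch of a value-iteration block in the style of Tamar et al.: a convolution produces one Q channel per action from the reward and current value maps, and a max over channels gives the updated value map. The hyperparameters and the way the reward map is produced upstream are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ValueIterationBlock(nn.Module):
    def __init__(self, n_actions=4, n_iters=10):
        super().__init__()
        self.n_iters = n_iters
        # Q_n(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V_n(s'),
        # approximated by a 3x3 convolution over [reward, value].
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)

    def forward(self, reward):
        # reward: (N, 1, H, W) map derived from the belief and the goal.
        value = torch.zeros_like(reward)
        for _ in range(self.n_iters):
            q = self.q_conv(torch.cat([reward, value], dim=1))  # computed as convolutions
            value = q.max(dim=1, keepdim=True).values           # max pooling over channels
        return value, q
```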

  17. Experimental Setup: Overview • Trained and tested in static simulated real-world environments • Testing environments are different from the training environments • Robot: • Lives in a grid world, and motion is discrete • Has 4 macro-actions: Go Forward, Turn Left, Turn Right, Stay in Place • Has access to precise egomotion • Has RGB and/or depth cameras • All models are trained using DAGGER • Geometric Task: goal is sampled to be at most 32 time steps away; agent is run for 39 time steps • Semantic Task: "Go to a chair"; agent is run for 39 time steps

  18. Experimental Setup: Dataset Stanford Building Parser Dataset

  19. Experimental Setup: Policy Training. Use DAGGER²,³ ² Stéphane Ross, Geoffrey J. Gordon, and Drew Bagnell. "A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning." In: AISTATS. vol. 1. 2. 2011, p. 6. ³ Image from John Schulman's lecture on Reinforcement Learning
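A minimal sketch of the DAGGER loop used for policy training, assuming a shortest-path expert available in simulation; `expert`, `rollout`, and `train` are placeholder callables, and the exponentially decaying mixing schedule is an illustrative choice rather than necessarily the paper's.

```python
import random

def dagger(expert, init_policy, rollout, train, n_iters=10, beta0=1.0):
    dataset = []
    policy = init_policy
    for i in range(n_iters):
        beta = beta0 * (0.5 ** i)  # probability of executing the expert action
        def mixed(obs):
            return expert(obs) if random.random() < beta else policy(obs)
        # Roll out the mixed policy, but label every visited observation
        # with the expert's action so the learner sees states reached by
        # its own mistakes and learns to recover (e.g. by backtracking).
        visited = rollout(mixed)
        dataset += [(obs, expert(obs)) for obs in visited]
        policy = train(dataset)  # supervised fit on the aggregated dataset
    return policy
```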

  20. Results

  21. Mapper Unit Test [Figure panels: Ground Truth, Analytic Projection, Prediction from RGB, Prediction from Depth]

  22. Navigation Results: Geometric Task

      Method                 Mean (RGB / Depth)   75th %ile (RGB / Depth)   Success % (RGB / Depth)
      Initial                25.3 / 25.3          30 / 30                    0.7 /  0.7
      No Image LSTM          20.8 / 20.8          28 / 28                    6.2 /  6.2
      Reactive (1 frame)     20.9 / 17.0          28 / 26                    8.2 / 21.9
      Reactive (4 frames)    14.4 /  8.8          25 / 18                   31.4 / 56.9
      LSTM                   10.3 /  5.9          21 /  5                   53.0 / 71.8
      Our (CMP)               7.7 /  4.8          14 /  1                   62.5 / 78.3

      Geometric Results: mean distance to goal location, 75th percentile distance to goal, and success rate after executing the policy for 39 time steps.

  23. Navigation Results: Semantic Task

      Method                 Mean (RGB / Depth)   75th %ile (RGB / Depth)   Success % (RGB / Depth)
      Initial                16.2 / 16.2          25 / 25                   11.3 / 11.3
      Reactive               14.2 / 14.2          22 / 23                   23.4 / 22.3
      LSTM                   13.5 / 13.4          20 / 23                   23.5 / 27.2
      Our (CMP)              11.3 / 11.0          18 / 19                   34.2 / 40.0

      Semantic Results (aggregate): mean distance to goal location, 75th percentile distance to goal, and success rate after executing the policy for 39 time steps.

  24. Successful Navigations Agents exhibit backtracking behavior!

  25. Failure Cases [Figure panels: Missed, Thrashing, Tight]

  26. Video Demo

  27. Demo: Video Demonstration

  28. Summary

  29. Summary • Joint fully end-to-end neural network policy for mapping and planning • Uses mapping module to map from RGB and/or Depth images to a top-down ego-centric belief map • Uses a Value Iteration Network to plan in the belief map generated by the mapper • Trains the end-to-end policy using DAGGER

  30. Questions?

  31. Quiz • Why was DAGGER used to train the models? 1. Other training methods were not possible 2. To allow the agent to recover from bad decisions (backtracking) 3. To minimize crashes in simulation 4. Because it has a cool name • True or False: the model was trained end-to-end, allowing the mapping module to encode whatever is most useful to the planning module. 1. True 2. False
