Scene Navigation by Knowledge Graph and Interaction

  1. Scene Navigation by Knowledge Graph and Interaction • Mohammad Rastegari • ICCV, Oct 2019

  2. Task: Navigate to Television • Example action sequence: Move Forward, Move Forward, Rotate Right, …, Done

  3. • 120 Scenes • Room types • Kitchen • Living room • Bedroom • Bathroom • Each room class has 30 scenes • Training: 20 rooms/class • Testing: 5 rooms/class

  4. Challenges • Normally we relocate a seen object in a seen scene • The main challenges are: • Generalizing to unseen scenes • Generalizing to unseen objects

  5. Using Prior Knowledge [Figure labels: Apple, Coffee machine, Cup, Mango]

  6. Knowledge Graph

  7. Scene Prior [Figure: graph whose nodes are objects (Plate, Table, Sandwich, Sink, Painting, Remote, Coffee Machine, Cabinet, TV, Mug, Bowl, Counter, Microwave, Laptop, Box, Toaster) and whose edges are spatial relations such as "next to" and "on"]

  8. Scene Prior Graph • Example edge: Remote, "next to", Television

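  9. Architecture Flow [Figure: three input branches feed a Joint Embedding. History frames → ResNet-50 → FC (512). Target word “Television” → Word Embedding → FC (512). Scene prior graph (Remote, "next to", Television) → Graph Convolutional Network → FC (512). The Joint Embedding feeds the Actor-Critic Model (MLP), which outputs a Value and a Policy; the Action Sampler draws an action from the Policy and applies it to the Environment.]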

  10. Architecture Flow with Scene Prior Graph [Figure: three input branches feed a Joint Embedding. History frames → ResNet-50 → FC (512). Target word “Television” → Word Embedding → FC (512). Scene prior graph (Remote, "next to", Television) → Graph Convolutional Network → FC (512). The Joint Embedding feeds the Actor-Critic Model (MLP), which outputs a Value and a Policy; the Action Sampler draws an action from the Policy and applies it to the Environment.]

  11. Architecture Flow with Scene Prior Graph [Figure: same diagram as slide 10]

  12. Graph Convolutional Network (GCN) $H^{(l+1)} = f(\hat{A} H^{(l)} W^{(l)})$ • $\hat{A}$: Normalized Adjacency Matrix • $H^{(l)}$: Node features at the $l$th layer • $W^{(l)}$: Learnable parameters at the $l$th layer • $f$: Activation Function (e.g. ReLU)
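The propagation rule on this slide can be illustrated with a few lines of NumPy. This is a minimal sketch, not the presenter's implementation: the slide only says the adjacency matrix is "normalized", so the symmetric normalization with self-loops used here is an assumption, and the names `normalize_adjacency` and `gcn_layer` are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])   # add self-loops
    deg = A_tilde.sum(axis=1)          # degrees are >= 1 because of the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One propagation step: H_next = ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Toy graph with two nodes joined by one edge, e.g. Remote "next to" Television.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
A_hat = normalize_adjacency(A)

rng = np.random.default_rng(0)
H0 = rng.normal(size=(2, 4))   # 2 nodes, 4 input features per node
W0 = rng.normal(size=(4, 3))   # learnable weights: 4 features in, 3 out
H1 = gcn_layer(A_hat, H0, W0)  # node features after one layer, shape (2, 3)
```

Each row of `H1` mixes a node's own features with those of its neighbours, which is how information about a related object (Remote) can reach the target node (Television).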

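  13. GCN for Scene Navigation [Figure: the ResNet-50 1000-class score passes through FC (512) to give a 512-d feature; each graph node (e.g. “Fridge”, …, “Toaster”) has a 512-d word feature; the two 512-d features are concatenated per node and passed through 3 GCN layers.] • The knowledge graph is updated over time according to the recent observations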
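The node-input construction suggested by this slide (a 512-d image feature concatenated with a 512-d word feature, followed by three graph layers) might look like the sketch below. Everything beyond those sizes is an assumption for illustration: the random weights, the placeholder adjacency, the hidden layer sizes in `layer_sizes`, and sharing one image feature across all nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
node_names = ["Fridge", "Toaster", "Television"]   # illustrative subset of nodes
num_nodes = len(node_names)

# Stand-in for the ResNet-50 1000-class score of the current frame.
class_scores = rng.random(1000)
W_img = rng.normal(scale=0.01, size=(1000, 512))   # stand-in for FC (512)
image_feat = class_scores @ W_img                  # shape (512,)

# Stand-in for a 512-d word feature per node.
word_feat = rng.normal(size=(num_nodes, 512))

# Assumed: the same image feature is attached to every node, then concatenated.
node_inputs = np.concatenate(
    [np.tile(image_feat, (num_nodes, 1)), word_feat], axis=1
)                                                  # shape (num_nodes, 1024)

# Placeholder normalized adjacency (fully connected, uniform weights).
A_hat = np.full((num_nodes, num_nodes), 1.0 / num_nodes)

# Three graph layers; the hidden sizes are hypothetical.
layer_sizes = [1024, 256, 64, 1]
H = node_inputs
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(scale=0.05, size=(d_in, d_out))
    H = np.maximum(A_hat @ H @ W, 0.0)             # ReLU(A_hat H W)
```

Because `image_feat` is recomputed from each new frame, the node inputs change as the agent moves, which is one way to read "updated over time according to the recent observations".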

  14. Action Space • Move Ahead • Move Back • Rotate Right • Rotate Left • Stop We consider the stop action and expect the agent to issue this action when it reaches the target. This makes the learning challenging.
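To make the role of the stop action concrete, here is a small sketch of the five actions and of a success check that only counts an episode if the agent itself issues Stop at the target. The string values and the `target_reached_flags` input are hypothetical; the slides do not specify how "reaches the target" is measured.

```python
from enum import Enum

class Action(Enum):
    MOVE_AHEAD = "MoveAhead"
    MOVE_BACK = "MoveBack"
    ROTATE_RIGHT = "RotateRight"
    ROTATE_LEFT = "RotateLeft"
    STOP = "Stop"

def episode_succeeded(actions, target_reached_flags):
    """Return True only if the first Stop is issued while the target is reached.

    target_reached_flags[i] says whether the target counts as reached in the
    state where actions[i] is issued (hypothetical signal from the simulator).
    """
    for action, reached in zip(actions, target_reached_flags):
        if action is Action.STOP:
            return reached
    return False  # the agent never issued Stop

good = episode_succeeded(
    [Action.MOVE_AHEAD, Action.MOVE_AHEAD, Action.ROTATE_RIGHT, Action.STOP],
    [False, False, True, True],
)
early = episode_succeeded([Action.MOVE_AHEAD, Action.STOP], [False, False])
never = episode_succeeded([Action.MOVE_AHEAD, Action.ROTATE_LEFT], [False, True])
```

The `never` case shows why this setting is harder: passing by the target is not enough, the agent has to recognize that it has arrived.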

  15. Seen Scenes, Novel Objects

  16. Bedroom | Mirror

  17. Livingroom | Painting

  18. Kitchen | Toaster

  19. Kitchen | Microwave

  20. Unseen Scenes, Known Objects

  21. Bathroom | Soap

  22. Bedroom | Lamp

  23. Bedroom | Light Switch

  24. Kitchen | Cabinet

  25. Unseen Scenes, Novel Objects

  26. Bathroom | Towel

  27. Kitchen | Microwave

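  28. Evaluation Metrics • Success Rate (SR) • The ratio of successful navigations toward the object over N episodes • Success weighted by Path Length (SPL) • The ratio of successful navigations toward the object weighted by the path length over N episodes, considering both Success Rate and path length: $\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \frac{L_i}{\max(P_i, L_i)}$, where $N$ is the number of episodes, $S_i$ indicates success in episode $i$, $P_i$ represents the length of the path the agent took, and $L_i$ is the shortest path length to the target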
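Both metrics on this slide are easy to compute from per-episode records. The sketch below assumes the usual SPL definition, the mean over episodes of S_i * L_i / max(P_i, L_i), with S_i a 0/1 success flag, P_i the agent's path length and L_i the shortest path length; the function names and the example numbers are made up for illustration.

```python
def success_rate(successes):
    """Fraction of episodes that ended successfully (S_i in {0, 1})."""
    return sum(successes) / len(successes)

def spl(successes, agent_path_lengths, shortest_path_lengths):
    """Success weighted by Path Length.

    Assumes every shortest path length is > 0, so max(P_i, L_i) is never zero.
    """
    total = 0.0
    for s, p, l in zip(successes, agent_path_lengths, shortest_path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)

# Four made-up episodes: optimal success, success with a detour, failure, optimal success.
successes = [1, 1, 0, 1]
agent_paths = [10, 20, 15, 8]
shortest_paths = [10, 10, 5, 8]

sr_value = success_rate(successes)                    # 0.75
spl_value = spl(successes, agent_paths, shortest_paths)  # (1 + 0.5 + 0 + 1) / 4 = 0.625
```

SPL is never larger than SR: a successful episode contributes 1 only when the agent follows a shortest path, and less when it takes a detour.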

  29. (SPL / SR) without STOP action (250 episodes)

      Setting                        Method   Kitchen       Living room   Bedroom       Bathroom      Avg.
      Seen scenes, Known objects     Random   17.9 / 33.1   12.1 / 30.5   16.8 / 51.2   24.5 / 34.6   17.8 / 37.3
      Seen scenes, Known objects     A3C      79.9 / 86.7   38.8 / 57.6   87.8 / 89.5   93.7 / 96.6   75.0 / 82.5
      Seen scenes, Known objects     Ours     83.5 / 88.2   46.4 / 64.4   90.6 / 92.7   93.6 / 96.5   78.5 / 85.5
      Seen scenes, Novel objects     Random   10.0 / 23.1    8.0 / 18.5   17.3 / 35.2   11.2 / 32.2   11.6 / 27.2
      Seen scenes, Novel objects     A3C      20.2 / 38.8   24.2 / 46.5   23.5 / 35.8   50.2 / 74.6   29.5 / 48.9
      Seen scenes, Novel objects     Ours     22.9 / 53.6   39.5 / 66.5   26.1 / 38.9   50.5 / 78.6   34.7 / 59.4
      Unseen scenes, Known objects   Random   27.3 / 45.2    5.6 / 16.6   13.1 / 34.5   36.0 / 49.1   20.5 / 36.3
      Unseen scenes, Known objects   A3C      39.5 / 56.2   12.0 / 31.8   22.5 / 49.2   47.4 / 60.2   30.3 / 49.3
      Unseen scenes, Known objects   Ours     46.2 / 62.5   13.8 / 40.6   26.5 / 58.6   51.5 / 65.8   34.5 / 56.9
      Unseen scenes, Novel objects   Random   21.3 / 44.3    3.3 / 22.9   25.8 / 47.8   25.5 / 48.9   19.0 / 41.0
      Unseen scenes, Novel objects   A3C      26.1 / 56.3    9.4 / 25.1   28.2 / 54.0   33.8 / 90.7   24.4 / 56.5
      Unseen scenes, Novel objects   Ours     38.5 / 62.5   13.7 / 40.3   30.1 / 63.1   39.2 / 93.6   30.4 / 64.9

      Table 2: Results without termination (stop) action. SPL / Success rate is shown. We compare …

  30. (SPL / SR) with STOP action

      Setting                        Method   Kitchen       Living room   Bedroom       Bathroom      Avg.
      Seen scenes, Known objects     Random    2.4 / 3.5     1.1 / 1.7     1.8 / 2.7     3.2 / 4.8     2.1 / 3.1
      Seen scenes, Known objects     A3C      38.5 / 51.0    9.7 / 15.1    6.8 / 11.5   69.1 / 81.0   31.1 / 39.6
      Seen scenes, Known objects     Ours     58.6 / 72.7   12.4 / 18.6   41.6 / 52.4   71.3 / 83.0   46.0 / 56.7
      Seen scenes, Novel objects     Random    0.9 / 1.3     0.8 / 1.2     2.3 / 3.4     1.4 / 2.1     1.4 / 2.0
      Seen scenes, Novel objects     A3C       2.1 / 4.9     3.2 / 4.8     0.5 / 1.7    17.1 / 28.5    5.7 / 9.9
      Seen scenes, Novel objects     Ours      3.2 / 6.1     9.8 / 16.2    6.2 / 8.6    24.7 / 37.3   11.0 / 17.1
      Unseen scenes, Known objects   Random    4.1 / 5.9     0.9 / 1.3     1.6 / 2.4     4.2 / 6.2     2.7 / 3.9
      Unseen scenes, Known objects   A3C      11.5 / 18.8    0.5 / 2.5     2.2 / 3.8     8.6 / 18.7    5.7 / 10.4
      Unseen scenes, Known objects   Ours     12.7 / 20.5    1.0 / 4.0     4.5 / 11.0    8.7 / 21.1    6.7 / 13.4
      Unseen scenes, Novel objects   Random    2.0 / 2.8     0.6 / 1.0     2.0 / 2.8     2.7 / 3.9     1.8 / 2.6
      Unseen scenes, Novel objects   A3C       2.2 / 7.5     2.5 / 4.4     1.3 / 4.4     3.4 / 9.3     2.4 / 5.9
      Unseen scenes, Novel objects   Ours      3.3 / 12.7    2.8 / 5.3     2.0 / 6.3     4.1 / 12.2    3.1 / 8.5

      Table 1: Results using termination (stop) action. SPL / Success rate is shown. We compare …

  31. [Figure: two pipelines compared] • Traditional Training → Traditional Inference • Learning to Adapt → Adaptation During Inference

  32. [Figure: training loop] Initial Model Parameters → Initialize Model → Take k steps → Compute Self-Supervised Interaction Loss → Compute Adapted Parameters → Complete Navigation Episode → Compute Supervised Navigation Loss → Backprop to Update Initialization
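The loop on this slide (adapt the parameters with an interaction loss, evaluate a navigation loss on the adapted parameters, then backpropagate to the initialization) can be sketched on a toy problem. This is only an illustration under strong assumptions: both losses are replaced by simple quadratics with hypothetical targets `C_INT` and `C_NAV`, there is no environment or k-step rollout, and a single adaptation step is used.

```python
import numpy as np

ALPHA = 0.1   # inner (adaptation) step size, hypothetical value
BETA = 0.5    # outer (meta) step size, hypothetical value

C_INT = np.array([1.0, -1.0])   # toy minimizer of the interaction loss
C_NAV = np.array([2.0, 0.5])    # toy minimizer of the navigation loss

def interaction_grad(theta):
    """Gradient of the toy self-supervised loss 0.5 * ||theta - C_INT||^2."""
    return theta - C_INT

def navigation_loss(theta):
    """Toy supervised loss 0.5 * ||theta - C_NAV||^2."""
    return 0.5 * np.sum((theta - C_NAV) ** 2)

def navigation_grad(theta):
    return theta - C_NAV

def adapt(theta):
    """Compute adapted parameters with one interaction-gradient step."""
    return theta - ALPHA * interaction_grad(theta)

def meta_train(theta, iterations=200):
    """Update the initialization so that the adapted parameters navigate well."""
    for _ in range(iterations):
        theta_adapted = adapt(theta)
        # Chain rule through adapt(): d(theta_adapted)/d(theta) = (1 - ALPHA) * I here.
        meta_grad = (1.0 - ALPHA) * navigation_grad(theta_adapted)
        theta = theta - BETA * meta_grad
    return theta

theta_init = np.zeros(2)
theta_meta = meta_train(theta_init)
loss_before = navigation_loss(adapt(theta_init))
loss_after = navigation_loss(adapt(theta_meta))
```

After meta-training, `loss_after` is far below `loss_before`: the initialization has moved to a point from which the interaction-gradient step lands on good navigation parameters. At inference only `adapt` would run, since the navigation loss needs supervision.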

  33. Learning to Learn how to Learn [Figure labels: Inference; Navigation Gradient (supervised); Learned; Interaction Gradient (self-supervised)]

  34. [Figure: training loop with a learned loss] Initial Model Parameters and Loss Parameters → Initialize Model → Take k steps → Compute Self-Supervised Interaction Loss via Neural Network → Compute Adapted Parameters → Complete Navigation Episode → Compute Supervised Navigation Loss

  35. [Figure: model architecture; tensor dimensions are not recoverable from the extracted text] • Legend: Forward Pass; Navigation-Gradient (Training only); Interaction-Gradient (Training and Inference) • Current observation → ResNet18 (Frozen) → Image Feature → Pointwise Conv • Target Object Class (e.g. Laptop) → GloVe Embedding → FC → Tile • Image and target features are concatenated → Pointwise Conv → LSTM, unrolled over steps 0, 1, 2, … • Example actions: Turn Left, Look Down, Move Forward • Concatenated policy and hidden states → 1D Temporal Conv

  36. Results [Figure: bar charts of SPL and Success for Baseline, Handcrafted Loss, and Learned Loss] • Training Scenes: 80 • Validation Scenes: 20 • Test Scenes: 20 • Equal Split of Kitchen, Living Room, Bedroom, Bathroom

  37. Goal: Navigate to Book

  38. Thank you!
