Neural Attention for Object Tracking

  1. April 4-7, 2016 | Silicon Valley Neural Attention for Object Tracking Brian Cheung bcheung@berkeley.edu Redwood Center for Theoretical Neuroscience, UC Berkeley Visual Computing Research, NVIDIA

  2. Source: Wikipedia “School Bus”

  3. Motivation: solving complex vision problems ● Question Answering ● Search ● Navigation. Two core components: ● Attention ● Memory

  4. Emergent Properties from Attention (Xu et al. 2015)

  5. Recurrent Networks. Diagram: input x(t), hidden state h(t), output o(t).
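
The diagram boils down to one state-update rule applied at every timestep. A minimal NumPy sketch (the function and weight names here are illustrative, not from the talk):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_ho, b_h, b_o):
    """One step of a vanilla recurrent network: the hidden state h(t)
    depends on the input x(t) and the previous state h(t-1); the
    output o(t) is read out from h(t)."""
    h_t = np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)
    o_t = h_t @ W_ho + b_o
    return h_t, o_t

# Unroll over a short random input sequence.
rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 4, 8, 3, 5
W_xh = 0.1 * rng.normal(size=(n_in, n_hid))
W_hh = 0.1 * rng.normal(size=(n_hid, n_hid))
W_ho = 0.1 * rng.normal(size=(n_hid, n_out))
b_h, b_o = np.zeros(n_hid), np.zeros(n_out)

h = np.zeros(n_hid)
outputs = []
for t in range(T):
    x_t = rng.normal(size=n_in)
    h, o = rnn_step(x_t, h, W_xh, W_hh, W_ho, b_h, b_o)
    outputs.append(o)
```

The same hidden state h is threaded through every step, which is what lets the network accumulate information across a sequence of glimpses.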

  6. Formulating a Glimpse. Parameters in the kernel control the layout of the attention window over the original image: translation and scale.
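
One way such a kernel-based glimpse can be realized is a separable bank of Gaussian interpolation kernels, where a center parameter gives translation and a stride parameter gives scale. A minimal sketch; the names, kernel count, and parameterization are illustrative assumptions:

```python
import numpy as np

def glimpse_filters(center, stride, sigma, n_kernels, img_size):
    """A 1-D bank of Gaussian interpolation kernels. `center`
    (translation) and `stride` (scale) lay out the kernel means
    along the image axis; `sigma` sets each kernel's width."""
    mu = center + (np.arange(n_kernels) - n_kernels / 2.0 + 0.5) * stride
    pos = np.arange(img_size)
    F = np.exp(-((pos[None, :] - mu[:, None]) ** 2) / (2 * sigma ** 2))
    return F / F.sum(axis=1, keepdims=True)   # normalize each kernel row

def take_glimpse(image, cx, cy, stride, sigma, n=12):
    """Apply separable x/y kernel banks to extract an n-by-n glimpse."""
    Fx = glimpse_filters(cx, stride, sigma, n, image.shape[1])
    Fy = glimpse_filters(cy, stride, sigma, n, image.shape[0])
    return Fy @ image @ Fx.T

img = np.random.default_rng(0).random((28, 28))
# Centered glimpse; growing `stride` zooms out, shifting cx/cy pans.
g = take_glimpse(img, cx=14.0, cy=14.0, stride=1.0, sigma=0.8)
```

Because the glimpse is a smooth function of (cx, cy, stride, sigma), these attention parameters can be learned by backpropagation.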

  7. Spatial Transformer (Jaderberg et al. 2015)

  8. Spatial Transformer Network (Jaderberg et al. 2015)
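
The core of a spatial transformer can be sketched in plain NumPy: an affine matrix θ maps a regular output grid to source coordinates, which are then sampled bilinearly. This is a simplified single-channel illustration, not the paper's implementation:

```python
import numpy as np

def spatial_transformer(image, theta, out_h, out_w):
    """Warp a single-channel image through an affine transform.
    `theta` is a 2x3 matrix mapping normalized output coordinates
    in [-1, 1] to normalized input coordinates; sampling is bilinear."""
    H, W = image.shape
    # Grid generator: regular grid over the output, in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])
    src = theta @ grid                       # source coords, (2, out_h*out_w)
    # Sampler: back to pixel coordinates, then bilinear interpolation.
    sx = (src[0] + 1) * (W - 1) / 2
    sy = (src[1] + 1) * (H - 1) / 2
    x0 = np.clip(np.floor(sx).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, H - 2)
    wx, wy = sx - x0, sy - y0
    out = (image[y0, x0] * (1 - wx) * (1 - wy)
           + image[y0, x0 + 1] * wx * (1 - wy)
           + image[y0 + 1, x0] * (1 - wx) * wy
           + image[y0 + 1, x0 + 1] * wx * wy)
    return out.reshape(out_h, out_w)

# The identity transform reproduces the input.
img = np.arange(16.0).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
warped = spatial_transformer(img, identity, 4, 4)
```

Scaling the diagonal of θ zooms the window; the last column translates it, which is exactly the translation/scale control from the glimpse slide.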

  9. Foveal Attention Network (Cheung et al. 2015). Diagram: Image → Glimpse Network → Recurrent Network → Classification Network and Location Network.

  14. Foveal Attention Network (Cheung et al. 2015). Diagram: Image → Glimpse Network → Recurrent Network → Location Network and Classification Network, which outputs ‘5’.

  15. Benefits of Attention ● Fewer parameters / less computation ○ Smaller convolutional network ● Better performance ○ Significantly outperforms a ConvNet applied to the entire image ○ Breaks complex problems into a sequence of simpler problems ○ Filters out noise and distractors ● Localization information comes for free

  16. KITTI Tracking Dataset (Geiger et al. 2012) ● 375x1240 video ● Bounding boxes over time for cars, pedestrians, etc.

  17. Tracking Network: generate the image glimpse. Glimpse = T_θ(Image(t), θ_loc(t-1)). Diagram: Localization Network, Recurrent Network, Convolutional Network, Grid Generator.

  18. Tracking Network: generate features from the ConvNet. h_cnet(t) = f_cnet(Glimpse(t)). Diagram: Localization Network, Recurrent Network, Convolutional Network, Grid Generator.

  19. Tracking Network: generate features from the Recurrent Network. h_rnn(t) = f_rnn(h_cnet(t), θ_loc(t-1), h_rnn(t-1)). Diagram: Localization Network, Recurrent Network, Convolutional Network, Grid Generator.

  20. Tracking Network: generate parameters for the next glimpse from the Localization Network. θ_loc(t) = f_loc(h_rnn(t-1)). Diagram: Localization Network, Recurrent Network, Convolutional Network, Grid Generator.

  21. Tracking Network: generate the tracking prediction from the Tracking Network. θ_pred(t), y_pres(t) = f_tracking(h_rnn(t-1)). Diagram: Localization Network, Recurrent Network, Convolutional Network, Grid Generator.
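
Slides 17–21 chain into one forward step per video frame. A minimal sketch with stand-in linear layers; all weights, layer sizes, and the simple crop-based T_θ are illustrative assumptions, not the talk's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_hid = 16, 32

# Stand-in parameters: the real model uses a trained ConvNet and RNN.
W_cnet   = 0.1 * rng.normal(size=(12 * 12, n_feat))
W_rnn_in = 0.1 * rng.normal(size=(n_feat + 4, n_hid))
W_rnn_hh = 0.1 * rng.normal(size=(n_hid, n_hid))
W_loc    = 0.1 * rng.normal(size=(n_hid, 4))   # theta_loc: next glimpse params
W_track  = 0.1 * rng.normal(size=(n_hid, 5))   # theta_pred (4) + y_pres (1)

def T_theta(image, theta_loc):
    """Grid generator + sampler stub: crop a 12x12 window at the
    translation part of theta_loc (scale is ignored in this sketch)."""
    cy = int(np.clip(theta_loc[0], 0, image.shape[0] - 12))
    cx = int(np.clip(theta_loc[1], 0, image.shape[1] - 12))
    return image[cy:cy + 12, cx:cx + 12]

def step(image_t, theta_loc_prev, h_prev):
    glimpse = T_theta(image_t, theta_loc_prev)               # slide 17
    h_cnet = np.tanh(glimpse.ravel() @ W_cnet)               # slide 18
    rnn_in = np.concatenate([h_cnet, theta_loc_prev])
    h_rnn = np.tanh(rnn_in @ W_rnn_in + h_prev @ W_rnn_hh)   # slide 19
    theta_loc = h_rnn @ W_loc                                # slide 20
    pred = h_rnn @ W_track                                   # slide 21
    theta_pred, y_pres = pred[:4], pred[4]
    return theta_loc, theta_pred, y_pres, h_rnn

frame = rng.random((64, 64))
theta0, h0 = np.array([20.0, 20.0, 1.0, 1.0]), np.zeros(n_hid)
theta_loc, theta_pred, y_pres, h = step(frame, theta0, h0)
```

Running `step` once per frame, feeding each frame's θ_loc and h_rnn into the next, yields the glimpse-predict-reattend loop the diagram describes.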

  22. Pretraining on the Classification Task {‘Car’, ‘Pedestrian’, ‘Truck’, ‘Tram’, ‘Cyclist’, ‘Misc’, ‘Van’, ‘Person Sitting’}. The Convolutional Network reaches ~3% classification error on the validation set. Diagram: Convolutional Network, Grid Generator.

  23. Pretraining on the Registration Task. Diagram: glimpse parameters θ, Convolutional Network, Grid Generator.

  24. Pretraining on the Registration Task ● A simpler task similar to tracking: fix a bad glimpse ● Provides a useful training signal for the Localization Network. Figure: input glimpse, predicted correction, actual correction.
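
Training pairs for such a registration task can be synthesized by jittering a ground-truth box and asking the network to predict the offset that undoes it. A sketch under assumed box and jitter conventions (the function name and box format are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def make_registration_pair(image, box, max_jitter=4):
    """One registration-pretraining sample: perturb a ground-truth
    box by a random offset and record the correction that undoes it,
    which the Localization Network is trained to predict."""
    y, x, h, w = box                               # top-left corner + size
    dy, dx = rng.integers(-max_jitter, max_jitter + 1, size=2)
    jy = int(np.clip(y + dy, 0, image.shape[0] - h))
    jx = int(np.clip(x + dx, 0, image.shape[1] - w))
    bad_glimpse = image[jy:jy + h, jx:jx + w]      # the "bad" input glimpse
    correction = np.array([y - jy, x - jx])        # regression target
    return bad_glimpse, correction

img = rng.random((64, 64))
glimpse, target = make_registration_pair(img, box=(20, 20, 16, 16))
```

Unlike full tracking, every sample here comes with an exact supervised target, which is what makes the gradient signal for the Localization Network so much cleaner.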

  25. Comparing Training Gradients: without pretraining (random initialization) vs. with ConvNet pretraining.

  26. Bouncing MNIST

  27. Bouncing MNIST

  28. Bouncing MNIST. Plots: MNIST position and attention position over time, x and y coordinates, comparing ground truth with the Tracking Network and Localization Network outputs.
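
The ground-truth positions in such sequences come from a digit bouncing off the canvas walls. A sketch of the trajectory generation (an assumed motion model, not the exact dataset recipe):

```python
import numpy as np

def bouncing_trajectory(n_steps, canvas=64, digit=28,
                        start=(0.0, 0.0), velocity=(3.0, 2.0)):
    """Top-left positions of a digit bouncing off the canvas walls,
    the motion model behind Bouncing MNIST-style sequences."""
    lim = canvas - digit                    # largest valid top-left coord
    pos = np.array(start, dtype=float)
    vel = np.array(velocity, dtype=float)
    traj = []
    for _ in range(n_steps):
        pos += vel
        for k in range(2):                  # reflect at each wall
            if pos[k] < 0 or pos[k] > lim:
                pos[k] = np.clip(pos[k], 0, lim)
                vel[k] = -vel[k]
        traj.append(pos.copy())
    return np.array(traj)

traj = bouncing_trajectory(100)
```

Pasting a digit at each trajectory point yields a video with exact per-frame position labels, so predicted and ground-truth x/y curves can be plotted directly, as on the slide.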

  29. Conclusions ● End-to-end visual attention works for simple tasks ● Robust to the encoding of attention parameters

  30. Conclusions ● Difficult to train on more complex tasks ○ First Step toward Model-Free, Anonymous Object Tracking with Recurrent Neural Networks (Gan et al. 2015) ○ RATM: Recurrent Attentive Tracking Model (Kahou et al. 2015) ● Scaling computational costs

  31. Future Work ● Integrate more tailored components ○ Spatial Memory (Weiss et al. 2015) ● Train compact ImageNet models for initialization ● Exploration/unsupervised strategies to recover from mistakes ○ Error-Based Attention (Rezende et al. 2016)

  32. Acknowledgements Special thanks to: Shalini Gupta, Jan Kautz, Pavlo Molchanov, Stephen Tyree, Eric Weiss
