Object Recognition with and without Objects




  1. Object Recognition with and without Objects Zhuotun Zhu, Lingxi Xie, Alan Yuille Johns Hopkins University

  2. Object Recognition • A fundamental vision problem ✦ This task traditionally means each image has exactly one label that can take a single value among a finite number of choices. The assumption is that each image contains exactly one recognisable object (or perhaps none, in which case it takes the "background" label).

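  3. Object Recognition • Before deep learning: hand-crafted pipelines answering questions such as "Cat?" ✦ Local descriptors: SIFT, HOG, SURF, etc. ✦ Feature encodings: BoW, LLC, VLAD, etc. ✦ Classifiers: SVM, KNN, etc.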

  4. Object Recognition • Deep learning ✦ Computational resources, e.g., GPU ✦ Large datasets, e.g., ImageNet

  5. Object Recognition • Deep learning ✦ Computational resources: GPU ✦ Large Dataset: ImageNet

  6. Object Recognition • Multiple layers of learned feature detectors :) • Local feature detectors are replicated across space :) • Detectors cover a larger spatial extent in higher layers :) • Foreground and background are learnt together implicitly :( The first three claims are borrowed from G. E. Hinton's recent talk, "What is wrong with convolutional neural nets".

  7. Intuitions • Two examples

  8. Intuitions • Two examples Bird? Snake? Squirrel? Snail? Monkey? Lizard? Bat? Scorpion? … …

  9. Intuitions • Two examples

  10. Key Questions • How well can deep neural networks learn from the pure foreground (object) and the pure background (context)? • Is there any difference between how humans and networks understand images (especially the foreground and background)? • What can the networks do when the foreground and background models are learnt separately?

  11. Datasets • ILSVRC2012 [2]: 1K classes, 1.28M training images, 50K testing images ✦ [Figure: images with annotated bounding box(es) and images without bounding boxes, with dataset labels OrigSet, FGSet, BGSet, HybridSet] [2] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, pages 1–42, 2015.
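The slide names FGSet and BGSet but does not spell out how they are built from the annotated bounding boxes. A minimal sketch follows, assuming the foreground set keeps only the pixels inside the box and the background set blanks those pixels out. The function names, the toy list-of-lists image, and the `(top, left, bottom, right)` box layout are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def crop_foreground(image: Image, box: Box) -> Image:
    """Keep only the pixels inside the bounding box (assumed FGSet-style crop)."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def mask_background(image: Image, box: Box, fill: int = 0) -> Image:
    """Copy the image and overwrite the box region with `fill`
    (assumed BGSet-style masking); the input image is left unchanged."""
    top, left, bottom, right = box
    masked = [list(row) for row in image]
    for r in range(top, bottom):
        for c in range(left, right):
            masked[r][c] = fill
    return masked


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
box = (0, 1, 2, 3)  # rows 0-1, columns 1-2

fg = crop_foreground(image, box)
bg = mask_background(image, box)
```

Images with several annotated boxes would need a rule for merging them (for example, one enclosing rectangle); that choice is left open here because the slide does not state it.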

  12. Datasets • Summary of the datasets

  13. Experiments • AlexNet [3] vs. human [3] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.

  14. Experiments • Cross Validation

  15. Experiments • Ratio of bounding box ✦ [Figure: two panels, "The top 1 accuracy" and "The top 5 accuracy"; y-axis: the accuracy averaged by class; x-axis: the ratio of bounding box w.r.t. the whole image; curves: OrigNet, FGNet, BGNet, HybridNet]
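The two quantities on the plot axes can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: the box layout `(left, top, right, bottom)`, the function names, and the toy scores are hypothetical, and "averaged by class" is read here as computing accuracy per class first and then taking the mean over classes.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def box_ratio(box: Tuple[int, int, int, int], width: int, height: int) -> float:
    """Area of the box (left, top, right, bottom) divided by the image area."""
    left, top, right, bottom = box
    return ((right - left) * (bottom - top)) / float(width * height)


def topk_hit(scores: Sequence[float], label: int, k: int) -> bool:
    """True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return label in ranked[:k]


def class_averaged_accuracy(
    samples: List[Tuple[Sequence[float], int]], k: int
) -> float:
    """Top-k accuracy computed per class, then averaged over the classes seen."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for scores, label in samples:
        counts[label] += 1
        hits[label] += int(topk_hit(scores, label, k))
    per_class = [hits[c] / counts[c] for c in counts]
    return sum(per_class) / len(per_class)


# Toy data: (per-class scores, true label) for a 3-class problem.
samples = [
    ([0.7, 0.2, 0.1], 0),  # class 0: correct at top-1
    ([0.4, 0.5, 0.1], 0),  # class 0: wrong at top-1, correct at top-2
    ([0.2, 0.3, 0.5], 2),  # class 2: correct at top-1
]

ratio = box_ratio((10, 20, 60, 70), width=100, height=100)
top1 = class_averaged_accuracy(samples, k=1)
top2 = class_averaged_accuracy(samples, k=2)
```

Averaging per class rather than per image keeps classes with many test images from dominating the number, which matters once images are grouped by bounding-box ratio and the groups become unbalanced.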

  16. Experiments • Patches Visualization [4] [4] J. Wang, Z. Zhang, V. Premachandran, and A. Yuille. Discovering Internal Representations from Object-CNNs Using Population Encoding. arXiv preprint, arXiv:1511.06855, 2015.

  17. Experiments • Recognition with and without objects

  18. Conclusions • AlexNet can learn reasonable models to explore the correlation between the foreground object and the background context • AlexNet tends to perform better than humans on backgrounds without objects but is beaten on foregrounds with objects • Combining the learnt networks can be beneficial for object recognition
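The slides do not say how the learnt networks are combined. One simple possibility, shown purely as an assumption, is a weighted average of the per-class scores produced by the separately trained networks. The function names, the weights, and the example scores below are hypothetical.

```python
from typing import List, Sequence


def combine_scores(
    score_lists: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of per-class scores from several networks."""
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per network")
    total = float(sum(weights))
    n_classes = len(score_lists[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, score_lists)) / total
        for c in range(n_classes)
    ]


def predict(scores: Sequence[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Hypothetical softmax outputs for one image from two networks.
fg_scores = [0.5, 0.4, 0.1]  # e.g. a foreground-trained network
bg_scores = [0.1, 0.6, 0.3]  # e.g. a background-trained network

combined = combine_scores([fg_scores, bg_scores], weights=[1.0, 1.0])
label = predict(combined)
```

In this toy case the foreground scores alone favour class 0, while the combined scores favour class 1, which shows how context can change the decision. In practice the weights would have to be chosen on held-out data.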

  19. Future Work • An end-to-end training framework that explicitly separates and then combines the foreground and background information
