3D Attention-Driven Depth Acquisition for Object Identification

3D Attention-Driven Depth Acquisition for Object Identification - PowerPoint PPT Presentation



  1. 3D Attention-Driven Depth Acquisition for Object Identification. Kai Xu, Yifei Shi, Lintao Zheng, Junyu Zhang, Min Liu, Hui Huang, Hao Su, Daniel Cohen-Or and Baoquan Chen. National University of Defense Technology, Shandong University, Shenzhen University, SIAT, Stanford University, Tel-Aviv University

  2. Background & motivation • Robotic indoor scene modeling: perception of objects

  3. Background & motivation • Indoor environment acquisition and modeling: dense reconstruction [Nießner et al. 2013] and object extraction [Xu et al. 2015]

  4. Background & motivation • What are these objects?

  5. Active object recognition

  6. Active object recognition

  7. Problem setting • A robot actively acquires new observations to gradually increase the confidence of object recognition • Two key components: object classification, which estimates the object class from the observations acquired so far, and view planning, which predicts the Next-Best-View (NBV) that maximizes information gain
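In code, the acquire-classify-plan loop described on this slide might look like the following sketch; acquire_depth, classify, and plan_nbv are hypothetical caller-supplied callbacks, not functions from the paper's implementation, and the stopping threshold is an illustrative choice.

```python
def active_recognition(acquire_depth, classify, plan_nbv, initial_view,
                       confidence_threshold=0.9, max_steps=8):
    """Generic active-recognition loop: observe, classify, plan the next view.

    acquire_depth(view) -> observation, classify(observations) -> {label: prob},
    plan_nbv(observations, view) -> next view; all three are supplied by the caller.
    """
    view = initial_view
    observations = [acquire_depth(view)]          # initial depth observation
    class_probs = classify(observations)          # object classification
    for _ in range(max_steps - 1):
        if max(class_probs.values()) >= confidence_threshold:
            break                                 # confident enough, stop acquiring
        view = plan_nbv(observations, view)       # predict the Next-Best-View
        observations.append(acquire_depth(view))  # acquire the new observation
        class_probs = classify(observations)      # re-estimate the object class
    return max(class_probs, key=class_probs.get), class_probs
```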

  8. The main challenge • Observation is partial and progressive • Shape description/matching with partial data is hard • Observations come from varying views

  9. The main challenge • Observation is partial and progressive • View planning: only the already observed view is known, the candidate views are not. How can you know which view is better without knowing its observation?

  10. The main challenge • Real indoor scenes are often cluttered • Clutter degrades recognition accuracy • Clutter invalidates the offline-learned viewing policy

  11. Related work

  12. Related work • Online scene analysis and modeling: SemanticPaint [Valentin et al. 2015] and plane/object extraction [Zhang et al. 2014]

  13. Related work • Active reconstruction and recognition: next-best-view for reconstruction [Wu et al. 2014] and next-best-view for recognition [Wu et al. 2015]

  14. Method

  15. The general framework

  16. The general framework [framework diagram linking Goal, Belief, Observe, Recognition, View planning, and Action]

  17. An attentional formulation • “Humans focus attention selectively on parts of the visual space to acquire information when and where it is needed, and combine information from different fixations over time to build up an internal representation of the scene” -- Ronald Rensink • Examples: handwriting recognition [Mnih et al. 2014] and image caption generation [Xu et al. 2015]

  18. Recurrent Attention Model • Recurrent Neural Networks (RNNs) aggregate information over time: $\mathbf{h}_t = \tanh(\mathbf{W}_{ih}\mathbf{x}_t + \mathbf{W}_{hh}\mathbf{h}_{t-1})$, $\mathbf{y}_t = \mathbf{W}_{ho}\mathbf{h}_t$
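A minimal NumPy sketch of the vanilla RNN update reconstructed above; the dimensions, tanh nonlinearity, and Gaussian initialization are illustrative assumptions, not details from the slide.

```python
import numpy as np

class VanillaRNN:
    """h_t = tanh(W_ih x_t + W_hh h_{t-1}),  y_t = W_ho h_t."""

    def __init__(self, input_dim, hidden_dim, output_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_ih = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden
        self.W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden
        self.W_ho = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden-to-output

    def forward(self, inputs):
        """inputs: iterable of 1-D arrays; returns per-step outputs and final state."""
        h = np.zeros(self.W_hh.shape[0])
        outputs = []
        for x in inputs:                                  # aggregate information over time
            h = np.tanh(self.W_ih @ x + self.W_hh @ h)    # recurrent state update
            outputs.append(self.W_ho @ h)                 # per-step output
        return outputs, h
```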

  19. View-based observation [figure: at each time step a depth image is observed from the currently selected view]

  20. 3D Recurrent Attention Model [architecture diagram: at each time step the depth image from the current view passes through feature extraction; a first recurrent layer aggregates the views and classifies; a second recurrent layer emits the next-best view, which selects the next depth image]

  21. 3D Recurrent Attention Model [architecture diagram: feature extraction uses a Multi-View CNN [Su et al. 2015], i.e. a first CNN applied per view, max view-pooling across views, and a second CNN; the view-aggregation and NBV-emission recurrent layers are as on the previous slide]
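The following PyTorch sketch is one plausible reading of the two-layer recurrent architecture on slides 20-21; the tiny CNN, the GRU cells, the layer sizes, and the assumption of a discrete set of candidate views are all illustrative choices, not the paper's exact model.

```python
import torch
import torch.nn as nn

class MVRNNSketch(nn.Module):
    def __init__(self, num_classes, num_views, feat_dim=256, hidden_dim=256):
        super().__init__()
        # Per-view feature extraction (stand-in for the Multi-View CNN of Su et al. 2015).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # First recurrent layer: aggregates views and drives classification.
        self.rnn_aggregate = nn.GRUCell(feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)
        # Second recurrent layer: emits the next-best view as scores over candidate views.
        self.rnn_nbv = nn.GRUCell(hidden_dim, hidden_dim)
        self.nbv_head = nn.Linear(hidden_dim, num_views)

    def step(self, depth_image, h1, h2):
        """One time step: a depth image in, class logits and NBV scores out."""
        feat = self.cnn(depth_image)        # feature extraction
        h1 = self.rnn_aggregate(feat, h1)   # view aggregation
        class_logits = self.classifier(h1)  # classification from the aggregated state
        h2 = self.rnn_nbv(h1, h2)           # NBV-emission state
        nbv_scores = self.nbv_head(h2)      # which candidate view to acquire next
        return class_logits, nbv_scores, h1, h2
```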

  22. Network training • The CNN and the recurrent classification pathway are trained with back-propagation • The view-selection (NBV emission) pathway is non-differentiable because of the rendering/acquisition step, so it is trained with reinforcement learning
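A hedged sketch of such a hybrid objective: a standard cross-entropy loss back-propagated through the classifier, plus a REINFORCE-style policy-gradient term for the chosen views. The reward-to-go computation and the absence of a variance-reducing baseline are simplifications; all variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def training_loss(class_logits_seq, nbv_log_probs_seq, rewards_seq, true_label):
    """class_logits_seq: list of (1, num_classes) logits, one per time step.
    nbv_log_probs_seq: list of scalar log-probabilities of the views actually chosen.
    rewards_seq: list of per-step scalar rewards. true_label: LongTensor of shape (1,)."""
    # Supervised classification loss, back-propagated through the whole network.
    cls_loss = sum(F.cross_entropy(logits, true_label) for logits in class_logits_seq)
    # REINFORCE term: reward-to-go weighted negative log-likelihood of the chosen views.
    returns = torch.tensor(rewards_seq).flip(0).cumsum(0).flip(0)
    pg_loss = -sum(lp * r for lp, r in zip(nbv_log_probs_seq, returns))
    return cls_loss + pg_loss
```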

  23. Reinforcement learning [diagram: the agent takes a depth-acquisition action (or decides to stop) in the environment, observes a new state, and receives a reward measuring how good the acquired depth is]

  24. Reward • $r_t = A_t(c_t, c) + I_t(c_t, c_{t-1}) - C_t$: prediction accuracy plus information gain minus movement cost
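One concrete way to instantiate this reward, as a hedged sketch: a 0/1 prediction-accuracy term, entropy reduction as the information gain, and a distance-scaled movement cost. These particular definitions and the cost weight are assumptions for illustration, not necessarily the exact terms used in the paper.

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def step_reward(probs_t, probs_prev, predicted_label, true_label,
                view_t, view_prev, cost_weight=0.1):
    accuracy = 1.0 if predicted_label == true_label else 0.0   # prediction accuracy
    info_gain = entropy(probs_prev) - entropy(probs_t)         # information gain
    move_cost = cost_weight * math.dist(view_t, view_prev)     # movement cost
    return accuracy + info_gain - move_cost
```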

  25. Part-level attention • The informative parts may be occluded • How to distinguish these two chairs?

  26. Attention extraction [figure: attention is extracted from the responses of mid-level kernels in a convolutional neural network]
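A hedged PyTorch sketch of this idea: grab the activations of one mid-level convolutional layer with a forward hook and turn the strongest kernel responses into a normalized spatial attention map. The choice of layer and the max-over-channels aggregation are illustrative assumptions.

```python
import torch

def attention_map(cnn, mid_layer, depth_image):
    """Spatial attention map from the activations of mid_layer inside cnn."""
    activations = {}
    handle = mid_layer.register_forward_hook(
        lambda module, inp, out: activations.update(feat=out))
    with torch.no_grad():
        cnn(depth_image)                                   # forward pass fills the hook
    handle.remove()
    feat = activations["feat"]                             # (N, C, H, W) mid-level responses
    attn = feat.abs().max(dim=1).values                    # strongest kernel per location
    attn = attn / attn.amax(dim=(1, 2), keepdim=True).clamp(min=1e-8)  # normalize to [0, 1]
    return attn                                            # (N, H, W)
```

For example, with the MVRNNSketch above one could pass mid_layer=model.cnn[2] to inspect the second convolution's responses.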

  27. Attention extraction [figure: the part-level attention distinguishes a chair with one wing from a chair with two wings]

  28. Results and evaluation

  29. Database • ShapeNet: 57,452 models, 57 categories, 52 sampled views rendered per model • ModelNet40: 12,311 models, 40 categories, 260 sampled views rendered with jittering

  30. Timing • ShapeNet: MV-RNN training 49 hr., testing 0.1 sec. • ModelNet40: MV-RNN training 22 hr., testing 0.1 sec.

  31. Visualization of attentions [figures: part-level attention and the selected view sequences]

  32. NBV estimation [plot: classification accuracy over the 40 classes]

  33. NBV estimation under occlusion [plot: classification accuracy]

  34. Results on real scenes

  35. Results on real scenes

  36. Results on real scenes

  37. Limitations • Handles only recognizable objects • No contextual information is used

  38. Future work: multi-modal recognition • “What is this?” answered by querying both an image database and a shape database

  39. Future work: multi-robot scene reconstruction & understanding • Platforms: AscTec Pelican, PR2, Turtlebot

  40. Future work: multi-robot attention model • Attention based on a shared internal representation?

  41. Thank you Q & A More details: kevinkaixu.net & yifeishi.net
