3D Attention-Driven Depth Acquisition for Object Identification
Kai Xu, Yifei Shi, Lintao Zheng, Junyu Zhang, Min Liu, Hui Huang, Hao Su, Daniel Cohen-Or and Baoquan Chen
National University of Defense Technology · Shandong University · Shenzhen University · SIAT · Stanford University · Tel-Aviv University
Background & motivation
• Robotic indoor scene modeling
• Perception of objects
Background & motivation
• Indoor environment acquisition and modeling
• Dense reconstruction [Nießner et al. 2013]
• Object extraction [Xu et al. 2015]
Background & motivation
• What are these objects?
Active object recognition
Problem setting
• A robot actively acquires new observations to gradually increase the confidence of object recognition
• Two key components:
• Object classification: estimate the object class based on the observations acquired so far
• View planning: predict the Next-Best-View that maximizes the information gain
The main challenge
• Observation is partial and progressive
• Shape description/matching with partial data is hard
• Observations come from varying views
The main challenge
• Observation is partial and progressive
• View planning: how can you know which view is better without knowing its observation? (Figure: one observed view vs. several unobserved views)
The main challenge
• Real indoor scenes are often cluttered
• Clutter degrades recognition accuracy
• Clutter invalidates the offline-learned viewing policy
Related work
Related work
• Online scene analysis and modeling
• SemanticPaint [Valentin et al. 2015]
• Plane/object extraction [Zhang et al. 2014]
Related work
• Active reconstruction and recognition
• Next-best-view for reconstruction [Wu et al. 2014]
• Next-best-view for recognition [Wu et al. 2015]
Method
The general framework
The general framework
(Diagram: a perception–action loop driven by the recognition goal — view planning produces an action, the robot observes, recognition updates the belief, and the belief feeds back into view planning)
An attentional formulation
"Humans focus attention selectively on parts of the visual space to acquire information when and where it is needed, and combine information from different fixations over time to build up an internal representation of the scene"
–– Ronald Rensink
• Handwriting recognition [Mnih et al. 2014]
• Image caption generation [Xu et al. 2015]
Recurrent Attention Model
• Recurrent Neural Networks (RNN) aggregate information over time:
  h_t = f(W_hh · h_{t−1} + W_xh · x_t),  y_t = W_hy · h_t
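The RNN update above can be sketched in a few lines of NumPy. This is a minimal vanilla-RNN step, assuming the standard weight names W_hh, W_xh, W_hy; the toy dimensions and random weights below are illustrative only, not the network used in the paper:

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, W_hy):
    """One step of a vanilla RNN: the hidden state h aggregates
    information from all inputs seen so far; y is the per-step output."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x)  # update hidden state
    y = W_hy @ h                           # read out from hidden state
    return h, y

# Toy dimensions (hypothetical, for illustration only).
rng = np.random.default_rng(0)
H, X, Y = 4, 3, 2
W_hh = rng.normal(size=(H, H)) * 0.1
W_xh = rng.normal(size=(H, X)) * 0.1
W_hy = rng.normal(size=(Y, H)) * 0.1

h = np.zeros(H)
for t in range(5):            # feed a sequence of 5 observations
    x = rng.normal(size=X)
    h, y = rnn_step(h, x, W_hh, W_xh, W_hy)

print(h.shape, y.shape)
```

Because h is carried from step to step, the final hidden state depends on the whole observation sequence — this is the "aggregate information" role the slide assigns to the RNN.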
View-based observation
(Diagram: starting from an initial view v_0 with depth image d^(0), at each step t the robot moves to view v_t and acquires a new depth image d^(t))
3D Recurrent Attention Model
• Feature extraction: each depth image d^(t) is encoded into a feature vector
• View aggregation and classification: the first recurrent layer (h_1) aggregates features over views and classifies the object
• NBV emission: the second recurrent layer (h_2) emits the next-best view v_{t+1}
• Starting from an initial view v_0 with depth d^(0), the loop repeats: view selection → observation → feature extraction → aggregation → classification → NBV emission
3D Recurrent Attention Model
• Feature extraction is implemented with a Multi-View CNN [Su et al. 2015]: a shared first-stage CNN (CNN_1) encodes each depth image, view pooling (element-wise max-pooling across views) merges the per-view features, and a second CNN (CNN_2) produces the aggregated descriptor
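The view-pooling layer of Multi-View CNN reduces any number of per-view descriptors to a single descriptor by an element-wise max. A minimal sketch (the feature values below are made up for illustration):

```python
import numpy as np

def view_pool(per_view_features):
    """Element-wise max over views, as in the view-pooling layer of
    Multi-View CNN [Su et al. 2015]. The pooled descriptor has the
    same dimension regardless of how many views have been seen."""
    return np.max(per_view_features, axis=0)

# Hypothetical per-view CNN_1 descriptors: 3 views, 8-D features.
feats = np.array([[0.1, 0.9, 0.2, 0.0, 0.5, 0.3, 0.7, 0.1],
                  [0.4, 0.2, 0.8, 0.1, 0.6, 0.2, 0.1, 0.9],
                  [0.3, 0.5, 0.1, 0.7, 0.2, 0.8, 0.4, 0.2]])
pooled = view_pool(feats)
print(pooled)  # → [0.4 0.9 0.8 0.7 0.6 0.8 0.7 0.9]
```

The max is taken per feature dimension, so the result is invariant to the order of the views — convenient when views arrive progressively.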
Network training
• The CNN and recurrent layers are trained with back-propagation
• The depth-rendering step between the emitted NBV (v_t, d^(t)) and the next observation is non-differentiable, so the view-planning part is trained with reinforcement learning
Reinforcement learning
• Agent: performs depth acquisition (action) and decides whether to stop
• Environment: returns the new state and a reward measuring how good the acquired depth is
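The agent–environment loop above can be trained with the REINFORCE policy-gradient rule, which needs only sampled rewards and therefore sidesteps the non-differentiable rendering step. A toy sketch on a made-up 4-view bandit (all names, rewards, and hyperparameters are hypothetical, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical setup: pick one of 4 candidate views; view 2 happens
# to yield the highest recognition reward. REINFORCE adjusts the
# policy from sampled rewards alone — no gradient flows through
# the environment (here, the rendering/acquisition step).
true_rewards = np.array([0.1, 0.2, 1.0, 0.3])
logits = np.zeros(4)          # policy parameters
baseline, lr = 0.0, 0.1

for _ in range(2000):
    p = softmax(logits)
    a = rng.choice(4, p=p)                 # sample a view from the policy
    r = true_rewards[a]                    # reward from the environment
    grad_logp = -p
    grad_logp[a] += 1.0                    # d log pi(a) / d logits
    logits += lr * (r - baseline) * grad_logp  # REINFORCE update
    baseline += 0.05 * (r - baseline)          # running-mean baseline

p_final = softmax(logits)
print(int(np.argmax(p_final)))
```

The running-mean baseline reduces gradient variance; after training, the policy concentrates on the highest-reward view.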
Reward
  r_t = A_t(v_t, c) + I_t(v_t, v_{t−1}) − C_t
        prediction accuracy + information gain − movement cost
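A hedged sketch of how such a three-term reward might be computed, using the drop in class-belief entropy for the information gain and a simple view-to-view distance for the movement cost. The weight lam, the helper names, and all input values are illustrative assumptions, not the paper's exact formula:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a class-probability vector."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return -np.sum(p * np.log(p + 1e-12))

def reward(p_prev, p_curr, true_class, v_prev, v_curr, lam=0.1):
    """Sketch of r_t = accuracy + information gain - movement cost.
    All three terms are stand-ins chosen for illustration."""
    acc = p_curr[true_class]                 # prediction accuracy term
    gain = entropy(p_prev) - entropy(p_curr) # information gain: entropy drop
    cost = lam * np.linalg.norm(np.asarray(v_curr) - np.asarray(v_prev))
    return acc + gain - cost                 # movement cost is subtracted

p_prev = [0.25, 0.25, 0.25, 0.25]  # uniform belief before the new view
p_curr = [0.70, 0.10, 0.10, 0.10]  # sharper belief after observing it
r = reward(p_prev, p_curr, true_class=0, v_prev=(0.0, 0.0), v_curr=(1.0, 0.0))
print(round(r, 3))  # → 1.046
```

A view that sharpens the belief toward the correct class earns a large reward, while long camera moves are penalized — matching the three labels on the slide.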
Part-level attention
• Occlusion can hide the informative parts
• How to distinguish these two chairs?
Attention extraction
• Part-level attention is extracted from the mid-level kernels of the Convolutional Neural Network
Attention extraction
• One wing vs. two wings
Results and evaluation
Database
• ShapeNet: 57,452 models, 57 categories, 52 sampled views, rendered models
• ModelNet40: 12,311 models, 40 categories, 260 sampled views, rendered with jittering
Timing
• ShapeNet: MV-RNN training 49 hr., testing 0.1 sec.
• ModelNet40: MV-RNN training 22 hr., testing 0.1 sec.
Visualization of attentions
• Part-level attention
• View sequences
NBV estimation
(Plot: classification accuracy over the 40 classes)
NBV estimation under occlusion
(Plot: classification accuracy under occlusion)
Results on real scenes
Limitations
• Limited to recognizable (trained) object categories
• No contextual information is used
Future work: Multi-modal recognition
• What is this?
• Image database + shape database
Future work: Multi-robot scene reconstruction & understanding
• AscTec Pelican, PR2, Turtlebot
Future work: Multi-robot attention model
• Attention based on a shared internal representation?
Thank you — Q & A
More details: kevinkaixu.net & yifeishi.net