CNN^2: Viewpoint Generalization via a Binocular Vision Wei-Da Chen and Shan-Hung Wu CS Department, National Tsing-Hua University Taiwan, R.O.C. wdchen@datalab.cs.nthu.edu.tw, shwu@cs.nthu.edu.tw
On Generalizability of CNNs • Convolutional Neural Networks (CNNs) have laid the foundation for many techniques in various applications • However, the 3D viewpoint generalizability of CNNs still falls far behind humans' visual capabilities
3D Viewpoint Generalizability (figure: objects seen from training viewpoints vs. unseen test viewpoints) • Humans can recognize objects at unseen angles • But CNNs cannot
Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments
Voxel-Reconstruction Methods • E.g., the Perspective Transformer Networks (PTNs) by Yan et al. 16 • Learn 3D models directly
Cons • Require either • Voxel-level supervision, or • Omnidirectional images as input • Both are expensive to collect in practice
CapsuleNets (Hinton et al. 17, 18) • Different capsules are organized in a parse tree where lower-level capsules are dynamically routed to upper-level capsules using an agreement protocol • When the viewpoint changes, the "routes" change in a coordinated way
But… • People have found that CapsuleNets are hard to train • Capsules increase the number of model parameters • The iterative routing-by-agreement algorithm is time-consuming • It does not ensure the emergence of a correct parse tree (Peer et al. 18) • Not compatible with CNNs • and therefore cannot benefit from the rich CNN ecosystem
Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments
Our Goals • A new model that • has improved 3D viewpoint generalizability • does not require expensive input and supervision • is CNN compatible
Observation: Humans understand the world using two eyes!
Binocular Images • Today, binocular images can be easily collected • The majority of people use smartphones, which are now usually equipped with two or more lenses • One can also extract two nearby frames from online videos to construct a large binocular image dataset
Binocular Solution 1 (LeCun et al. 14) (diagram: two images merged, then fed through Conv/Pooling layers to a classifier) • Stacks the two binocular images along the channel dimension and then feeds them to a regular CNN • But this does not model any prior of binocular vision
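A minimal sketch of this channel-stacking baseline, assuming NumPy arrays with hypothetical H x W x C shapes (the shapes and function name are illustrative, not from the paper):

```python
import numpy as np

def stack_binocular(left, right):
    """Stack left/right images along the channel axis (H x W x C -> H x W x 2C),
    so a regular CNN can consume the pair as one input."""
    assert left.shape == right.shape
    return np.concatenate([left, right], axis=-1)

left = np.zeros((32, 32, 3))   # placeholder left-eye image
right = np.ones((32, 32, 3))   # placeholder right-eye image
x = stack_binocular(left, right)
print(x.shape)  # (32, 32, 6)
```

Note that the CNN sees the two views only as extra channels, which is why no binocular prior (e.g., parallax) is modeled explicitly.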
Binocular Solution 2: Sol. 1 + Monodepth (Godard et al. 17) • Calculate the depth map explicitly, then add it as additional input channels
However… • The depth information is only a subset of the knowledge that can be learned from binocular vision • Studies in neuroscience have found that the human visual system can detect • Stereoscopic edges (Von Der Heydt et al. 00) • Foreground and background (Qiu and Von Der Heydt 05; Maruko et al. 08) • Illusory contours of objects (Von Der Heydt et al. 84; Anzai et al. 07)
Our Solution: CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling (architecture diagram: two pathways of repeated augment → Conv → CM Pooling blocks; the two pathway outputs are added and fed to a classifier)
Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments
Dual Feedforward Pathways (diagram: the CNN^2 pathways alongside the human visual system — optic nerve, optic chiasm, lateral geniculate nucleus (LGN), visual cortex) • The left and right sides of the human visual system are known to have bias (Gotts et al. 13) • Filters/kernels in the left and right pathways can learn different (biased) features
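A hedged sketch of the dual-pathway idea: each pathway keeps its own untied weights (here a single matrix stands in for a whole Conv + CM-pooling stack), and the two pathway outputs are added before the classifier. All shapes and names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Untied weights: the left and right pathways can learn different
# (biased) features because they do not share parameters.
W_left = rng.normal(size=(6, 4))
W_right = rng.normal(size=(6, 4))

def pathway(features, W):
    """Stand-in for one feedforward pathway (Conv + CM-pooling stack)."""
    return features @ W

h_left = np.ones((1, 6))    # placeholder left-pathway features
h_right = np.ones((1, 6))   # placeholder right-pathway features

# The two pathway outputs are added, then fed to a classifier.
logits = pathway(h_left, W_left) + pathway(h_right, W_right)
print(logits.shape)  # (1, 4)
```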
Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments
Dual Parallax Augmentation (1/2) Left path: h̃_L = concat(h_L, h_L − h_R) ∈ ℝ^{W×H×2C} Right path: h̃_R = concat(h_R, h_R − h_L) ∈ ℝ^{W×H×2C}, where h_L, h_R ∈ ℝ^{W×H×C} are the left- and right-pathway feature maps
Dual Parallax Augmentation (2/2) • Allows the filters/kernels in convolutional layers to recursively detect stereoscopic features at different abstraction levels by looking into the parallax • The small differences between the two input images at the pixel level and at shallow layers may add up to a big difference at a deeper layer
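The augmentation step above can be sketched as a single NumPy function: each pathway's feature map is concatenated with its parallax (the difference against the other pathway) along the channel axis, doubling the channels. Values and shapes are illustrative:

```python
import numpy as np

def augment(h_self, h_other):
    """Parallax augmentation for one pathway:
    concat(h, h - h_other) along channels, W x H x C -> W x H x 2C."""
    return np.concatenate([h_self, h_self - h_other], axis=-1)

h_L = np.full((8, 8, 4), 2.0)   # placeholder left feature map
h_R = np.full((8, 8, 4), 1.5)   # placeholder right feature map

h_L_aug = augment(h_L, h_R)     # left path:  concat(h_L, h_L - h_R)
h_R_aug = augment(h_R, h_L)     # right path: concat(h_R, h_R - h_L)
print(h_L_aug.shape)  # (8, 8, 8)
```

Applying this before every convolutional layer is what lets the parallax channels be re-derived at each abstraction level.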
Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments
Concentric Multiscale (CM) Pooling (1/2) • Areas that are out of focus are blurred
Concentric Multiscale (CM) Pooling (2/2) (diagram: average pooling applied over concentric windows of multiple scales, scanned across the feature map)
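A simplified sketch of the multiscale idea, assuming same-size average pooling at a few concentric window sizes whose outputs are stacked along channels — giving clear-to-blurry versions of every region. The scale choices and zero padding are assumptions for illustration, not the paper's exact operator:

```python
import numpy as np

def avg_pool_same(img, k):
    """k x k mean filter with zero padding; output has the input's size."""
    H, W = img.shape
    p = k // 2
    padded = np.pad(img, p)
    out = np.empty_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def cm_pool(img, scales=(1, 3, 5)):
    """Stack average-pooled copies at concentric window sizes, so later
    convolutions can contrast clear (small-window) and blurry
    (large-window) features."""
    return np.stack([avg_pool_same(img, k) for k in scales], axis=-1)

img = np.arange(16.0).reshape(4, 4)
out = cm_pool(img)
print(out.shape)  # (4, 4, 3)
```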
Placed Before Convolution • Allows filters/kernels to contrast blurry features with clear features
Outline • Related work • CNN^2 • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling • Experiments
Datasets • ModelNet2D (grayscale) • SmallNORB (grayscale) • RGBD-Object (RGB)
Train/Test Setting
3D Viewpoint Generalization
Learning Efficiency
Backward Compatibility • CNN^2, by default, does not generalize to 2D rotated images • But it can be enhanced by existing work on 2D rotation generalizability
Takeaways • We propose CNN^2, which • gives improved 3D viewpoint generalizability • does not require expensive input or supervision • is compatible with CNNs and can benefit from the rich CNN ecosystem • Detects stereoscopic features beyond depth via: • Dual feedforward pathways • Dual parallax augmentation • Concentric Multiscale (CM) pooling from binocular images