Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
Hao Su*, Charles R. Qi*, Yangyan Li, Leonidas J. Guibas
ILSVRC Image Classification Top-5 Error (%)
[Chart: 2010: 28.2, 2011: 25.8, 2012: 16.4, 2013: 11.7, 2014: 6.7, 2015: 3.6]
Go beyond 2D Image Classification
• 3D bounding box
• 3D alignment
• 3D model retrieval
Go beyond 2D Image Classification
3D Viewpoint Estimation: azimuth, elevation, in-plane rotation
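The three angles define a camera rotation. A minimal NumPy sketch of composing them into one rotation matrix, under an assumed axis convention (the deck does not pin one down):

```python
import numpy as np

def viewpoint_to_rotation(azimuth_deg, elevation_deg, theta_deg):
    """Compose azimuth, elevation, and in-plane rotation into one rotation.

    Axis convention is an illustrative assumption: azimuth about the world
    up-axis (z), elevation as a tilt about the x-axis, and theta as an
    in-plane roll about the viewing axis.
    """
    az, el, th = np.deg2rad([azimuth_deg, elevation_deg, theta_deg])

    def rot_z(a):
        return np.array([[np.cos(a), -np.sin(a), 0.0],
                         [np.sin(a),  np.cos(a), 0.0],
                         [0.0, 0.0, 1.0]])

    def rot_x(a):
        return np.array([[1.0, 0.0, 0.0],
                         [0.0, np.cos(a), -np.sin(a)],
                         [0.0, np.sin(a),  np.cos(a)]])

    # in-plane roll, after the elevation tilt, after the azimuth turn
    return rot_z(th) @ rot_x(el) @ rot_z(az)
```

Any fixed ordering of the three elementary rotations works as long as training and evaluation agree on it.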
3D Viewpoint Estimation in the Wild
Images in the Wild; models unknown
3D Perception in the Wild: Learn from Data
Images in the Wild; models unknown
AlexNet [Krizhevsky et al.]
However.. Accurate Label Acquisition is Expensive
What are the camera viewpoint angles to the SUV in the image?
However.. Accurate Label Acquisition is Expensive PASCAL3D+ dataset [Xiang et al.]
However.. Accurate Label Acquisition is Expensive
PASCAL3D+ dataset [Xiang et al.]
Step 1: Choose a similar 3D model
Step 2: Coarse viewpoint labeling
Step 3: Label keypoints for alignment
Annotation takes ~1 min per object
High-capacity Model, High-cost Label Acquisition
AlexNet [Krizhevsky et al.]: 60M parameters
PASCAL3D+ dataset [Xiang et al.]: 30K images with viewpoint labels
How to get MORE images with ACCURATE viewpoint labels?
Manual alignment by annotators Auto alignment through rendering
Good News: ShapeNet
3M models in total; 330K models from 4K categories annotated (annotation ongoing)
http://shapenet.cs.stanford.edu
[Chart: #models per class vs. #total models, for PSB '05, SHREC '14, ModelNet '15, and ShapeNet]
Key Idea: Render for CNN
Training: ShapeNet → Rendering → Synthetic Images (with known viewpoints)
Key Idea: Render for CNN
Testing: Real Images → Viewpoint
I want data! How to render data with both quantity and quality? Rendering
Synthesize: Scalability vs. Quality
[Chart: quality vs. scalability. Previous works achieve high quality but low scalability; the ideal is high on both axes, and we aim for a sweet spot near it.]
Story Time!
A “Data Engineering” Journey
• 80K rendered chair images
• Metric: 16-view classification accuracy, tested on real images
At the beginning:
• Lighting: 4 fixed point light sources on a sphere
• Background: clean
A “Data Engineering” Journey
95% on synthetic val set, but only 47% on real test set
ConvNet: Ah ha, I know! Viewpoint is just the brightness pattern!
A “Data Engineering” Journey
Randomize lighting: 47% -> 74%
ConvNet: Hmm.. viewpoint is not the brightness pattern. Maybe it's the contour?
A “Data Engineering” Journey
Add backgrounds: 74% -> 86%
ConvNet: It becomes really hard! Let me look more into the picture.
A “Data Engineering” Journey
Bbox crop, texture: 86% -> 93%
ConvNet: The mapping becomes hard. I have to learn harder to get it right!
Key Lesson: Don't give CNN a chance to “cheat” - it's very good at it. When there is no way to cheat, true learning starts.
Render for CNN Image Synthesis Pipeline
3D model → Rendering (sample lighting and camera params) → Add bkg (sample bkg. image, alpha-blending) → Crop (sample cropping params)
Hyper-parameters estimated from real images
Render for CNN Image Synthesis Pipeline
3D model → Rendering (sample lighting and camera params)
Rendering
Lighting params (randomly sampled): number of light sources, light distances, light energies, light positions, light types
Camera params: KDE from PASCAL3D+ train set
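Fitting a kernel density estimate and resampling from it can be sketched as below. The (azimuth, elevation) numbers are hypothetical stand-ins; the real pipeline fits the KDE to viewpoint annotations from the PASCAL3D+ train set.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical (azimuth, elevation) pairs standing in for the
# PASCAL3D+ train-set viewpoint annotations.
observed = np.array([[ 20.0, 200.0, 110.0, 340.0,  95.0],   # azimuth (deg)
                     [  5.0,  12.0,   8.0,  20.0,  10.0]])  # elevation (deg)

kde = gaussian_kde(observed)     # nonparametric density over viewpoints
np.random.seed(0)
viewpoints = kde.resample(1000)  # shape (2, 1000): camera params to render
```

Sampling from the fitted density, rather than uniformly, biases the synthetic views toward viewpoints that actually occur in real photos.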
Render for CNN Image Synthesis Pipeline
3D model → Rendering (sample lighting and camera params) → Add bkg (sample bkg. image, alpha-blending)
Background Composition • Simple but effective! • Backgrounds randomly sampled from SUN397 dataset [Xiao et al.] • Alpha blending composition for natural boundaries
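The composition step itself is a one-liner over the renderer's alpha matte. A sketch (array names are illustrative):

```python
import numpy as np

def alpha_blend(render_rgb, alpha, background_rgb):
    """Composite a rendered object over a background image.

    render_rgb, background_rgb: float images in [0, 1], shape (H, W, 3).
    alpha: rendered alpha matte in [0, 1], shape (H, W); its soft edges
    are what produce the natural object boundaries.
    """
    a = alpha[..., None]  # broadcast the matte over the RGB channels
    return a * render_rgb + (1.0 - a) * background_rgb
```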
Render for CNN Image Synthesis Pipeline
3D model → Rendering (sample lighting and camera params) → Add bkg (sample bkg. image, alpha-blending) → Crop (sample cropping params)
Image Cropping
Cropping patterns: KDE from PASCAL3D+ train set
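Crop sampling can be sketched as perturbing the tight object box. The 5% Gaussian corner jitter below is an illustrative placeholder for draws from the KDE fitted to PASCAL3D+ cropping statistics:

```python
import numpy as np

def sample_crop(height, width, rng):
    """Sample a crop window around the full rendered image.

    Corner offsets are drawn from a hypothetical jitter distribution;
    in the pipeline they come from a KDE over real cropping patterns.
    """
    dx1, dy1, dx2, dy2 = rng.normal(0.0, 0.05, size=4)
    x1 = int(np.clip(dx1 * width, 0, width - 2))
    y1 = int(np.clip(dy1 * height, 0, height - 2))
    x2 = int(np.clip((1.0 + dx2) * width, x1 + 1, width))
    y2 = int(np.clip((1.0 + dy2) * height, y1 + 1, height))
    return x1, y1, x2, y2
```

This exposes the CNN to the imperfect, truncated detection boxes it will see at test time.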
2.4M Synthesized Images for 12 Categories • High scalability • High quality • Overfit-resistant • Accurate labels
Results
3D Viewpoint Estimation
Evaluation Metric: median angle error (lower is better)
Real test images from the PASCAL3D+ dataset
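The angle error between a predicted and a ground-truth rotation is their geodesic distance on SO(3). A sketch of that metric (not the paper's evaluation script):

```python
import numpy as np

def angle_error_deg(r_gt, r_pred):
    """Geodesic distance between two 3x3 rotation matrices, in degrees."""
    r_rel = r_pred @ r_gt.T
    # trace(R) = 1 + 2*cos(angle) for a rotation by `angle`
    cos_angle = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))
```

The clip guards against floating-point round-off pushing the cosine slightly outside [-1, 1].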
3D Viewpoint Estimation
Evaluation Metrics: viewpoint accuracy and median angle error (lower is better)
Our model trained on rendered images outperforms the state-of-the-art model trained on real images from PASCAL3D+.
Real test images from the PASCAL3D+ dataset
[Chart: viewpoint median error, Vps&Kps (CVPR15) vs. RenderForCNN (Ours); ours is lower]
How many 3D models are necessary?
[Chart: accuracy vs. #models for one category, at 10, 91, 1000, and 6928 models]
10 vs. 1000 models: 20%+ accuracy difference
3D Viewpoint Estimation
Azimuth Viewpoint Estimation
[Qualitative results: estimated-view confidence over azimuth (0-360°) against the ground-truth view, for airplane, bicycle, boat, car, and motorbike]
Azimuth Viewpoint Estimation
[Qualitative results: estimated-view confidence over azimuth against the ground-truth view, for chair, table, monitor, and sofa]
Failure Cases sofa occluded by people ambiguous car viewpoint multiple cars car occluded by motorbike multiple chairs ambiguous chair viewpoint
Limitations of Current Synthesis Pipeline • Modeling Occlusions? • Modeling Background Context? • Shape database augmentation by interpolation?
Render for CNN – Beyond Viewpoint
• 3D model retrieval: Joint Embedding [Li et al., SIGGRAPH Asia 2015]
• Object detection
• Segmentation
• Intrinsic image decomposition
• Controlled experiments for DL
• Vision algorithm verification
Conclusion
Images rendered from 3D models can be effectively used to train CNNs, especially for 3D tasks. State-of-the-art results have been achieved.
Keys to success
• Quantity: large-scale 3D model collection (ShapeNet)
• Quality: overfit-resistant, scalable image synthesis pipeline
http://shapenet.cs.stanford.edu
THE END THANK YOU!