
CSE 152: Computer Vision Hao Su Lecture 10: Object Recognition

How do we represent objects? - Bounding box - Instance mask - Keypoint. Figures from https://github.com/facebookresearch/detectron2

Object Detection with Bounding Boxes. What? - Recognition / Classification. Where? - Localization / Regression. Slides modified from Ross Girshick tutorial at CVPR 2019

Object Detection with Segmentation Masks. What? - Recognition. Where? - Segmentation. Slides modified from Ross Girshick tutorial at CVPR 2019

Semantic Segmentation: predict a pixel-wise class label. Stuff: walls, buildings, sky, road. Things: humans, cars, bikes. Figures from Panoptic Segmentation, CVPR 2019

Datasets Microsoft COCO

Object Detection

Object Detection → Object Classification. Input: an image → enumerate / heuristic algorithm → Proposals/Candidates → crop and resize (warp) → Cropped image. We’ve already reduced object detection to object classification! Slides modified from Ross Girshick tutorial at CVPR 2019
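The crop-and-resize (warp) step can be illustrated with a minimal sketch. It assumes a single-channel image stored as a list of rows, a box given as (x0, y0, x1, y1) with exclusive x1 and y1, and nearest-neighbor sampling; the function name `crop_and_resize` is hypothetical, and a real pipeline would more likely use bilinear interpolation from an image library.

```python
def crop_and_resize(image, box, out_h, out_w):
    """Crop box = (x0, y0, x1, y1) from a 2D image and warp it to out_h x out_w.

    Assumption: nearest-neighbor sampling, x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = box
    crop_h, crop_w = y1 - y0, x1 - x0
    out = []
    for r in range(out_h):
        # Map the output row back to a source row inside the box.
        src_r = y0 + min(crop_h - 1, r * crop_h // out_h)
        row = []
        for c in range(out_w):
            src_c = x0 + min(crop_w - 1, c * crop_w // out_w)
            row.append(image[src_r][src_c])
        out.append(row)
    return out


# Every proposal, whatever its shape, becomes a fixed-size classifier input.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
patch = crop_and_resize(image, (1, 1, 5, 3), 2, 2)
```

Because every proposal is warped to the same size, a single image classifier can score all of them.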

R-CNN (Regional ConvNet). Input: an image → enumerate / heuristic algorithm → Proposals/Candidates, i.e. Regions of Interest (RoI) → Cropped image → ConvNet → Class Probability (how probable is it a human?) and BBox Regression (how can we modify this bounding box?). Computationally expensive: the ConvNet runs once per cropped image. Slides modified from Ross Girshick tutorial at CVPR 2019
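One way to read "how can we modify this bounding box?" is as a small set of predicted offsets applied to the proposal. The sketch below assumes the center-shift and log-scale parameterization (dx, dy, dw, dh) commonly used in R-CNN-style detectors; the slide does not specify the parameterization, and the function name `apply_box_deltas` is hypothetical.

```python
import math


def apply_box_deltas(box, deltas):
    """Refine box = (x0, y0, x1, y1) with deltas = (dx, dy, dw, dh).

    Assumption: dx, dy shift the center in units of box width/height,
    and dw, dh rescale width/height on a log scale.
    """
    x0, y0, x1, y1 = box
    dx, dy, dw, dh = deltas
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + 0.5 * w, y0 + 0.5 * h
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


# All-zero deltas leave the proposal unchanged.
refined = apply_box_deltas((0.0, 0.0, 10.0, 20.0), (0.1, 0.0, 0.0, 0.0))
```

Predicting relative offsets rather than absolute coordinates keeps the regression target on a similar scale for small and large proposals.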

Faster R-CNN. Input: an image → ConvNet → Feature map for an image → Region Proposal Network (RPN, a ConvNet) → Proposals/Candidates, i.e. Regions of Interest (RoI) → RoI-Pool (similar to Crop & Resize) → Feature map for a RoI → Multilayer Perceptron (MLP) → Class Probability and BBox Regression. Slides modified from Ross Girshick tutorial at CVPR 2019
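RoI-Pool can be sketched as max pooling over a fixed grid of bins laid over the RoI, so that every RoI yields a feature map of the same size. The sketch assumes a single-channel feature map stored as a list of rows and an RoI already expressed in feature-map cells as (x0, y0, x1, y1) with exclusive x1 and y1; the function name `roi_max_pool` and the bin-boundary rounding are illustrative choices, not the lecture's exact definition.

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool the RoI of a 2D feature map into an out_h x out_w grid.

    Assumption: roi = (x0, y0, x1, y1) in feature-map cells, x1/y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin rows span floor(i*h/out_h) .. ceil((i+1)*h/out_h), never empty.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


feature = [[r * 4 + c for c in range(4)] for r in range(4)]
pooled = roi_max_pool(feature, (0, 0, 4, 4), 2, 2)
```

Unlike cropping the image, this operates on the shared feature map, so the backbone ConvNet runs only once per image.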

Faster R-CNN • At each location, consider boxes of many different sizes and aspect ratios
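A minimal sketch of enumerating candidate boxes at one location follows. The specific scales and aspect ratios are illustrative assumptions (the slide gives no values), and `make_anchors` is a hypothetical helper name.

```python
import math


def make_anchors(cx, cy, scales=(64, 128, 256), aspect_ratios=(0.5, 1.0, 2.0)):
    """Return boxes (x0, y0, x1, y1) centered at (cx, cy).

    Assumption: each box has area scale**2 and aspect ratio width / height.
    """
    anchors = []
    for s in scales:
        for ar in aspect_ratios:
            w = s * math.sqrt(ar)
            h = s / math.sqrt(ar)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# 3 scales x 3 aspect ratios give 9 candidate boxes per location.
anchors = make_anchors(100.0, 100.0)
```

Repeating this at every feature-map location gives the dense set of boxes that the network scores and refines.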

Object Segmentation

Semantic Segmentation Idea: Fully Convolutional. Design a network as a bunch of convolutional layers to make predictions for pixels all at once! Input: 3 x H x W → Conv, Conv, Conv, Conv (Convolutions: D x H x W) → Scores: C x H x W → argmax → Predictions: H x W. Lecture 11 - May 10, 2017
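The final argmax step, which turns C x H x W class scores into an H x W label map, can be sketched in plain Python. The scores are assumed to be nested lists indexed as scores[class][row][column], and `predict_labels` is a hypothetical name.

```python
def predict_labels(scores):
    """Map C x H x W class scores to an H x W map of class indices."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Three classes on a 2 x 2 image.
scores = [
    [[0.1, 0.9], [0.5, 0.2]],
    [[0.8, 0.05], [0.3, 0.7]],
    [[0.1, 0.05], [0.2, 0.1]],
]
labels = predict_labels(scores)
```

Each pixel is labeled independently with its highest-scoring class, which is why the scores must have the same spatial size as the desired prediction.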

Semantic Segmentation Idea: Fully Convolutional. Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network! Downsampling: pooling, strided convolution. Upsampling: ??? Input: 3 x H x W → High-res: D1 x H/2 x W/2 → Med-res: D2 x H/4 x W/4 → Low-res: D3 x H/4 x W/4 → Med-res: D2 x H/4 x W/4 → High-res: D1 x H/2 x W/2 → Predictions: H x W. Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015. Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Learnable Upsampling: Transpose Convolution. 3 x 3 transpose convolution, stride 2, pad 1. Input: 2 x 2. Output: 4 x 4. Input gives weight for filter. Sum where output overlaps. Filter moves 2 pixels in the output for every one pixel in the input. Stride gives ratio between movement in output and input. Other names: - Deconvolution (bad) - Upconvolution - Fractionally strided convolution - Backward strided convolution
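A minimal sketch of this operation on a single channel: each input value scales a copy of the filter, the copies are placed `stride` pixels apart in the output, and overlapping entries are summed. The function name and the `output_padding` parameter are assumptions borrowed from how deep learning libraries commonly expose this operation. Under the usual size formula, (in - 1) * stride - 2 * pad + kernel + output_padding, a 2 x 2 input with a 3 x 3 filter, stride 2, and pad 1 gives 3 x 3, so the sketch assumes an output padding of 1 to reproduce the 4 x 4 output stated above.

```python
def transpose_conv2d(x, w, stride=2, pad=1, output_padding=0):
    """Single-channel transpose convolution on nested lists.

    Assumption: square filter w; pad crops the border of the full output,
    and output_padding keeps extra rows/columns at the bottom/right.
    """
    h_in, w_in = len(x), len(x[0])
    k = len(w)
    full_h = (h_in - 1) * stride + k
    full_w = (w_in - 1) * stride + k
    canvas = [[0.0] * (full_w + output_padding)
              for _ in range(full_h + output_padding)]
    for i in range(h_in):
        for j in range(w_in):
            # The input value weights a copy of the filter; overlaps are summed.
            for a in range(k):
                for b in range(k):
                    canvas[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    out_h = full_h - 2 * pad + output_padding
    out_w = full_w - 2 * pad + output_padding
    return [row[pad:pad + out_w] for row in canvas[pad:pad + out_h]]


ones = [[1.0] * 3 for _ in range(3)]
out = transpose_conv2d([[1.0, 2.0], [3.0, 4.0]], ones,
                       stride=2, pad=1, output_padding=1)
```

In a network the filter weights are learned, which is what makes this upsampling "learnable" in contrast to fixed schemes such as nearest-neighbor interpolation.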

Semantic vs. Instance Segmentation Slides modified from Ross Girshick tutorial at CVPR 2019

Mask R-CNN • First do object detection using the Faster R-CNN architecture, then do semantic segmentation inside the cropped region • Share the features of the first few layers between detection and segmentation
