Detection and Segmentation CS60010: Deep Learning Abir Das IIT Kharagpur Feb 28, 2020
Introduction Datasets Localization
Agenda
To get introduced to two important computer vision tasks, detection and segmentation, and to the application of deep neural networks in these areas in recent years.
Abir Das (IIT Kharagpur) CS60010 Feb 28, 2020 2 / 38
From Classification to Detection
[Figure panels: Classification | Detection]
Challenges of Object Detection
- Simultaneous recognition and localization
- Images may contain objects from more than one class and multiple instances of the same class
- Evaluation
Localization and Detection
Evaluation
- At test time, three things are predicted: bounding box coordinates, a class label for the box, and a confidence score
- Performance is measured in terms of IoU (Intersection over Union): the area of overlap between the predicted and ground-truth boxes divided by the area of their union
- According to the PASCAL criterion,
  - a detection is correct if IoU > 0.5
  - if multiple detections match the same object, only one is counted as a true positive (the others count as false positives)
Image Source
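A minimal Python sketch of IoU and of the PASCAL-style rule that only one detection per object is a true positive. It assumes boxes are given as corner coordinates (x1, y1, x2, y2) in continuous units (the official VOC devkit adds one pixel to widths and heights); the names iou and match_detections are illustrative, not from any library.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, gt_boxes, thresh=0.5):
    """Label each detection True (TP) or False (FP), PASCAL style.

    detections: list of (score, box); gt_boxes: ground-truth boxes of one class.
    Detections are visited in decreasing score order, and each ground-truth
    box can be claimed only once, so duplicate detections become FPs.
    """
    used = [False] * len(gt_boxes)
    labels = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the ground-truth box this detection overlaps most
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou > thresh and not used[best_j]:
            used[best_j] = True
            labels.append(True)
        else:
            labels.append(False)
    return labels


gt = [(0, 0, 10, 10)]
dets = [(0.9, (1, 1, 11, 11)), (0.8, (0, 0, 10, 10)), (0.3, (20, 20, 30, 30))]
print(match_detections(dets, gt))  # [True, False, False]
```

In this hypothetical example the 0.9 detection claims the ground-truth box first, so the 0.8 detection becomes a false positive even though its IoU (1.0) is higher: matching follows confidence order.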
Evaluation: Precision-Recall
- $\text{precision} = \frac{tp}{tp + fp}$
- $\text{recall} = \frac{tp}{tp + fn}$
Image Source
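In detection, tp and fp come from matching the detections to the ground truth, and fn is the number of ground-truth objects that no detection matched. A small Python sketch of the two formulas; the counts in the example are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Hypothetical: 10 detections, 4 of them correct, 5 ground-truth objects (so 1 missed)
print(precision_recall(tp=4, fp=6, fn=1))  # (0.4, 0.8)
```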
Evaluation: Average Precision
Let's consider an image with 5 apples, where our detector provides 10 detections. The detections are ranked by confidence, and precision and recall are computed cumulatively going down the ranked list.
Source: This medium post
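The cumulative computation can be sketched in Python as below. The TP/FP pattern is made up for illustration; it is not the pattern from the medium post, and the function name pr_curve is hypothetical.

```python
def pr_curve(is_tp, num_gt):
    """Cumulative precision and recall after each detection in ranked order.

    is_tp: TP/FP flags for detections already sorted by decreasing confidence.
    num_gt: number of ground-truth objects (e.g. 5 apples).
    """
    precisions, recalls = [], []
    tp = fp = 0
    for flag in is_tp:
        if flag:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))  # fraction of detections so far that are correct
        recalls.append(tp / num_gt)        # fraction of apples found so far
    return precisions, recalls


# Hypothetical TP/FP pattern for 10 ranked detections
flags = [True, True, False, False, True, False, True, False, False, True]
p, r = pr_curve(flags, num_gt=5)
print(r)  # recall rises only when a true positive is added
```

Precision drops with every false positive while recall stays flat, which produces the zigzag shape of the precision-recall curve.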
Evaluation: Average Precision
The area under the precision-recall curve is a measure of performance; it gives the average precision (AP) of the detector.
Source: This medium post
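One simple way to approximate the area under the raw (unsmoothed) curve is to sum rectangles: each increase in recall is weighted by the precision at that point. This Python sketch uses hypothetical precision/recall values for 10 ranked detections and 5 objects; the smoothed 11-point scheme described in the source gives a somewhat different number.

```python
def area_under_pr(precisions, recalls):
    """Sum of rectangles: (recall increment) x (precision at that detection)."""
    area, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        area += (r - prev_recall) * p  # zero width when recall did not change
        prev_recall = r
    return area


# Hypothetical cumulative values, one entry per ranked detection
precisions = [1.0, 1.0, 2/3, 0.5, 0.6, 0.5, 4/7, 0.5, 4/9, 0.5]
recalls = [0.2, 0.4, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 0.8, 1.0]
print(round(area_under_pr(precisions, recalls), 4))  # 0.7343
```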
Evaluation: mean Average Precision
A little more detail:
- The zigzag curve is smoothed by replacing the precision at each recall value with the highest precision found at that recall or at any higher recall.
- The smoothed precision is then averaged over 11 recall values (0, 0.1, 0.2, ..., 1.0); this is the Average Precision (AP).
- The mean average precision (mAP) is the mean of the APs over all object classes.
Source: This medium post
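The three steps above can be sketched in Python as follows, assuming the 11-point scheme used for PASCAL VOC 2007 (later VOC editions use all recall points instead). The precision/recall values and per-class APs in the example are hypothetical.

```python
def eleven_point_ap(precisions, recalls):
    """11-point interpolated AP from cumulative precision/recall lists."""
    total = 0.0
    for t in [i / 10 for i in range(11)]:  # recall levels 0, 0.1, ..., 1.0
        # Smoothed precision: the highest precision at any recall >= t (0 if none)
        candidates = [p for p, r in zip(precisions, recalls) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_ap(ap_per_class):
    """mAP: the mean of the per-class APs."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical cumulative values for one class (10 ranked detections, 5 objects)
precisions = [1.0, 1.0, 2/3, 0.5, 0.6, 0.5, 4/7, 0.5, 4/9, 0.5]
recalls = [0.2, 0.4, 0.4, 0.4, 0.6, 0.6, 0.8, 0.8, 0.8, 1.0]
ap_apple = eleven_point_ap(precisions, recalls)
print(round(ap_apple, 4))  # 0.7584
print(mean_ap({"apple": ap_apple, "orange": 0.6}))
```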
Non-max Suppression
What should we do if there are multiple detections of the same object? Can you think of the effect this has on precision and recall?
[Figure: several overlapping detections of the same objects, with confidence scores 0.6, 0.8, 0.9, 0.7, 0.7]
Source: deeplearning.ai
Non-max Suppression
- Sort the predictions by their confidence scores
- Starting with the highest-scoring prediction, discard any other prediction of the same class that has high overlap (e.g., IoU > 0.5) with it
- Repeat with the highest-scoring prediction that remains, until every prediction has been either kept or discarded
[Figure: overlapping detections with confidence scores 0.6, 0.8, 0.9, 0.7, 0.7]
Source: deeplearning.ai
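The greedy procedure above can be sketched in Python as follows. Boxes are assumed to be (x1, y1, x2, y2) corners and each detection a (score, label, box) tuple; these conventions and the function names are illustrative assumptions, and the example boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_thresh=0.5):
    """Greedy non-max suppression over (score, label, box) tuples."""
    # Step 1: sort by confidence, highest first
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        # Step 2: keep the top-scoring prediction ...
        best = remaining.pop(0)
        kept.append(best)
        # ... and drop same-class predictions that overlap it too much
        remaining = [d for d in remaining
                     if d[1] != best[1] or iou(d[2], best[2]) <= iou_thresh]
        # Step 3: the loop repeats with the best prediction still remaining
    return kept


dets = [(0.6, "car", (51, 50, 61, 60)), (0.9, "car", (0, 0, 10, 10)),
        (0.7, "car", (50, 50, 60, 60)), (0.8, "car", (1, 0, 11, 10))]
print([d[0] for d in nms(dets)])  # [0.9, 0.7]
```

Without this step, the suppressed duplicates would each be counted as false positives, lowering precision without improving recall.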
Other Computer Vision Tasks
[Figure panels: Semantic Segmentation (GRASS, CAT, TREE, SKY; no objects, just pixels) | Classification + Localization (CAT; single object) | Object Detection (DOG, DOG, CAT; multiple objects) | Instance Segmentation (DOG, DOG, CAT; multiple objects)]
Source: cs231n course, Stanford University (Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 11, May 10, 2018). This image is CC0 public domain.
PASCAL VOC
- Dataset size (by 2012): 11.5K training/validation images, 27K bounding boxes, 7K segmentations
PASCAL VOC
[Figure: "Object detection renaissance (2013-present)": PASCAL VOC mean Average Precision (mAP, 0% to 80%) by year, 2006 to 2016, before deep convnets vs. using deep convnets, with R-CNN v1 marked]
Source: ICCV '15, Fast R-CNN
COCO Dataset
Source: http://cocodataset.org
COCO Tasks
Classification + Localization: Task
- Classification: C classes
  - Input: image
  - Output: class label (e.g., CAT)
  - Evaluation metric: accuracy
- Localization:
  - Input: image
  - Output: box in the image (x, y, w, h)
  - Evaluation metric: Intersection over Union
- Classification + Localization: do both
Source: cs231n course, Stanford University (Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 8, 1 Feb 2016)
Classification + Localization
Idea #1: Localization as Regression
- Input: image → Neural Net → Output: box coordinates (4 numbers)
- Correct output: box coordinates (4 numbers)
- Loss: L2 distance
- Only one object, so this is simpler than detection
Source: cs231n course, Stanford University
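A minimal sketch of the regression loss. The slide says "L2 distance"; this sketch assumes the squared L2 distance, a common training choice, over the 4 box numbers (x, y, w, h). The box values are hypothetical.

```python
def l2_box_loss(pred, target):
    """Squared L2 distance between predicted and ground-truth (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


pred = (10, 20, 50, 40)    # network output (hypothetical)
target = (12, 18, 50, 44)  # ground-truth box (hypothetical)
print(l2_box_loss(pred, target))  # 4 + 4 + 0 + 16 = 24
```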
Classification + Localization
Simple Recipe for Classification + Localization
Step 1: Train (or download) a classification model (AlexNet, VGG, GoogLeNet)
[Figure: Image → Convolution and Pooling → Final conv feature map → Fully-connected layers → Class scores, trained with softmax loss]
Source: cs231n course, Stanford University
Classification + Localization
Simple Recipe for Classification + Localization
Step 2: Attach a new fully-connected "regression head" to the network
[Figure: Image → Convolution and Pooling → Final conv feature map, which feeds two branches: Fully-connected layers ("Classification head") → Class scores; Fully-connected layers ("Regression head") → Box coordinates]
Source: cs231n course, Stanford University
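A NumPy sketch of the two-headed network, assuming the convolution and pooling trunk has already produced a feature vector per image. For brevity each head is a single linear layer rather than a stack of fully-connected layers, and the sizes (512-d features, 20 classes) and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, NUM_CLASSES = 512, 20  # hypothetical sizes

# Classification head parameters
W_cls = rng.normal(scale=0.01, size=(FEAT_DIM, NUM_CLASSES))
b_cls = np.zeros(NUM_CLASSES)
# Regression head parameters: 4 outputs (x, y, w, h)
W_reg = rng.normal(scale=0.01, size=(FEAT_DIM, 4))
b_reg = np.zeros(4)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def two_heads(features):
    """features: (N, FEAT_DIM) array from the shared conv trunk."""
    class_probs = softmax(features @ W_cls + b_cls)  # classification head
    boxes = features @ W_reg + b_reg                  # regression head
    return class_probs, boxes


feats = rng.normal(size=(2, FEAT_DIM))  # stand-in trunk features for 2 images
probs, boxes = two_heads(feats)
print(probs.shape, boxes.shape)  # (2, 20) (2, 4)
```

Both heads read the same features, so at test time a single forward pass yields the class scores and the box.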
Classification + Localization
Simple Recipe for Classification + Localization
Step 3: Train the regression head only, with SGD and L2 loss
[Figure: the same network; the regression head's box coordinates feed an L2 loss, while the convolution/pooling layers and the classification head are left as they are]
Source: cs231n course, Stanford University
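A NumPy sketch of step 3 under simplifying assumptions. The trunk is frozen, so its outputs are treated as fixed feature vectors (random stand-ins here). The regression head is a single linear layer, and the target boxes are synthetic. Only the head's weights receive SGD updates on the squared L2 loss. All sizes and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 256, 32  # hypothetical: 256 training images, 32-dim frozen features

# Frozen trunk features: computed once, never updated
X = rng.normal(size=(N, D))
# Synthetic ground-truth boxes (x, y, w, h), linearly related to the features
W_true = rng.normal(size=(D, 4))
Y = X @ W_true + 0.01 * rng.normal(size=(N, 4))

# Regression head parameters: the only thing we train
W = np.zeros((D, 4))
b = np.zeros(4)


def l2_loss(W, b, X, Y):
    """Mean over images of the squared L2 distance between predicted and true boxes."""
    return float(np.mean(np.sum((X @ W + b - Y) ** 2, axis=1)))


lr, batch_size, epochs = 0.05, 32, 50
initial_loss = l2_loss(W, b, X, Y)
for _ in range(epochs):
    order = rng.permutation(N)
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ W + b - Y[idx]             # (B, 4) residuals
        grad_W = 2.0 * X[idx].T @ err / len(idx)  # gradient of the batch loss w.r.t. W
        grad_b = 2.0 * err.mean(axis=0)           # gradient w.r.t. b
        W -= lr * grad_W                          # SGD step on the head only
        b -= lr * grad_b
final_loss = l2_loss(W, b, X, Y)
print(initial_loss, final_loss)
```

Keeping the trunk fixed means the features can be precomputed once, which makes training the new head cheap.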
Classification + Localization
Simple Recipe for Classification + Localization
Step 4: At test time, use both heads
[Figure: Image → Convolution and Pooling → Final conv feature map → classification head (Class scores) and regression head (Box coordinates)]
Source: cs231n course, Stanford University
Classification + Localization
Aside: Localizing multiple objects
Want to localize exactly K objects in each image (e.g., whole cat, cat head, cat left ear, cat right ear for K = 4)
[Figure: the same two-headed network; the regression head outputs K x 4 numbers (one box per object) alongside the class scores]
Source: cs231n course, Stanford University
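A small Python sketch of how the K x 4 output could be read and trained. It assumes a fixed assignment: output slot i always predicts object i (e.g. slot 0 is always the whole cat), so the loss needs no matching step. The function names and numbers are hypothetical.

```python
def split_boxes(reg_output, k):
    """Split a flat regression output of 4k numbers into k (x, y, w, h) boxes."""
    if len(reg_output) != 4 * k:
        raise ValueError(f"expected {4 * k} numbers, got {len(reg_output)}")
    return [tuple(reg_output[4 * i:4 * i + 4]) for i in range(k)]


def multi_box_l2_loss(pred_boxes, target_boxes):
    """Squared L2 loss summed over the K boxes; slot i is compared with object i."""
    return sum((p - t) ** 2
               for pb, tb in zip(pred_boxes, target_boxes)
               for p, t in zip(pb, tb))


boxes = split_boxes(list(range(16)), k=4)  # K = 4: cat, head, left ear, right ear
print(boxes[1])  # (4, 5, 6, 7)
```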
Classification + Localization
Aside: Human Pose Estimation
- Represent a person by K joints
- Regress (x, y) for each joint from the last fully-connected layer of AlexNet
- (Details: normalized coordinates, iterative refinement)
Toshev and Szegedy, "DeepPose: Human Pose Estimation via Deep Neural Networks", CVPR 2014
Source: cs231n course, Stanford University
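A Python sketch of what "normalized coordinates" can mean here. It assumes joints are expressed relative to a box given by its center (cx, cy) and size (w, h), in the spirit of DeepPose; the exact scheme in the paper may differ in details. The K joints are then flattened into a 2K-vector regression target. Function names and numbers are hypothetical.

```python
def normalize_joints(joints, box):
    """Absolute (x, y) joints -> coordinates relative to box center, scaled by box size."""
    cx, cy, w, h = box
    return [((x - cx) / w, (y - cy) / h) for x, y in joints]


def denormalize_joints(norm_joints, box):
    """Inverse mapping, used to turn network outputs back into image coordinates."""
    cx, cy, w, h = box
    return [(u * w + cx, v * h + cy) for u, v in norm_joints]


def flatten(joints):
    """K (x, y) pairs -> a 2K-vector, the regression target for the network."""
    return [c for xy in joints for c in xy]


box = (100, 100, 50, 100)     # hypothetical person box: center (100, 100), 50 x 100
joints = [(100, 100), (125, 50)]
norm = normalize_joints(joints, box)
print(norm)                   # [(0.0, 0.0), (0.5, -0.5)]
print(flatten(norm))          # [0.0, 0.0, 0.5, -0.5]
```

Normalizing makes the targets independent of image size and person position, which keeps the regression targets in a similar numeric range across training images.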