YOLACT: idea
1) Generate mask prototypes
2) Generate mask coefficients
3) Combine (1) and (2)
CV3DST | Prof. Leal-Taixé 45
YOLACT: backbone
ResNet-101; features computed at different scales.
YOLACT: protonet
Generate k prototype masks. k is not the number of classes, but a hyperparameter.
YOLACT: protonet
• Fully convolutional network (3x3 convs followed by a 1x1 conv), similar to the mask branch in Mask R-CNN. However, no loss function is applied at this stage.
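The protonet can be sketched as a small fully convolutional network. The layer widths and the single upsampling step below are illustrative assumptions, not the exact configuration from the paper:

```python
import torch
import torch.nn as nn

class ProtoNet(nn.Module):
    """Sketch of a YOLACT-style protonet: an FCN mapping backbone
    features to k prototype masks (k is a hyperparameter, not the
    number of classes). Layer widths here are assumptions."""
    def __init__(self, in_channels=256, k=32):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            # upsample so prototypes have higher resolution than the features
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, k, 1),  # final 1x1 conv -> k prototype masks
        )

    def forward(self, x):
        return self.layers(x)
```

Note that no loss is attached here: the prototypes are supervised only indirectly, through the assembled masks.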
YOLACT: mask coefficients
Predict coefficients for every predicted mask.
YOLACT: mask coefficients
For each anchor box, predict:
• one class
• the box regression
• k coefficients (one per prototype mask)
The network is similar to, but shallower than, RetinaNet.
YOLACT: mask assembly
1. Compute a linear combination of the mask prototypes weighted by the mask coefficients.
2. Predict the masks as M = σ(PCᵀ), where P is an (H×W×k) matrix of prototype masks, C is an (n×k) matrix of mask coefficients surviving NMS, and σ is a nonlinearity (sigmoid).
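The assembly step can be sketched in a few lines of NumPy (toy shapes, hypothetical values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assemble_masks(P, C):
    """Assemble instance masks as M = sigmoid(P C^T).

    P: (H, W, k) prototype masks from the protonet.
    C: (n, k) mask coefficients of the n detections surviving NMS.
    Returns M: (H, W, n) soft instance masks.
    """
    # per-pixel linear combination of the prototypes, then the nonlinearity
    return sigmoid(P @ C.T)

P = np.random.randn(6, 6, 8)   # toy prototypes
C = np.random.randn(3, 8)      # coefficients for 3 detections
M = assemble_masks(P, C)
print(M.shape)  # (6, 6, 3)
```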
YOLACT: loss function
Binary cross-entropy between the assembled masks and the ground truth, in addition to the standard losses (regression for the bounding box, and classification for the class of the object/mask).
YOLACT: qualitative results
For large objects, the quality of the masks is even better than that of two-stage detectors.
So, which segmenter to use? YOLACT
YOLACT++: improvements
• A specially designed version of NMS that makes the procedure faster.
• An auxiliary semantic segmentation loss applied to the final FPN features. The module is not used during inference.
D. Bolya et al., "YOLACT++: Better Real-time Instance Segmentation". arXiv:1912.06218, 2019
Panoptic segmentation
Panoptic segmentation
Semantic segmentation (FCN-like) + instance segmentation (Mask R-CNN) = panoptic segmentation (UPSNet)
Panoptic segmentation
• It assigns labels to uncountable regions, called "stuff" (sky, road, etc.), similar to FCN-like networks.
• It differentiates between pixels coming from different instances of the same class of countable objects, called "things" (cars, pedestrians, etc.).
Panoptic segmentation
Problem: a pixel might be classified as stuff by the FCN network while, at the same time, being classified as part of an instance by Mask R-CNN (conflicting results)!
Panoptic segmentation
Solution: a parameter-free panoptic head that combines the information from the FCN and Mask R-CNN into the final predictions.
Xiong et al., "UPSNet: A Unified Panoptic Segmentation Network". CVPR 2019
Network architecture
Shared features, separate heads, putting it together.
The semantic head
Like all semantic heads, a fully convolutional network. New: deformable convolutions!
Recall: dilated (atrous) convolutions
(a) Dilation 1: each element produced by the filter has a 3×3 receptive field.
(b) Dilation 2: each element produced by it has a 7×7 receptive field.
(c) Dilation 4: each element produced by it has a 15×15 receptive field.
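The receptive fields above follow from stacking 3×3 convolutions with growing dilation; each stride-1 layer enlarges the field by (kernel − 1) × dilation:

```python
def receptive_field(dilations, kernel=3):
    """Receptive field of stacked stride-1 convs with given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d  # each layer adds (kernel-1)*dilation
    return rf

print(receptive_field([1]))        # 3  -> 3x3
print(receptive_field([1, 2]))     # 7  -> 7x7
print(receptive_field([1, 2, 4]))  # 15 -> 15x15
```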
Deformable convolutions
Deformable convolutions are a generalization of dilated convolutions in which the offsets are learned.
Deformable convolutions
The deformable convolution picks the values at different locations for the convolution, conditioned on the input image or feature maps.
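A minimal sketch of the sampling idea, computing one output value of a hypothetical 3×3 deformable convolution: the kernel reads from the regular grid positions plus learned fractional offsets, using bilinear interpolation (this is an illustration, not the paper's implementation):

```python
import numpy as np

def bilinear_sample(fm, y, x):
    """Bilinearly sample feature map fm (H, W) at a fractional location."""
    h, w = fm.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * fm[y0, x0] + (1 - wy) * wx * fm[y0, x1]
            + wy * (1 - wx) * fm[y1, x0] + wy * wx * fm[y1, x1])

def deform_conv_single(fm, kernel, offsets, y, x):
    """One output value of a 3x3 deformable conv at (y, x).

    kernel:  9 weights (row-major over the 3x3 grid).
    offsets: 9 learned (dy, dx) fractional offsets, one per tap.
    """
    grid = [(a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)]
    out = 0.0
    for i, (ky, kx) in enumerate(grid):
        dy, dx = offsets[i]
        # sample at the regular grid location shifted by the learned offset
        out += kernel[i] * bilinear_sample(fm, y + ky + dy, x + kx + dx)
    return out
```

With all offsets zero this reduces to an ordinary 3×3 convolution; nonzero offsets let each tap read from a data-dependent location.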
The panoptic head
• Mask logits from the instance head.
• Thing logits from the semantic head (e.g., car): these need to be masked by the instance masks.
• Stuff logits from the semantic head (e.g., sky): these can be evaluated directly.
The panoptic head
Perform a softmax over the panoptic logits. If the maximum value falls into the first (stuff) channels, the pixel belongs to one of the stuff classes. Otherwise, the index of the maximum value tells us the instance ID the pixel belongs to. See the UPSNet paper for the details on how the unknown class is used.
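Omitting the unknown class, the per-pixel readout can be sketched as follows; the tensor layout (stuff channels first, then instance channels) is an assumption:

```python
import numpy as np

def panoptic_predict(stuff_logits, inst_logits):
    """Sketch of the panoptic-head readout (unknown class omitted).

    stuff_logits: (N_stuff, H, W) stuff logits from the semantic head.
    inst_logits:  (N_inst, H, W) per-instance panoptic logits.
    Returns a per-pixel label map: values < N_stuff are stuff classes,
    values >= N_stuff index the instance the pixel belongs to.
    """
    logits = np.concatenate([stuff_logits, inst_logits], axis=0)
    # softmax is monotonic, so the argmax over raw logits picks the
    # same channel as the argmax over softmax probabilities
    return logits.argmax(axis=0)

labels = panoptic_predict(np.random.randn(3, 8, 8), np.random.randn(2, 8, 8))
print(labels.shape)  # (8, 8)
```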
Metrics
Panoptic quality
TP = true positives, FN = false negatives, FP = false positives
• SQ (segmentation quality): how close the matched predicted segments are to the ground-truth segments (it does not take bad predictions into account!): SQ = Σ_{(p,g)∈TP} IoU(p,g) / |TP|
Panoptic quality
• RQ (recognition quality): just like in detection, we want to know whether we are missing instances (FN) or predicting spurious ones (FP): RQ = |TP| / (|TP| + ½|FP| + ½|FN|)
Panoptic quality
• As in detection, we have to match ground truth and predictions; here this is segment matching, with IoU measured between segments (predicted vs. ground-truth segments, yielding TPs and FPs).
• A segment is matched if IoU > 0.5. Since no pixel can belong to two predicted segments, the matching is unique.
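Putting the two factors together gives PQ = SQ × RQ; a minimal sketch for a single class:

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ = SQ * RQ for one class.

    matched_ious: IoUs of the matched (TP) segment pairs, each > 0.5.
    num_fp, num_fn: counts of unmatched predicted / ground-truth segments.
    """
    tp = len(matched_ious)
    sq = sum(matched_ious) / tp if tp else 0.0    # segmentation quality
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)  # recognition quality
    return sq * rq

# two matches (IoU 0.8 and 0.6), one FP, one FN:
# SQ = 0.7, RQ = 2/3, so PQ ≈ 0.467
print(panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1))
```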
Panoptic segmentation: qualitative results
Object instance segmentation as voting
Sliding window approach
• DPM and R-CNN families.
• Densely enumerate box proposals and classify them.
• A tremendously successful, very well-engineered paradigm.
• SOTA methods are still based on this paradigm.
Generalized Hough transform
Before DPM/R-CNN dominance: detection as voting.
Hough voting
• Detect analytical shapes (e.g., lines) as peaks in the dual parametric space.
• Each pixel casts a vote in this dual space.
• Detect peaks and 'back-project' them to the image space.
Example: line detection
• Each edge point in image space casts a vote.
• The vote is in the form of a line in parameter space, representing all lines that cross the point.
• Accumulate votes from different points in the (discretized) parameter space.
• Read out the maxima (peaks) from the accumulator.
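The procedure can be sketched for lines in the (θ, ρ) parametrization ρ = x·cos θ + y·sin θ; the bin counts below are arbitrary choices:

```python
import numpy as np

def hough_lines(points, img_h, img_w, n_theta=180, n_rho=200):
    """Accumulate line votes in a discretized (theta, rho) space.
    Each point votes for all lines through it, i.e. the curve
    rho = x*cos(theta) + y*sin(theta)."""
    diag = np.hypot(img_h, img_w)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.round((rhos + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), bins] += 1
    return acc, thetas

# all points on the horizontal line y = 5 vote for the same (theta, rho)
pts = [(x, 5) for x in range(20)]
acc, thetas = hough_lines(pts, 20, 20)
t, r = np.unravel_index(acc.argmax(), acc.shape)
print(acc.max(), thetas[t])  # 20 votes, peak at theta = pi/2 (horizontal line)
```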
Object detection as voting
• Idea: objects are detected as consistent configurations of the observed parts (visual words).
Leibe et al., "Robust Object Detection with Interleaved Categorization and Segmentation". IJCV 2008
Object detection
• Training: interest point detection (SIFT, SURF) and center-point voting.
Object detection
• Inference (test time)
Back to the future
• Back to 2020...
• We can use pixel consensus voting for panoptic segmentation (CVPR 2020).
Overview
The instance voting branch predicts, for every pixel, whether the pixel is part of an instance mask and, if so, the relative location of the instance mask centroid.
H. Wang et al., "Pixel Consensus Voting for Panoptic Segmentation". CVPR 2020
In a nutshell
1. Discretize the region around each pixel.
2. Every pixel votes for a centroid (or no centroid, for "stuff") over a set of grid cells.
In a nutshell
3. Vote probabilities at each pixel are cast into the accumulator space via (dilated) transposed convolutions.
4. Objects are detected as 'peaks' in the accumulator space.
In a nutshell
5. 'Peaks' are back-projected to the image to obtain the instance masks.
6. Category information is provided by the parallel semantic segmentation head.
Voting lookup table
• Discretize the region around the pixel: M × M cells converted into K = 17 indices.
Voting lookup table
• The vote should be cast to the center (the red pixel), which corresponds to position 16.
Voting
• At inference, the instance voting branch provides a tensor of size [H, W, K+1].
• Votes are accumulated softly in the voting accumulator. How?
Example: for the blue pixel, we get a vote for index 16 with probability 0.9 (softmax output):
• Transfer 0.9 to cell 16 -- (dilated) transposed convolution.
• Distribute it evenly among the cell's pixels, each getting 0.1 -- average pooling.
Transposed convolutions
• Take a single value in the input.
• Multiply it with a kernel and distribute the result in the output map.
• The kernel defines the amount of the input value that is distributed to each of the output cells.
• For the purpose of vote aggregation, however, we fix the kernel parameters to be one-hot across each channel, marking the target location.
Voting: implementation
• Output tensor: [H, W, K+1].
• Example: 9 inner bins, 8 outer bins, K = 17.
• Split the output tensor into two tensors: [H, W, 9] and [H, W, 8].
• Apply two transposed convolutions: one with a kernel of size [3, 3, 9] and stride 1, and one with a kernel of size [3, 3, 8] and stride 3.
• Kernel parameters are pre-fixed: one-hot across each channel, marking the target location.
• Dilation => spread votes to the outer ring.
• Smooth the votes evenly via average pooling.
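For the 9 inner bins, the soft accumulation can be sketched directly as a scatter, which is exactly what a stride-1 transposed convolution with fixed one-hot kernels computes; the row-major bin-to-offset encoding below is an assumption:

```python
import numpy as np

def aggregate_inner_votes(probs):
    """Soft vote accumulation for the 3x3 inner region.

    probs: (H, W, 9) softmax scores; bin c encodes the relative offset
    (c // 3 - 1, c % 3 - 1) of the voted-for centroid, so bin 4 is the
    pixel itself. Each score is scattered onto its target cell.
    """
    h, w, _ = probs.shape
    acc = np.zeros((h, w))
    for c in range(9):
        dy, dx = c // 3 - 1, c % 3 - 1  # offset encoded by bin c
        for y in range(h):
            for x in range(w):
                ty, tx = y + dy, x + dx
                if 0 <= ty < h and 0 <= tx < w:
                    acc[ty, tx] += probs[y, x, c]
    return acc

probs = np.zeros((5, 5, 9))
probs[2, 2, 4] = 0.9  # pixel (2,2) votes for itself with probability 0.9
probs[2, 3, 3] = 0.9  # pixel (2,3) votes one cell to the left: also (2,2)
print(aggregate_inner_votes(probs)[2, 2])  # 1.8: a consensus peak at (2,2)
```

Pixels that agree on a centroid pile their probability mass onto the same accumulator cell, producing the peaks that are later back-projected into instance masks.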