  1. Deep Learning beyond Classification Cees Snoek, UvA Efstratios Gavves, UvA Laurens van de Maaten, Facebook

  2. Standard inference N-way classification Dog? Cat? Bike Car? Plane ? ?

  3. Standard inference N-way classification How popular will this movie be on IMDB? Regression

  4. Standard inference N-way classification Who is older? Regression Ranking …

  5. Quiz: What is common? N-way classification Regression Ranking …

  6. Quiz: What is common? They all make “single value” predictions Do all our machine learning tasks boil down to “single value” predictions?

  7. Beyond “single value” predictions? Do all our machine learning tasks boil down to “single value” predictions? Are there tasks where outputs are somehow correlated? Is there some structure in these output correlations? How can we predict such structures? → Structured prediction

  8. Quiz: Examples?

  9. Object detection Predict a box around an object. Images: spatial location, b(ounding) box. Videos: spatio-temporal location, bbox@t, bbox@t+1, …

  10. Object segmentation

  11. Optical flow & motion estimation

  12. Depth estimation Godard et al., Unsupervised Monocular Depth Estimation with Left-Right Consistency, 2016

  13. Normals and reflectance estimation

  14. Structured prediction Prediction goes beyond asking for “single values”. Outputs are complex and output dimensions are correlated. Output dimensions have latent structure. Can we make deep networks return structured predictions?


  16. Convnets for structured prediction

  17. Sliding window on feature maps Selective Search Object Proposals [Uijlings2013] SPPnet [He2014] Fast R-CNN [Girshick2015]

  18.–24. Fast R-CNN: Steps Process the whole image up to conv5 (Conv1 → Conv2 → Conv3 → Conv4 → Conv5 feature map). Compute possible locations for objects: some correct, most wrong. Given a single location → the ROI pooling module extracts a fixed-length feature (always 4x4, no matter the size of the candidate location). From this fixed-length feature, the network predicts a class (car/dog/bicycle) and new box coordinates.

  25. Divide the feature map into n×n cells. The cell size changes depending on the size of the candidate location, so the output is always n×n (here 3x3) no matter the size of the candidate location.
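The cell-division idea above can be sketched in NumPy. This is a minimal single-channel illustration of ROI max pooling in the spirit of Fast R-CNN, not the paper's implementation; the function name and the integer cell rounding are choices made here for brevity.

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=3):
    """Pool an arbitrary-sized box on a feature map into a fixed
    out_size x out_size grid by taking the max inside each cell.
    `box` is (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    Assumes the box is at least out_size pixels wide and tall."""
    x0, y0, x1, y1 = box
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    # Cell boundaries: the cell size varies with the candidate box size,
    # so the output size stays fixed regardless of the box.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            cell = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[i, j] = cell.max()
    return out
```

Whatever the candidate box size, the result is a fixed-length vector that the downstream fully connected layers can consume.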

  26. Some results

  27. Fast R-CNN Reuse convolutions for different candidate boxes: compute feature maps only once. Region-of-Interest pooling: define the stride relatively → box width divided by a predefined number of “poolings” T (e.g. T=5), giving a fixed-length vector. End-to-end training! (Very) accurate object detection. (Very) fast: less than a second per image. External box proposals still needed.

  28. Faster R-CNN [Girshick2016] Fast R-CNN: external candidate locations. Faster R-CNN: a deep network proposes the candidate locations. Slide over the feature map with k anchor boxes per position → Region Proposal Network.
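The k anchor boxes per sliding position can be illustrated as follows. This is a sketch of the anchor idea only; the scale and ratio values are the commonly cited defaults, and the function name is made up here rather than taken from any Faster R-CNN codebase.

```python
import numpy as np

def make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes centred at the
    origin, as (x0, y0, x1, y1). At each sliding position on the feature
    map, the Region Proposal Network scores and refines these k boxes."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # Keep the anchor area ~ (base_size * scale)^2 while varying
            # the aspect ratio w/h = ratio.
            area = (base_size * scale) ** 2
            w = np.sqrt(area * ratio)
            h = area / w
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(anchors)
```

With 3 scales and 3 ratios this yields the k = 9 anchors per position used in the original setup.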

  29. Going Fully Convolutional [LongCVPR2014] Image larger than the network input: slide the network over it (Conv1 → Conv2 → Conv3 → Conv4 → Conv5 → fc1 → fc2). Is this pixel a camel? Yes! No!

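The reason "sliding the network" works is that a fully connected layer over a fixed-size input is equivalent to a convolution whose kernel covers that whole input; applied to a larger image, the same weights then produce a map of per-location scores. A minimal NumPy sketch of that equivalence (toy sizes and a single channel, chosen here for illustration):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D correlation: slide the kernel over the image."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4))   # fc weights, reshaped into a kernel
patch = rng.standard_normal((4, 4))     # input of exactly the network's size
fc_output = np.sum(weights * patch)     # fc layer output: one scalar score

big_image = rng.standard_normal((8, 8))       # image larger than the input
score_map = conv2d_valid(big_image, weights)  # 5x5 map: one score per location
```

On the network-sized patch the "convolution" reduces to the single fc score; on the larger image it yields a dense grid of "is this a camel?" scores in one pass.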

  34. Fully Convolutional Networks [LongCVPR2014] Connect intermediate layers to output

  35. Fully Convolutional Networks Output is too coarse: image size 500x500, AlexNet input size 227x227 → output 10x10. How to obtain dense predictions? Upconvolution (other names: deconvolution, transposed convolution, fractionally-strided convolution).

  36. Deconvolutional modules Figure: animations of convolution (no padding, no strides) and upconvolution (no padding/no strides, and padding/strides variants), from https://github.com/vdumoulin/conv_arithmetic
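One common way to realize a transposed convolution, matching the conv_arithmetic animations, is to insert zeros between the input elements and then run an ordinary "full" convolution; the output size is (in − 1) · stride + kernel. A NumPy sketch (single channel, no output padding, written here for illustration):

```python
import numpy as np

def upconv2d(x, kernel, stride=2):
    """Transposed (up-)convolution: zero-stuff the input, then run a
    'full' convolution. Output size: (in - 1) * stride + kernel."""
    h, w = x.shape
    kh, kw = kernel.shape
    # Insert stride-1 zeros between neighbouring input elements.
    up = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    up[::stride, ::stride] = x
    # 'Full' convolution: pad by kernel-1 on each side, then correlate.
    padded = np.pad(up, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    out = np.empty((padded.shape[0] - kh + 1, padded.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out
```

With stride 2, a 2x2 input and a 3x3 kernel give a 5x5 output, so repeated upconvolutions can grow a coarse prediction map back toward the input resolution.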

  37. Coarse → Fine Upconvolve the output 2x at a time (7x7 → 14x14 → … → 224x224) and compare the pixel label probabilities against the ground-truth pixel labels: a probability far from the ground truth generates a large loss, a probability close to it generates a small loss.

  38. Structured losses

  39. Deep ConvNets with CRF loss [Chen, Papandreou 2016] Segmentation map is good but not pixel-precise: details around boundaries are lost. Cast fully convolutional outputs as unary potentials. Consider pairwise potentials between output dimensions.

  40. Deep ConvNets with CRF loss [Chen, Papandreou 2016]

  41. Deep ConvNets with CRF loss [Chen, Papandreou 2016] Segmentation map is good but not pixel-precise: details around boundaries are lost. Cast fully convolutional outputs as unary potentials, consider pairwise potentials between output dimensions, and include a fully connected CRF loss to refine the segmentation. Total loss = unary loss + pairwise loss: E(x) = Σ_i θ_i(x_i) + Σ_ij θ_ij(x_i, x_j), with the pairwise term θ_ij(x_i, x_j) ~ w_1 exp(−a‖p_i − p_j‖² − b‖I_i − I_j‖²) + w_2 exp(−c‖p_i − p_j‖²), where p_i are pixel positions and I_i pixel intensities.
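The pairwise term can be sketched for a single pair of pixels. This follows the DeepLab-style dense-CRF kernel shape (an appearance term plus a smoothness term); the function name and all parameter values are placeholders chosen here, not values from the paper.

```python
import numpy as np

def pairwise_potential(p_i, p_j, I_i, I_j, w1=1.0, w2=1.0,
                       theta_alpha=10.0, theta_beta=5.0, theta_gamma=3.0):
    """Dense-CRF pairwise kernel: an appearance term (nearby pixels with
    similar colour should get the same label) plus a smoothness term
    (nearby pixels should get the same label regardless of colour)."""
    d_pos = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    d_col = np.sum((np.asarray(I_i, float) - np.asarray(I_j, float)) ** 2)
    appearance = w1 * np.exp(-d_pos / (2 * theta_alpha ** 2)
                             - d_col / (2 * theta_beta ** 2))
    smoothness = w2 * np.exp(-d_pos / (2 * theta_gamma ** 2))
    return appearance + smoothness
```

Nearby, similar-looking pixels get a strong coupling (so flipping one label alone is penalized), while distant or dissimilar pixels are nearly independent; this is what sharpens the boundaries of the coarse segmentation.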

  42. Examples

  43. Mask R-CNN State-of-the-art in instance segmentation. Heavily relies on Faster R-CNN. Can work with different architectures, also ResNet. Runs at 195ms per image on an Nvidia Tesla M40 GPU. Can also be used for human pose estimation.

  44. Mask R-CNN: R-CNN + 2 layers

  45. Mask R-CNN: ROI Align

  46. Mask R-CNN

  47. Mask R-CNN

  48. Mask R-CNN

  49. SINT: Siamese Networks for Tracking While tracking, the only definitely correct training example is the first frame; all others are inferred by the algorithm. If the “inferred positives” are correct, then the model is already good enough and no update is needed. If the “inferred positives” are incorrect, updating the model with wrong positive examples will eventually destroy the model. Siamese Instance Search for Tracking, R. Tao, E. Gavves, A. Smeulders, CVPR 2016

  50. Basic Idea No model updates through time, to avoid model contamination. Instead, learn an invariance model f(·): invariances shared between objects, learned from reliable, external, rich, category-independent data. Assumption: appearance variances are shared among objects and categories, and learning can be accurate enough to identify the common appearance variances. Solution: use a Siamese network to compare patches between images; then “tracking” equals finding the most similar patch in each frame (no temporal modelling).

  51.–52. Training loss Marginal contrastive loss: L(x_j, x_k, y_jk) = ½ y_jk D² + ½ (1 − y_jk) max(0, ε − D²), with y_jk ∈ {0,1} and D = ‖f(x_j) − f(x_k)‖₂, where f(·) is the CNN embedding of an image patch. Matching function (after learning): m(x_j, x_k) = f(x_j)ᵀ f(x_k).
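The training loss and the matching function can be sketched directly from the slide's formulas. A minimal NumPy version, assuming embeddings are already computed; `margin` stands in for the ε in the slide, and the function names are chosen here:

```python
import numpy as np

def contrastive_loss(f_j, f_k, y_jk, margin=1.0):
    """Marginal contrastive loss: pull matching pairs (y_jk = 1) together,
    push non-matching pairs (y_jk = 0) apart up to the margin."""
    d2 = np.sum((f_j - f_k) ** 2)  # squared L2 distance D^2
    return 0.5 * y_jk * d2 + 0.5 * (1 - y_jk) * max(0.0, margin - d2)

def match_score(f_j, f_k):
    """Matching function used at tracking time: the inner product of the
    two embeddings; tracking picks the patch with the highest score."""
    return float(np.dot(f_j, f_k))
```

Note the asymmetry: matched pairs are penalized by their distance without bound, while mismatched pairs stop contributing once they are ε apart, so the embedding does not waste capacity pushing easy negatives further away.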
