CNNs for Segmentation, Localization, and Detection, M. Soleymani (PowerPoint presentation)



  1. CNNs for Segmentation, Localization, and Detection • M. Soleymani, Sharif University of Technology, Fall 2017 • Most slides have been adapted from Fei-Fei Li and colleagues' lectures, cs231n, Stanford 2017, and some from John Canny's lectures, cs294-129, Berkeley, 2016.

  2. AlexNet [Krizhevsky, Sutskever, Hinton, 2012] • ImageNet Classification with Deep Convolutional Neural Networks

  3. Image classification

  4. Other Computer Vision Tasks

  5. Classification and Localization: What & Where

  6. Classification + Localization • Classification: C classes – Input: image – Output: class label (e.g., CAT) – Evaluation metric: accuracy • Localization – Input: image – Output: box in the image (x, y, w, h) – Evaluation metric: Intersection over Union (IoU) • Classification + Localization: do both
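The IoU metric named above can be computed directly from two boxes. A minimal Python sketch, assuming (x, y) is the top-left corner of each box and w, h are its width and height:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h).

    Assumes (x, y) is the top-left corner and w, h are width and height.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis (zero if the boxes do not overlap on that axis).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# A prediction shifted by half a box width relative to the ground truth.
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # → 0.3333333333333333
```

An IoU of 1 means a perfect match, and 0 means no overlap. Localization benchmarks typically count a prediction as correct when its IoU with the ground truth exceeds a chosen threshold.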

  7. Idea #1: Localization as Regression • Input: image → neural net → output: box coordinates (4 numbers) • Correct output: box coordinates (4 numbers) • Loss: L2 distance • Only one object, so simpler than detection

  8. Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG): image → convolution and pooling layers → final conv feature map → fully-connected layers → class scores → softmax loss

  9. Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG) • Step 2: Attach a new fully-connected “regression head” to the network: the final conv feature map feeds both the fully-connected “classification head” (class scores) and the fully-connected “regression head” (box coordinates)

  10. Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG) • Step 2: Attach a new fully-connected “regression head” to the network • Step 3: Train the regression head only, with SGD and L2 loss – “Classification head”: class scores – “Regression head”: box coordinates, trained with L2 loss


  12. Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG) • Step 2: Attach a new fully-connected “regression head” to the network • Step 3: Train the regression head only, with SGD and L2 loss – Classification head: class scores, softmax loss – Regression head: box coordinates, L2 loss
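Step 3 can be sketched in plain Python under simplifying assumptions. The frozen, pretrained conv layers are assumed to have already turned the image into a fixed feature vector (a random toy vector stands in for it). The regression head is a single linear layer rather than the several fully-connected layers on the slides. `regression_head`, `sgd_step`, and all sizes are hypothetical. Only the head's weights receive SGD updates from the L2 loss:

```python
import random


def l2_loss(pred, target):
    """Squared L2 distance between predicted and target box (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def regression_head(W, b, feat):
    """Hypothetical linear regression head: 4 box outputs from a feature vector."""
    return [sum(w_ij * f_j for w_ij, f_j in zip(row, feat)) + b_i
            for row, b_i in zip(W, b)]


def sgd_step(W, b, feat, target, lr):
    """One SGD step on the L2 loss that updates only the head's W and b.

    The feature vector comes from frozen conv layers, so nothing upstream
    is updated. Returns the loss measured before the update.
    """
    pred = regression_head(W, b, feat)
    for i in range(4):
        g = 2.0 * (pred[i] - target[i])  # dL/dpred_i
        for j in range(len(feat)):
            W[i][j] -= lr * g * feat[j]
        b[i] -= lr * g
    return l2_loss(pred, target)


random.seed(0)
D = 8  # toy feature size (an assumption; real CNN features are much larger)
feat = [random.uniform(-1, 1) for _ in range(D)]  # stand-in for frozen CNN features
target = [0.2, 0.3, 0.5, 0.4]  # ground-truth box in normalized (x, y, w, h)
W = [[0.0] * D for _ in range(4)]
b = [0.0] * 4

losses = [sgd_step(W, b, feat, target, lr=0.05) for _ in range(50)]
print(losses[0], losses[-1])
```

Because the features are fixed, each update touches only the small head, which keeps this step cheap compared to training the whole network.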

  13. Classification + Localization: often pretrained on ImageNet (transfer learning)

  14. Aside: Human Pose Estimation

  15. Aside: Human Pose Estimation

  16. Where to attach the regression head? • After conv layers: Overfeat, VGG • After last FC layer: DeepPose, R-CNN

  17. Object detection

  18. Object Detection: Impact of Deep Learning

  19. Object Detection as Regression? Each image needs a different number of outputs!

  20. Object Detection as Classification: Sliding Window

  21. Object Detection as Classification: Sliding Window

  22. Object Detection as Classification: Sliding Window • Problem: need to apply the CNN to a huge number of locations and scales, which is very computationally expensive!
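A naive sliding window produces a large number of crops, which shows why it is expensive. The image size, window sizes, and stride below are illustrative assumptions, and only square windows are counted:

```python
def count_windows(img_w, img_h, win_sizes, stride):
    """Count square sliding-window crops over all window sizes (scales).

    In the naive approach, each crop needs its own CNN forward pass.
    """
    total = 0
    for s in win_sizes:
        if s > img_w or s > img_h:
            continue  # window does not fit in the image at this scale
        nx = (img_w - s) // stride + 1  # horizontal positions
        ny = (img_h - s) // stride + 1  # vertical positions
        total += nx * ny
    return total


# Hypothetical setting: 800 x 600 image, three window sizes, stride 16.
print(count_windows(800, 600, win_sizes=[64, 128, 256], stride=16))  # → 3658
```

Each of those crops would need its own CNN forward pass, and adding aspect ratios or finer strides multiplies the count further.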

  23. Region Proposals

  24. R-CNN

  25. R-CNN

  26. R-CNN

  27. R-CNN

  28. R-CNN Training • Step 1: Train (or download) a classification model for ImageNet (AlexNet): image → convolution and pooling layers → final conv feature map → fully-connected layers → class scores (1000 classes) → softmax loss

  29. R-CNN Training • Step 2: Fine-tune the model for detection – Instead of 1000 ImageNet classes, we want 20 object classes + background – Throw away the final fully-connected layer and reinitialize it from scratch: it was 4096 x 1000, now it will be 4096 x 21 – Keep training the model using positive / negative regions from detection images – Class scores: 21 classes, with softmax loss
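The slide does not spell out how positive and negative regions are chosen. This sketch assumes a common rule: a proposal takes the class of the ground-truth box it overlaps most when that IoU is at least 0.5, and is labeled background (label 0) otherwise. Boxes are (x1, y1, x2, y2) corners, and the class index is hypothetical:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes, gt_classes, pos_thresh=0.5):
    """Assign a fine-tuning label to each region proposal.

    Label 0 is background; labels 1..20 are object classes (21 outputs).
    pos_thresh is an assumed threshold, not taken from the slides.
    """
    labels = []
    for p in proposals:
        best_iou, best_cls = 0.0, 0
        for g, c in zip(gt_boxes, gt_classes):
            o = iou(p, g)
            if o > best_iou:
                best_iou, best_cls = o, c
        labels.append(best_cls if best_iou >= pos_thresh else 0)
    return labels


gt_boxes = [(10, 10, 50, 50)]
gt_classes = [8]  # hypothetical class index for "cat"
proposals = [(12, 12, 52, 52), (40, 40, 90, 90), (10, 10, 50, 50)]
print(label_proposals(proposals, gt_boxes, gt_classes))  # → [8, 0, 8]
```

The labeled regions are then fed through the network with a 21-way softmax loss, so background proposals teach the new final layer what "no object" looks like.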

  30. R-CNN Training • Step 3: Extract features – Extract region proposals for all images – For each region: warp to the CNN input size, run forward through the CNN, save pool5 features to disk – Have a big hard drive: features are ~200GB for the PASCAL dataset! (Pipeline: image → region proposals → crop + warp → forward pass through convolution and pooling layers → pool5 features → save to disk)
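Here is a sketch of the crop + warp step. It assumes a single-channel image stored as a list of rows and uses nearest-neighbor sampling for brevity. It does not reproduce the exact warp used in R-CNN, and it leaves out the CNN forward pass and the caching to disk:

```python
def crop_and_warp(image, box, out_size):
    """Crop box = (x1, y1, x2, y2) from a 2-D image (list of rows) and warp
    it to out_size x out_size with nearest-neighbor sampling.

    The aspect ratio is ignored, so every region ends up the same fixed size,
    which is what a CNN with a fixed input size requires.
    """
    x1, y1, x2, y2 = box
    crop_w, crop_h = x2 - x1, y2 - y1
    out = []
    for i in range(out_size):
        # Map the center of output row i back to a source row in the crop.
        src_y = y1 + min(crop_h - 1, int((i + 0.5) * crop_h / out_size))
        row = []
        for j in range(out_size):
            src_x = x1 + min(crop_w - 1, int((j + 0.5) * crop_w / out_size))
            row.append(image[src_y][src_x])
        out.append(row)
    return out


image = [[10 * r + c for c in range(8)] for r in range(6)]  # toy 6 x 8 "image"
patch = crop_and_warp(image, (2, 1, 6, 3), out_size=4)
print(patch)  # → [[12, 13, 14, 15], [12, 13, 14, 15], [22, 23, 24, 25], [22, 23, 24, 25]]
```

Because every proposal is warped and pushed through the full network separately, this step repeats most of the convolution work for overlapping regions. That repeated work is the cost Fast R-CNN removes later in the deck.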

  31. R-CNN Training • Step 4: Train one binary SVM per class to classify region features – Training image regions → cached region features – E.g., the cat SVM is trained on positive samples for cat and negative samples for cat
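A per-class binary linear SVM on cached features can be sketched with subgradient descent on the regularized hinge loss. This is a simplified stand-in with full-batch updates, toy 2-D features, and hypothetical hyperparameters. It is not the solver used for R-CNN:

```python
def svm_score(w, b, x):
    """Linear SVM decision value w . x + b (positive means 'this class')."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Binary linear SVM via full-batch subgradient descent.

    X: cached region feature vectors; y: +1 (positive) or -1 (negative).
    Minimizes lam / 2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    """
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * svm_score(w, b, xi) < 1:  # hinge active: subgradient -y_i * x_i / n
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


# Toy cached features (2-D here; real pool5 features are far larger).
feats = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
# One label vector per class: +1 = positive samples, -1 = negative samples.
labels = {"cat": [1, 1, -1, -1], "dog": [-1, -1, 1, 1]}
svms = {cls: train_linear_svm(feats, y) for cls, y in labels.items()}
for cls, (w, b) in svms.items():
    print(cls, [svm_score(w, b, f) > 0 for f in feats])
```

Training happens entirely on cached features, so the SVMs never update the CNN. This is one of the "post-hoc" objectives listed under the R-CNN problems.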

  32. R-CNN Training • Step 5 (bbox regression): For each class, train a linear regression model to map from cached features to offsets to GT boxes, to make up for “slightly wrong” proposals – Regression targets (dx, dy, dw, dh) in normalized coordinates: proposal too far to left → (.25, 0, 0, 0); proposal too wide → (0, 0, -0.125, 0); proposal is good → (0, 0, 0, 0)
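One concrete definition of (dx, dy, dw, dh) is the center-offset / log-scale parameterization described in the R-CNN paper. The x and y offsets of the box center are divided by the proposal's width and height, and the width and height targets are log ratios. The sketch below assumes (x1, y1, x2, y2) boxes. The "too far to left" example reproduces (0.25, 0, 0, 0) exactly. The slide's width value (-0.125) depends on box sizes the slide does not give, so it is not reproduced here:

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) corners to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode(proposal, gt):
    """Regression targets (dx, dy, dw, dh) that move a proposal onto a GT box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def decode(proposal, deltas):
    """Apply (dx, dy, dw, dh) to a proposal; returns (x1, y1, x2, y2)."""
    px, py, pw, ph = to_center(proposal)
    dx, dy, dw, dh = deltas
    cx, cy = px + dx * pw, py + dy * ph
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


gt = (40, 20, 80, 60)
too_far_left = (30, 20, 70, 60)  # same size, shifted left by 10 px (0.25 of width)
too_wide = (30, 20, 90, 60)      # same center, 1.5x wider than the GT box
print(encode(too_far_left, gt))  # → (0.25, 0.0, 0.0, 0.0)
print(encode(too_wide, gt))      # dw is negative: the box must shrink
```

At test time, the per-class linear regressor predicts these four numbers from a proposal's cached features, and `decode` turns them into the refined box.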

  33. R-CNN: Problems • Ad hoc training objectives – Fine-tune network with softmax classifier (log loss) – Train post-hoc linear SVMs (hinge loss) – Train post-hoc bounding-box regressions (least squares) • Training is slow (84h), takes a lot of disk space • Inference (detection) is slow – 47s / image with VGG16 [Simonyan & Zisserman. ICLR15] – Fixed by SPP-net [He et al. ECCV14]

  34. Fast R-CNN

  35. Fast R-CNN Share computation of convolutional layers between proposals for an image

  36. Fast R-CNN: Region of Interest Pooling • Hi-res input image: 3 x 800 x 600, with region proposal • Convolution and pooling layers produce hi-res conv features: C x H x W, with region proposal • Problem: fully-connected layers expect low-res conv features: C x h x w

  37. Fast R-CNN: Region of Interest Pooling • Project the region proposal onto the conv feature map (hi-res conv features: C x H x W) • Problem: fully-connected layers expect low-res conv features: C x h x w

  38. Fast R-CNN: Region of Interest Pooling • Divide the projected region into an h x w grid

  39. Fast R-CNN: Region of Interest Pooling • Max-pool within each grid cell → RoI conv features: C x h x w for the region proposal, which is the size the fully-connected layers expect

  40. Fast R-CNN: Region of Interest Pooling • Can backpropagate similarly to max pooling
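RoI max pooling can be sketched for one channel of the conv feature map (a list of rows). The spatial scale (1/16 here), the rounding used to project the proposal, and the floor/ceil grid-cell boundaries are all assumptions. Real implementations differ in such details:

```python
import math


def roi_max_pool(feature, roi, spatial_scale, out_h, out_w):
    """RoI max pooling on a single-channel feature map (list of rows).

    roi = (x1, y1, x2, y2) in input-image pixels. spatial_scale maps image
    coordinates onto the conv feature map (assumed 1/16 in the example).
    The projected region is split into an out_h x out_w grid and each cell
    is max-pooled, so every RoI yields an output of the same fixed size.
    """
    H, W = len(feature), len(feature[0])
    # Project the proposal onto the feature map (inclusive integer bounds).
    fx1 = int(round(roi[0] * spatial_scale))
    fy1 = int(round(roi[1] * spatial_scale))
    fx2 = int(round(roi[2] * spatial_scale))
    fy2 = int(round(roi[3] * spatial_scale))
    roi_w = max(fx2 - fx1 + 1, 1)
    roi_h = max(fy2 - fy1 + 1, 1)
    out = []
    for i in range(out_h):
        # Row range of grid cell i, clamped to the feature map.
        y_start = min(max(fy1 + math.floor(i * roi_h / out_h), 0), H)
        y_end = min(max(fy1 + math.ceil((i + 1) * roi_h / out_h), 0), H)
        row = []
        for j in range(out_w):
            x_start = min(max(fx1 + math.floor(j * roi_w / out_w), 0), W)
            x_end = min(max(fx1 + math.ceil((j + 1) * roi_w / out_w), 0), W)
            cell = [feature[y][x] for y in range(y_start, y_end)
                    for x in range(x_start, x_end)]
            row.append(max(cell) if cell else 0.0)  # empty cell -> 0
        out.append(row)
    return out


feature = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4 x 4 conv map
print(roi_max_pool(feature, (0, 0, 48, 48), spatial_scale=1 / 16, out_h=2, out_w=2))
# → [[5, 7], [13, 15]]
```

In the backward pass, each output cell's gradient flows only to the feature-map position that produced its max, exactly as in ordinary max pooling.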

  41. Fast R-CNN: RoI Pooling

  42. Fast R-CNN Share computation of convolutional layers between proposals for an image

  43. R-CNN vs SPP vs Fast R-CNN • Problem: runtime is dominated by region proposals!

  44. Faster R-CNN • Make CNN do proposals! – Solely based on CNN – No external modules • Each step is end-to-end Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “ Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks ” . NIPS 2015

  45. Faster R-CNN • Insert a Region Proposal Network (RPN) to predict proposals from features • Jointly train with 4 losses: – RPN classify object / not object – RPN regress box coordinates – Final classification score (object classes) – Final box coordinates
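The slide does not say what the RPN's "object / not object" scores and box regressions are relative to. In the Faster R-CNN paper cited above, they are attached to fixed reference boxes called anchors: k boxes (several scales and aspect ratios) centered at every position of the conv feature map. Below is a minimal sketch of anchor generation. The stride of 16 and the scale and ratio values are assumptions modeled on that paper's default setting:

```python
def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centered on every feature-map cell.

    Each of the feat_h * feat_w positions gets one anchor per (scale, ratio)
    pair, so there are k = len(scales) * len(ratios) anchors per position.
    ratio is height / width; each anchor's area stays scale ** 2.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Center of this feature-map cell in input-image pixels.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s / r ** 0.5
                    h = s * r ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# e.g. a 38 x 50 feature map (assumed size for a ~600 x 800 input at stride 16)
anchors = make_anchors(38, 50)
print(len(anchors))  # → 17100
```

For each anchor, the RPN outputs an objectness score (first loss) and four box offsets (second loss). The highest-scoring refined anchors become the proposals that RoI pooling consumes.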

  46. Faster R-CNN: Make CNN do proposals!

  47. Object Detection (source: http://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf)

  48. Semantic Segmentation

  49. Semantic Segmentation Idea: Sliding Window • Problem: very inefficient! Not reusing shared features between overlapping patches

  50. Semantic Segmentation Idea: Fully Convolutional

  51. Semantic Segmentation Idea: Fully Convolutional

  52. Semantic Segmentation Idea: Fully Convolutional

  53. In-Network upsampling: “ Unpooling ”

  54. In-Network upsampling: “ Max Unpooling ”
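Max unpooling can be sketched as follows. A max-pooling layer records which element was the max in each window (the "switches"), and the corresponding unpooling layer puts values back at exactly those positions, filling the rest with zeros. This sketch assumes 2x2, stride-2 pooling on a single-channel map with even dimensions:

```python
def max_pool_2x2(x):
    """2x2 max pooling (stride 2) that also records where each max came from."""
    pooled, switches = [], []
    for i in range(0, len(x), 2):
        prow, srow = [], []
        for j in range(0, len(x[0]), 2):
            # (value, position) pairs; ties are resolved by tuple comparison.
            cands = [(x[i + di][j + dj], (i + di, j + dj))
                     for di in (0, 1) for dj in (0, 1)]
            val, pos = max(cands)
            prow.append(val)
            srow.append(pos)
        pooled.append(prow)
        switches.append(srow)
    return pooled, switches


def max_unpool_2x2(y, switches, out_h, out_w):
    """Put each value back where its max came from; zeros everywhere else."""
    out = [[0] * out_w for _ in range(out_h)]
    for i, row in enumerate(y):
        for j, val in enumerate(row):
            r, c = switches[i][j]
            out[r][c] = val
    return out


x = [[1, 2, 6, 3],
     [3, 5, 2, 1],
     [1, 2, 2, 1],
     [7, 3, 4, 8]]
pooled, switches = max_pool_2x2(x)
print(pooled)  # → [[5, 6], [7, 8]]
print(max_unpool_2x2(pooled, switches, 4, 4))
# → [[0, 0, 6, 0], [0, 5, 0, 0], [0, 0, 0, 0], [7, 0, 0, 8]]
```

In a segmentation network, the values fed to unpooling would be decoder features rather than the pooled values themselves. The switches from the matching downsampling layer help restore fine spatial detail.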

  55. Learnable Upsampling: Transpose Convolution

  56. Learnable Upsampling: Transpose Convolution

  57. Learnable Upsampling: Transpose Convolution • Other names: deconvolution (a bad name), upconvolution, fractionally strided convolution, backward strided convolution
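Here is a sketch of a stride-2 transpose convolution for a single-channel input with no padding. Each input value scales a copy of the kernel, and that copy is added into the output at a position offset by the stride. Where copies overlap, they are summed. In a network, the kernel weights are learned. The fixed all-ones kernel here is only for illustration:

```python
def conv_transpose2d(x, kernel, stride=2):
    """Transpose convolution of a single-channel input (no padding).

    Output size per dimension: (n - 1) * stride + k.
    """
    n_h, n_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (n_h - 1) * stride + k_h
    out_w = (n_w - 1) * stride + k_w
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(n_h):
        for j in range(n_w):
            # Add a copy of the kernel, weighted by x[i][j], at (i*stride, j*stride).
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


x = [[1, 2],
     [3, 4]]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]
for row in conv_transpose2d(x, kernel, stride=2):
    print(row)
# → [1, 1, 3, 2, 2]
#   [1, 1, 3, 2, 2]
#   [4, 4, 10, 6, 6]
#   [3, 3, 7, 4, 4]
#   [3, 3, 7, 4, 4]
```

The 2 x 2 input becomes a 5 x 5 output. The summed overlaps (the 3, 10, and 7 entries) show why the stride-2 case behaves like a convolution with a "fractional" stride of 1/2.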
