
CNNs Applications, M. Soleymani, Sharif University of Technology


  1. CNNs Applications, M. Soleymani, Sharif University of Technology, Spring 2019. Most slides have been adopted from Fei-Fei Li and colleagues' lectures, cs231n, Stanford 2017.

  2. AlexNet [Krizhevsky, Sutskever, Hinton, 2012] • ImageNet Classification with Deep Convolutional Neural Networks

  3. Image classification

  4. Other Computer Vision Tasks

  5. Semantic Segmentation

  6. Semantic Segmentation Idea: Sliding Window. Problem: very inefficient! Shared features between overlapping patches are not reused.

  7. Semantic Segmentation Idea: Fully Convolutional

  8. Semantic Segmentation Idea: Fully Convolutional

  9. Semantic Segmentation Idea: Fully Convolutional

  10. In-Network upsampling: “Unpooling”

  11. In-Network upsampling: “Max Unpooling”
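
A minimal PyTorch sketch of max unpooling (shapes are illustrative): max pooling with return_indices=True remembers where each maximum came from, and MaxUnpool2d places the pooled values back at those positions, filling the rest with zeros.

```python
import torch
import torch.nn as nn

# Pooling layer that remembers the argmax positions ("switches")
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 4, 4)          # (batch, channels, H, W)
pooled, indices = pool(x)            # 4x4 -> 2x2, plus the max positions
restored = unpool(pooled, indices)   # 2x2 -> 4x4, zeros everywhere except at the max positions

print(pooled.shape, restored.shape)  # torch.Size([1, 1, 2, 2]) torch.Size([1, 1, 4, 4])
```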

  12. Learnable Upsampling: Transpose Convolution

  13. Learnable Upsampling: Transpose Convolution

  14. Learnable Upsampling: Transpose Convolution. Other names: deconvolution (bad), upconvolution, fractionally strided convolution, backward strided convolution.
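
A hedged sketch of learnable upsampling in PyTorch (layer sizes are illustrative): a stride-2 transposed convolution roughly doubles the spatial resolution with learned filters.

```python
import torch
import torch.nn as nn

# Transpose convolution: learnable upsampling (stride 2 roughly doubles H and W)
up = nn.ConvTranspose2d(in_channels=64, out_channels=32,
                        kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 64, 16, 16)
y = up(x)
print(y.shape)  # torch.Size([1, 32, 32, 32])
```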

  15. Transpose Convolution: 1D Example
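
A small numeric sketch of the 1D case (values chosen for illustration): each input element scales a copy of the filter, the copies are placed stride positions apart in the output, and overlapping contributions are summed.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[1., 2.]]])          # input: (batch=1, channels=1, length=2)
w = torch.tensor([[[1., 10., 100.]]])   # filter: (in_ch=1, out_ch=1, kernel=3)

# Stride-2 transposed convolution: output length = (2 - 1) * 2 + 3 = 5
y = F.conv_transpose1d(x, w, stride=2)
print(y)  # tensor([[[  1.,  10., 102.,  20., 200.]]])  (overlap at index 2: 1*100 + 2*1)
```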

  16. Semantic Segmentation Idea: Fully Convolutional
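
As a hedged sketch of the fully convolutional idea (channel counts and depths are made up): downsample with strided convolutions, upsample back to the input resolution with transposed convolutions, and predict class scores for every pixel; training then minimizes a per-pixel cross-entropy loss.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Toy fully convolutional net: downsample, then upsample to per-pixel class scores."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.up(self.down(x))   # (N, num_classes, H, W)

scores = TinyFCN()(torch.randn(1, 3, 64, 64))
print(scores.shape)                    # torch.Size([1, 21, 64, 64])
# Per-pixel training: nn.CrossEntropyLoss()(scores, labels) with labels of shape (N, H, W)
```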

  17. Computer Vision Tasks

  18. Classification + Localization. Classification: C classes; input: image; output: class label (e.g., CAT); evaluation metric: accuracy. Localization: input: image; output: box in the image (x, y, w, h); evaluation metric: Intersection over Union (IoU). Classification + Localization: do both.
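
A minimal sketch of the localization metric, Intersection over Union, for boxes given as (x, y, w, h); the helper name is ours.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle
    ix = max(ax, bx)
    iy = max(ay, by)
    iw = max(0.0, min(ax + aw, bx + bw) - ix)
    ih = max(0.0, min(ay + ah, by + bh) - iy)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 10, 10)))  # 25 / 175 ≈ 0.143
```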

  19. Classification + Localization Often pretrained on ImageNet (Transfer learning)

  20. Simple Recipe for Classification + Localization. Step 1: Train (or download) a classification model (e.g., VGG). Pipeline: image → convolution and pooling layers → final conv feature map → fully-connected layers → class scores → softmax loss.

  21. Localization as Regression. Input: image → neural net → output: box coordinates (4 numbers). Loss: L2 distance to the correct box coordinates (4 numbers). Only one object, simpler than detection.

  22. Simple Recipe for Classification + Localization. Step 1: Train (or download) a classification model (e.g., VGG). Step 2: Attach a new fully-connected “regression head” to the network. The final conv feature map now feeds two sets of fully-connected layers: a “classification head” producing class scores and a “regression head” producing box coordinates.

  23. Simple Recipe for Classification + Localization. Step 1: Train (or download) a classification model (e.g., VGG). Step 2: Attach a new fully-connected “regression head” to the network. Step 3: Train the regression head only with SGD and L2 loss; the classification head still produces class scores, while the regression head outputs box coordinates trained against the L2 loss.

  24. Simple Recipe for Classification + Localization. Steps 1-3 as above: only the regression head is trained at this stage, with SGD and L2 loss on the box coordinates.

  25. Simple Recipe for Classification + Localization. Steps 1-3 as above. Both heads sit on the shared convolutional features: the classification head outputs class scores under a softmax loss, and the regression head outputs box coordinates under an L2 loss.
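
A hedged PyTorch sketch of the two-head recipe (the backbone and layer sizes are placeholders, not VGG): a shared convolutional trunk feeds a classification head trained with a softmax loss and a regression head trained with an L2 loss on the 4 box numbers.

```python
import torch
import torch.nn as nn

class ClsLocNet(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        # Stand-in for a pretrained backbone (e.g., VGG conv layers)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        feat_dim = 16 * 4 * 4
        self.cls_head = nn.Linear(feat_dim, num_classes)  # class scores
        self.box_head = nn.Linear(feat_dim, 4)            # (x, y, w, h)

    def forward(self, x):
        f = self.backbone(x)
        return self.cls_head(f), self.box_head(f)

net = ClsLocNet()
scores, boxes = net(torch.randn(2, 3, 224, 224))
loss = (nn.CrossEntropyLoss()(scores, torch.tensor([3, 7]))   # softmax loss on class scores
        + nn.MSELoss()(boxes, torch.randn(2, 4)))             # L2 loss on box coordinates
loss.backward()
```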

  26. Classification + Localization Often pretrained on ImageNet (Transfer learning)

  27. Classification + Localization. Classification head (fully-connected layers): C numbers, one per class, giving class scores under a softmax loss. Regression head (fully-connected layers): box coordinates under an L2 loss; class agnostic: 4 numbers (one box); class specific: C x 4 numbers (one box per class). Both heads read the final conv feature map.

  28. Aside: Human Pose Estimation

  29. Aside: Human Pose Estimation

  30. Where to attach the regression head? After the conv layers: Overfeat, VGG. After the last FC layer: DeepPose, R-CNN.

  31. Object detection

  32. Object Detection: Impact of Deep Learning

  33. Object Detection as Regression? Each image needs a different number of outputs!

  34. Object Detection as Classification: Sliding Window

  35. Object Detection as Classification: Sliding Window

  36. Object Detection as Classification: Sliding Window. Problem: need to apply the CNN to a huge number of locations and scales; very computationally expensive!

  37. Region Proposals

  38. R-CNN

  39. R-CNN

  40. R-CNN

  41. R-CNN

  42. R-CNN Training. Step 1: Train (or download) a classification model for ImageNet (e.g., VGG). Pipeline: image → convolution and pooling layers → final conv feature map → fully-connected layers → class scores (1000 classes) → softmax loss.

  43. R-CNN Training. Step 2: Fine-tune the model for detection. Instead of 1000 ImageNet classes, we want 20 object classes + background: throw away the final fully-connected layer (was 4096 x 1000, now 4096 x 21) and reinitialize it from scratch, then keep training the model using positive/negative regions from the detection images. Class scores: 21 classes, softmax loss.
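
A hedged sketch of the head swap using torchvision's VGG16 (treat the indexing of model.classifier as an assumption about torchvision's layout): keep the pretrained layers and reinitialize only the final 4096 x 1000 layer as 4096 x 21.

```python
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights="IMAGENET1K_V1")   # pretrained ImageNet classifier
# The last entry of model.classifier is the 4096 -> 1000 layer; replace it
# with a freshly initialized 4096 -> 21 layer (20 classes + background).
model.classifier[-1] = nn.Linear(4096, 21)
```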

  44. R-CNN Training. Step 3: Extract features. Extract region proposals for all images. For each region: crop and warp it to the CNN input size, run it forward through the CNN, and save the pool5 features to disk. Have a big hard drive: the features are ~200 GB for the PASCAL dataset!
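
A hedged sketch of the feature-extraction loop (the 227 x 227 warp size, the proposal boxes, and the tiny stand-in trunk are all illustrative, not the actual R-CNN network):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins for the slide's ingredients:
image = torch.randn(3, 600, 800)                            # one detection image
proposals = [(50, 40, 200, 150), (300, 100, 120, 180)]      # (x, y, w, h) region proposals
cnn_trunk = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveMaxPool2d(6), nn.Flatten())  # stand-in up to "pool5"

features = []
for (x, y, w, h) in proposals:
    crop = image[:, y:y + h, x:x + w]                       # crop the proposal
    warped = F.interpolate(crop.unsqueeze(0), size=(227, 227),
                           mode="bilinear", align_corners=False)  # warp to CNN input size
    with torch.no_grad():
        features.append(cnn_trunk(warped))                  # forward pass, keep pooled features
torch.save(torch.cat(features), "pool5_features.pt")        # cache to disk
```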

  45. R-CNN Training. Step 4: Train one binary SVM per class to classify region features, using the cached region features of the training image regions (e.g., positive samples for cat vs. negative samples for cat).
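
A hedged sketch of step 4 with scikit-learn (the feature dimension and labels here are synthetic): one binary linear SVM per class, trained on the cached region features.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
cached_features = rng.normal(size=(500, 4096))        # cached region features (dimension illustrative)
labels = {"cat": rng.integers(0, 2, size=500),         # 1 = positive region, 0 = negative
          "dog": rng.integers(0, 2, size=500)}

svms = {}
for cls, y in labels.items():                          # one binary SVM per class
    svms[cls] = LinearSVC(C=1.0).fit(cached_features, y)

scores = svms["cat"].decision_function(cached_features[:5])   # per-region scores for one class
```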

  46. R-CNN Training. Step 5 (bbox regression): For each class, train a linear regression model to map from cached features to offsets to the ground-truth boxes, to make up for “slightly wrong” proposals. Regression targets (dx, dy, dw, dh) are in normalized coordinates, e.g., (0.25, 0, 0, 0) for a proposal too far to the left, (0, 0, -0.125, 0) for a proposal too wide, and (0, 0, 0, 0) for a good proposal.
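
A hedged sketch of how such normalized targets could be computed; the original R-CNN parameterization uses box centers and log width/height ratios, while this simplified version just normalizes the (x, y, w, h) differences, which reproduces the numbers on the slide.

```python
import numpy as np

def regression_targets(proposal, gt):
    """Normalized (dx, dy, dw, dh) targets; a simplified parameterization."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     (gw - pw) / pw, (gh - ph) / ph])

# Proposal shifted too far to the left of the ground-truth box:
print(regression_targets((0, 0, 100, 100), (25, 0, 100, 100)))   # [0.25 0. 0. 0.]
# Proposal too wide:
print(regression_targets((0, 0, 160, 100), (0, 0, 140, 100)))    # [0. 0. -0.125 0.]
```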

  47. R-CNN: Problems • Ad hoc training objectives – Fine-tune network with softmax classifier (log loss) – Train post-hoc linear SVMs (hinge loss) – Train post-hoc bounding-box regressions (least squares) • Training is slow (84h), takes a lot of disk space • Inference (detection) is slow – 47s / image with VGG16 [Simonyan & Zisserman. ICLR15] – Fixed by SPP-net [He et al. ECCV14]

  48. Fast R-CNN

  49. Fast R-CNN Share computation of convolutional layers between proposals for an image

  50. Fast R-CNN: Region of Interest Pooling. Hi-res input image (3 x 800 x 600, with a region proposal) → convolution and pooling layers → hi-res conv features (C x H x W, with the region proposal). Problem: the fully-connected layers expect low-res conv features (C x h x w).

  51. Fast R-CNN: Region of Interest Pooling. Project the region proposal onto the conv feature map (hi-res conv features C x H x W; the fully-connected layers still expect low-res conv features C x h x w).

  52. Fast R-CNN: Region of Interest Pooling. Divide the projected region into an h x w grid.

  53. Fast R-CNN: Region of Interest Pooling. Max-pool within each grid cell, giving RoI conv features of size C x h x w for the region proposal, which now match what the fully-connected layers expect.

  54. Fast R-CNN: Region of Interest Pooling. Backpropagation through RoI pooling works similarly to max pooling.
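
A hedged sketch of RoI pooling using torchvision (shapes are illustrative): boxes are given in image coordinates, spatial_scale projects them onto the conv feature map, and each RoI is max-pooled to a fixed h x w grid.

```python
import torch
from torchvision.ops import roi_pool

feats = torch.randn(1, 256, 50, 38)                  # conv features for one image
# One RoI per row: (batch_index, x1, y1, x2, y2) in input-image coordinates
rois = torch.tensor([[0., 100., 120., 500., 460.]])

pooled = roi_pool(feats, rois, output_size=(7, 7),   # fixed h x w grid per RoI
                  spatial_scale=1.0 / 16)            # image coords -> feature-map coords
print(pooled.shape)                                  # torch.Size([1, 256, 7, 7])
```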

  55. Fast R-CNN: RoI Pooling

  56. Fast R-CNN Share computation of convolutional layers between proposals for an image

  57. R-CNN vs. SPP vs. Fast R-CNN. Problem: runtime is dominated by region proposals!

  58. Faster R-CNN. Make the CNN do proposals! Solely based on the CNN, with no external modules; each step is end-to-end. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.
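
A hedged sketch of the region proposal network idea (channel counts and the anchor count k are illustrative): a small convolutional head slides over the shared feature map and, for each of k anchors at every position, predicts an objectness score and 4 box offsets.

```python
import torch
import torch.nn as nn

class TinyRPNHead(nn.Module):
    """Toy RPN head: per-position objectness and box deltas for k anchors."""
    def __init__(self, in_channels=256, k=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 256, 3, padding=1)
        self.objectness = nn.Conv2d(256, k, 1)       # object vs. background score per anchor
        self.box_deltas = nn.Conv2d(256, 4 * k, 1)   # (dx, dy, dw, dh) per anchor

    def forward(self, feats):
        h = torch.relu(self.conv(feats))
        return self.objectness(h), self.box_deltas(h)

scores, deltas = TinyRPNHead()(torch.randn(1, 256, 38, 50))
print(scores.shape, deltas.shape)   # torch.Size([1, 9, 38, 50]) torch.Size([1, 36, 38, 50])
```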
