
Deep Learning Tutorial, Part II. Greg Shakhnarovich, TTI-Chicago (PowerPoint presentation).



  1. Deep Learning Tutorial, Part II. Greg Shakhnarovich, TTI-Chicago. December 2016.

  2. Overview: Goals of the tutorial. A somewhat organized overview of basics and some more advanced topics; demystify jargon; pointers for informed further learning. Aimed mostly at vision practitioners, but the tools are widely applicable beyond vision. Assumes basic familiarity with machine learning.

  3. Overview: Not covered. Connections to the brain; deep learning outside of neural networks; many recent advances; many specialized architectures for vision tasks.

  4. Overview: Outline. Introduction (3 hours): review of relevant machine learning concepts; feedforward neural networks and backpropagation; optimization techniques and issues; complexity and regularization in neural networks; intro to convolutional networks. Advanced (3 hours): advanced techniques for learning DNNs; very deep networks; convnets for tasks beyond image classification; recurrent networks.

  5. Overview: Sources. Stanford CS231n: Convolutional Neural Networks for Visual Recognition, Andrej Karpathy, Justin Johnson et al. (2016 edition), vision.stanford.edu/teaching/cs231n. Deep Learning by Ian Goodfellow, Aaron Courville and Yoshua Bengio, 2016. Chris Olah: Understanding LSTM Networks (blog post), colah.github.io/posts/2015-08-Understanding-LSTMs. Papers on arXiv and slides by the authors.

  6. More training tricks: Input normalization. Standard practice: normalize the data. In theory one could apply a variety of normalization schemes: zero-mean unit variance, "box normalization", whitening. In practice, for images: subtract the "mean pixel" (the same value at all locations). Assuming zero-mean filters, this gives zero-mean filter responses and matches zero padding.
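
A minimal sketch of the mean-pixel subtraction described above, using NumPy; the array layout and the random data are illustrative assumptions, not values from the tutorial:

```python
import numpy as np

# images: uint8 array of shape (N, H, W, 3) -- an assumed layout for illustration
images = np.random.randint(0, 256, size=(8, 224, 224, 3), dtype=np.uint8)

# "Mean pixel": one RGB value averaged over all images and all spatial locations,
# i.e. the same value is subtracted at every location.
mean_pixel = images.astype(np.float32).mean(axis=(0, 1, 2))   # shape (3,)

normalized = images.astype(np.float32) - mean_pixel           # zero-mean input
```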

  7. More training tricks: Batch normalization, motivation. Problem: covariate shift (a change in a function's input distribution). As learning proceeds, higher layers suffer from internal covariate shift due to the changing parameters of previous layers, which makes learning harder. Example: MNIST (15th/50th/85th percentiles of the input to a typical sigmoid) [Ioffe and Szegedy].

  8. More training tricks: Batch normalization, algorithm. Batch normalization [Ioffe and Szegedy, 2015]: normalize each activation over the mini-batch, then apply a scale γ and shift β (per layer or per channel) that are learned through the usual backprop.
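
A sketch of the batch-normalization forward pass for a mini-batch of convolutional feature maps (per-channel statistics), following [Ioffe and Szegedy, 2015]; the variable names and epsilon value are illustrative assumptions:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W) activations; gamma, beta: (C,) learned scale and shift."""
    # Per-channel mean and variance over the batch and spatial dimensions.
    mu = x.mean(axis=(0, 2, 3), keepdims=True)           # (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)           # (1, C, 1, 1)
    x_hat = (x - mu) / np.sqrt(var + eps)                # normalized activations
    # Learned scale and shift restore representational capacity.
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

x = np.random.randn(16, 32, 8, 8).astype(np.float32)
y = batchnorm_forward(x, gamma=np.ones(32, np.float32), beta=np.zeros(32, np.float32))
```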

  9. More training tricks: Batch normalization, effect [Ioffe and Szegedy]. Allows a higher learning rate and faster convergence. May (or may not) reduce the need for dropout. De facto standard today in most architectures.

  10. More training tricks: Data augmentation. Part of the invariance learned by convnets comes from the variation included in the training set. Natural variation: different instances of objects, scene compositions, etc. We can get a lot more for free with synthetic variations. Obvious example: mirror flip (horizontally, but not vertically!) [A. Karpathy].
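
A minimal sketch of the horizontal-flip augmentation in NumPy; the (H, W, C) layout and the flip probability are assumptions:

```python
import numpy as np

def random_horizontal_flip(image, p=0.5):
    """Flip an (H, W, C) image left-right with probability p; never flip vertically."""
    if np.random.rand() < p:
        return image[:, ::-1, :].copy()
    return image
```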

  11. More training tricks: Data augmentation, random crops (and scales). For image classification this assumes the object is large and central. E.g., when training ResNet on ImageNet: resize the image so that its shorter side is a random number between 256 and 480, then crop a random 224 × 224 window [A. Karpathy]. Must match the testing regime (ResNet: multiple scales, fixed crops for each scale, max).
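
A sketch of the scale-augmentation recipe described above (random shorter-side resize to 256-480, then a random 224 × 224 crop); NumPy plus an OpenCV resize are assumed, and this is an illustration rather than the reference implementation:

```python
import numpy as np
import cv2  # assumed available; any resize routine would do

def scale_augment(image, crop=224, short_min=256, short_max=480):
    """Resize so the shorter side is random in [short_min, short_max], then random-crop."""
    h, w = image.shape[:2]
    short = np.random.randint(short_min, short_max + 1)
    scale = short / min(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    rh, rw = resized.shape[:2]
    top = np.random.randint(0, rh - crop + 1)
    left = np.random.randint(0, rw - crop + 1)
    return resized[top:top + crop, left:left + crop]

aug = scale_augment(np.zeros((300, 500, 3), dtype=np.uint8))   # -> (224, 224, 3)
```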

  13. More training tricks: Data augmentation, color jitter. Apply it in a structured way (e.g., using PCA on color) rather than per pixel [A. Karpathy]. Blur (see our paper on arXiv). Rotations? Noise?
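
A sketch of PCA-based color jitter in the spirit of the AlexNet recipe (not necessarily the exact scheme the slide refers to): compute the principal components of RGB values over the training images, then shift each image along those components by small random amounts, the same shift at every pixel:

```python
import numpy as np

def fit_color_pca(images):
    """images: (N, H, W, 3) float array; eigenvalues/eigenvectors of the RGB covariance."""
    pixels = images.reshape(-1, 3)
    eigvals, eigvecs = np.linalg.eigh(np.cov(pixels, rowvar=False))
    return eigvals, eigvecs

def pca_color_jitter(image, eigvals, eigvecs, sigma=0.1):
    """Add a random shift along the RGB principal components (structured, not per-pixel)."""
    alphas = np.random.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)          # (3,) RGB offset
    return image + shift

evals, evecs = fit_color_pca(np.random.rand(10, 32, 32, 3))
jittered = pca_color_jitter(np.random.rand(32, 32, 3), evals, evecs)
```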

  14. Very deep networks: The quest for very deep networks. Apparent dividends from depth (albeit diminishing) [He et al.]. Three main challenges with depth: computational complexity (alleviated by hardware?), learning complexity, and optimization complexity.

  15. Very deep networks: Training very deep networks. A naive attempt to increase depth: on CIFAR-10, a simple sequence of 3 × 3 conv layers with occasional stride 2 (no pooling) [He et al.]. At a certain depth optimization fails; even the training error gets worse, so this is clearly an optimization issue, not a learning (overfitting) issue.

  16. Very deep networks: GoogLeNet. A number of ad-hoc choices: "inception blocks", auxiliary loss paths. No fully connected layers! Compared to AlexNet: 12 times fewer parameters, double the FLOPs [Szegedy et al., 2014].
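
A minimal sketch of an inception-style block: parallel 1×1, 3×3, 5×5 and pooling branches concatenated along the channel dimension. PyTorch is an assumed framework, and the channel counts are illustrative, not those of the actual GoogLeNet:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel branches whose outputs are concatenated along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 48, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(48, 64, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

out = InceptionBlock(in_ch=192)(torch.randn(1, 192, 28, 28))   # -> (1, 192, 28, 28)
```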

  17. Very deep networks: Residual networks. Key idea: allow "shortcuts" for the loss to reach lower layers ("deep supervision"). Residual connections [He et al., 2015]: learn what to add to the previous layer's output rather than how to transform it.
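
A sketch of the residual idea: the block learns a residual F(x) that is added to its input, so the identity mapping is the default behavior. PyTorch is an assumed framework and the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the stacked layers learn what to add, not how to transform x."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.f(x))   # identity shortcut plus learned residual

y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))
```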

  18. Very deep networks: ResNet architecture. Compare to VGG-19 and to the corresponding plain (no-shortcut) architecture.

  19. Very deep networks: ResNet with bottleneck blocks. Bottleneck blocks (1 × 1 reduce, 3 × 3, 1 × 1 expand) make it possible to train hundreds of layers; the state of the art on ImageNet/COCO uses ResNets with 150-250 layers. Similar in spirit to the Inception blocks in GoogLeNet.
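
A sketch of a bottleneck residual block (1×1 reduce, 3×3, 1×1 expand, with an identity shortcut); PyTorch is assumed and the channel counts are illustrative rather than the paper's:

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """1x1 reduce -> 3x3 -> 1x1 expand, wrapped in an identity shortcut."""
    def __init__(self, channels, bottleneck=64):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1), nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1), nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1), nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.f(x))

out = BottleneckBlock(256)(torch.randn(1, 256, 14, 14))
```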

  20. Very deep networks: Stochastic depth. From dropping units to dropping layers: regularization by stochastic depth at training time. Drop entire "ResNet blocks" with some probability [Huang et al., 2016]; made possible by the residual trick (the identity shortcut remains). State of the art on recognition tasks.
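
A sketch of stochastic depth: during training the residual branch of a block is skipped entirely with probability p_drop, leaving only the identity shortcut; at test time the branch is kept and scaled by its survival probability. PyTorch is assumed and the drop probability is illustrative:

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block whose residual branch is randomly dropped during training."""
    def __init__(self, channels, p_drop=0.2):
        super().__init__()
        self.p_drop = p_drop
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.p_drop:
                return x                               # whole block skipped: identity only
            return x + self.f(x)
        # Test time: keep the branch, scaled by its survival probability.
        return x + (1.0 - self.p_drop) * self.f(x)
```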

  21. Convnets for localization: Transfer learning with convnets. Advice from Andrej Karpathy.

  22. Convnets for localization: Localization with convnets [A. Karpathy]. Take a classification net and discard the top (fully connected) layers; attach a new sub-net for bounding box regression and train it. At test time, use both the classification and the regression heads.
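
A sketch of attaching a bounding-box regression head next to the classification head; the torchvision ResNet-18 backbone, the feature dimension, and the 4-number box parameterization are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

class ClsAndBoxNet(nn.Module):
    """Classification backbone with its top FC layer removed, plus two new heads."""
    def __init__(self, num_classes=20, feat_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)             # assumed backbone; normally pretrained
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the final FC layer
        self.cls_head = nn.Linear(feat_dim, num_classes)     # class scores
        self.box_head = nn.Linear(feat_dim, 4)               # bounding box regression (x, y, w, h)

    def forward(self, x):
        f = self.features(x).flatten(1)                      # pooled feature vector
        return self.cls_head(f), self.box_head(f)

scores, boxes = ClsAndBoxNet()(torch.randn(1, 3, 224, 224))
```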

  23. Convnets for localization: OverFeat for detection. Idea: reuse computation across overlapping sliding windows. Key innovation: convert the "fully connected" layers into convolutional layers [Sermanet et al.].
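
A sketch of the fully-connected-to-convolutional conversion that makes sliding-window reuse possible: an FC layer that reads a K×K feature map becomes a K×K convolution (and later FC layers become 1×1 convolutions), so a larger input yields a spatial grid of outputs, one per window. PyTorch is assumed and the sizes are illustrative:

```python
import torch
import torch.nn as nn

# A "fully connected" layer that expects a 512 x 7 x 7 feature map...
fc = nn.Linear(512 * 7 * 7, 4096)

# ...is equivalent to a 7x7 convolution with 4096 output channels.
conv = nn.Conv2d(512, 4096, kernel_size=7)
conv.weight.data = fc.weight.data.view(4096, 512, 7, 7)
conv.bias.data = fc.bias.data

feat = torch.randn(1, 512, 7, 7)
assert torch.allclose(fc(feat.flatten(1)), conv(feat).flatten(1), atol=1e-4)
```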

  24. Convnets for localization: Fully convolutional networks, as in OverFeat [Sermanet et al.].

  25. Convnets for localization: R-CNN.

  26. Convnets for localization: R-CNN, results.

  27. Convnets for localization: Fast R-CNN.

  28. Convnets for localization: Fast R-CNN.

  29. Convnets for localization: Fast R-CNN, RoI pooling. Project each region proposal onto the feature map, divide it into a fixed grid, and pool within each grid cell to get a fixed-size feature.
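
A sketch of RoI pooling using torchvision.ops.roi_pool (assumed available); spatial_scale maps box coordinates from image space onto the downsampled feature map, and the stride and 7×7 output grid here are illustrative choices:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 256, 32, 32)                  # conv feature map; stride 16 assumed
# Region proposals in image coordinates: (batch_index, x1, y1, x2, y2)
rois = torch.tensor([[0, 0.0, 0.0, 256.0, 256.0],
                     [0, 128.0, 64.0, 400.0, 300.0]])
pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)                                 # -> torch.Size([2, 256, 7, 7])
```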

  30. Convnets for localization: Fast R-CNN, results.

  31. Convnets for localization: Faster R-CNN.

  32. Convnets for localization: Region proposal network. Learns to choose and refine coarse proposals, using a few "anchors" with different aspect ratios at each feature-map location.
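
A sketch of anchor generation in the spirit of the region proposal network: at every feature-map location, place a small set of reference boxes with different aspect ratios. The stride, base size, and ratio set below are illustrative assumptions:

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16, base_size=128, ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * len(ratios), 4) anchors as (x1, y1, x2, y2) in image coords."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride      # anchor center
            for r in ratios:
                w, h = base_size * np.sqrt(r), base_size / np.sqrt(r)   # w/h = r, same area
                anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors, dtype=np.float32)

print(make_anchors(4, 4).shape)    # -> (48, 4)
```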

  33. Convnets for localization: Hypercolumns for image labeling. Hypercolumns: skip-layer connections from lower layers directly to the classification network [Mostajabi et al., 2015]. [Figure, built up over several slides: features from VGG-16 layers conv1_1, conv5_3, conv6 (fc6) and conv7 (fc7) are stacked into a per-pixel hypercolumn, fed through fully connected layers (h_fc1, h_fc2) to a classifier, producing a low-resolution output map that is then upsampled (↑2).]
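
A sketch of building a per-pixel hypercolumn by upsampling feature maps from several depths to a common resolution and concatenating them; torchvision's VGG-16 is assumed, and the tapped layer indices are illustrative rather than the exact ones in the slide:

```python
import torch
import torch.nn.functional as F
from torchvision import models

vgg = models.vgg16(weights=None).features      # convolutional part of VGG-16
tap_layers = {3, 22, 29}                       # illustrative: outputs of conv1_2, conv4_3, conv5_3

def hypercolumn(image):
    """Return (1, C_total, H, W): per-pixel stack of features from several depths."""
    h, w = image.shape[2:]
    feats, x = [], image
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx in tap_layers:
            # Upsample each tapped feature map back to the input resolution.
            feats.append(F.interpolate(x, size=(h, w), mode='bilinear', align_corners=False))
    return torch.cat(feats, dim=1)

cols = hypercolumn(torch.randn(1, 3, 224, 224))
print(cols.shape)
```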
