Deep Learning Tutorial, Part II
Greg Shakhnarovich, TTI-Chicago
December 2016
Overview: Goals of the tutorial
- A somewhat organized overview of the basics, plus some more advanced topics
- Demystify jargon
- Pointers for informed further learning
- Aimed mostly at vision practitioners, but the tools are widely applicable beyond vision
- Assumes basic familiarity with machine learning
Overview: Not covered
- Connections to the brain
- Deep learning outside of neural networks
- Many recent advances
- Many specialized architectures for vision tasks
Overview: Outline
Introduction (3 hours):
- Review of relevant machine learning concepts
- Feedforward neural networks and backpropagation
- Optimization techniques and issues
- Complexity and regularization in neural networks
- Intro to convolutional networks
Advanced (3 hours):
- Advanced techniques for learning DNNs
- Very deep networks
- Convnets for tasks beyond image classification
- Recurrent networks
Overview: Sources
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition, Andrej Karpathy, Justin Johnson et al. (2016 edition), vision.stanford.edu/teaching/cs231n
- Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville, 2016
- Chris Olah: Understanding LSTM Networks (blog post), colah.github.io/posts/2015-08-Understanding-LSTMs
- Papers on arXiv and slides by the authors
More training tricks: Input normalization
- Standard practice: normalize the data
- In theory, a variety of normalization schemes could be applied: zero-mean unit variance, "box normalization", whitening
- In practice, for images: subtract the "mean pixel" (the same value at every location)
  - Assuming zero-mean filters, this gives zero-mean filter responses
  - Matches zero padding
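A minimal NumPy sketch of the mean-pixel step; the function name and the (N, H, W, 3) array layout are my own choices for illustration:

```python
import numpy as np

def subtract_mean_pixel(images, mean_pixel):
    """Subtract a per-channel "mean pixel" (the same value at every
    spatial location) from a batch of images.

    images: float array of shape (N, H, W, 3)
    mean_pixel: array of shape (3,), estimated on the training set
    """
    return images - mean_pixel.reshape(1, 1, 1, 3)

# Estimate once on the training set, then apply to both train and test data:
# mean_pixel = train_images.reshape(-1, 3).mean(axis=0)
# train_images = subtract_mean_pixel(train_images, mean_pixel)
```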
More training tricks: Batch normalization: motivation
- Problem: covariate shift (a change in the function's domain)
- As learning proceeds, higher layers suffer from internal covariate shift due to the changing parameters of previous layers
- Makes learning harder!
- Example: MNIST (15/50/85th percentiles of the input to a typical sigmoid) [Ioffe and Szegedy]
More training tricks: Batch normalization: algorithm
- Batch normalization [Ioffe and Szegedy, 2015]: normalize each activation using its mini-batch mean and variance, then apply a learned scale γ and shift β (per layer or per channel)
- γ and β are learned through the usual backprop!
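A minimal NumPy sketch of the training-time forward pass for a fully connected layer (for conv layers the statistics are computed per channel over the batch and spatial dimensions); the function name is my own:

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization [Ioffe and Szegedy, 2015].

    x: activations of shape (N, D) for a mini-batch of size N
    gamma, beta: learned scale and shift of shape (D,)
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # learned scale and shift

# At test time, running averages of mu and var accumulated during training
# replace the mini-batch statistics.
```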
More training tricks: Batch normalization: effect
[Ioffe and Szegedy]
- Allows for higher learning rates and faster convergence
- May (or may not) reduce the need for dropout
- De facto standard today in most architectures
More training tricks: Data augmentation
- Part of the invariance learned by convnets is due to variations included in the training set
- Natural variation: different instances of objects, scene compositions, etc.
- Can get a lot more "for free" from synthetic variations
- Obvious example: mirror flip (horizontally, but not vertically!) [A. Karpathy]
More training tricks: Data augmentation: random crops (and scales)
- For image classification; assumes the object is large and central
- E.g., training ResNet on ImageNet: resize the image so the shorter side is a random number between 256 and 480, then crop a random 224 × 224 window [A. Karpathy]
- Must match the testing regime (ResNet: multiple scales, fixed crops for each scale, max)
More training tricks: Data augmentation: color jitter
- Apply in a structured way (e.g., using PCA on pixel colors) rather than per pixel [A. Karpathy]
- Blur (see our paper on arXiv)
- Rotations? Noise?
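A combined NumPy sketch of the augmentations from the last few slides (mirror flip, random crop, PCA-based color jitter). The crop size, the jitter magnitude, and the `pca_evecs`/`pca_evals` inputs are illustrative assumptions, not the exact recipe from any particular paper:

```python
import numpy as np

def augment(image, crop_size=224, pca_evecs=None, pca_evals=None):
    """Apply simple training-time augmentations to one image.

    image: float array of shape (H, W, 3), with H, W >= crop_size
    pca_evecs: (3, 3) eigenvectors of the pixel-color covariance (columns)
    pca_evals: (3,) corresponding eigenvalues
    """
    # Horizontal mirror flip with probability 0.5
    if np.random.rand() < 0.5:
        image = image[:, ::-1, :]

    # Random crop (scale jitter would resize the image first)
    h, w, _ = image.shape
    top = np.random.randint(0, h - crop_size + 1)
    left = np.random.randint(0, w - crop_size + 1)
    image = image[top:top + crop_size, left:left + crop_size, :]

    # Structured color jitter: perturb along principal components of
    # pixel colors rather than perturbing each channel independently
    if pca_evecs is not None:
        alpha = np.random.normal(0.0, 0.1, size=3)
        image = image + pca_evecs @ (alpha * pca_evals)

    return image
```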
Very deep networks: Quest for very deep networks
- Apparent dividends from depth, albeit diminishing [He et al.]
- Three main challenges with depth: computational complexity (alleviated by hardware?), learning complexity, and optimization complexity
Very deep networks: Training very deep networks
- Naive attempt to increase depth: CIFAR-10, a simple sequence of 3 × 3 conv layers with occasional stride 2 (no pooling) [He et al.]
- At a certain depth, optimization fails: even training error degrades, so this is clearly an optimization issue, not a learning (generalization) issue!
Very deep networks: GoogLeNet
- A number of ad hoc choices: "inception blocks", auxiliary loss paths
- No fully connected layers!
- Compared to AlexNet: 12 times fewer parameters, roughly double the FLOPs [Szegedy et al., 2014]
Very deep networks: Residual networks
- Key idea: allow "shortcuts" for the loss to reach lower layers ("deep supervision")
- Residual connections [He et al., 2015]: learn what to add to the previous layer's output rather than how to transform it
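A minimal PyTorch sketch of the residual idea: the convolutional branch learns an additive correction while the identity shortcut carries the input through unchanged. This is a basic block with a fixed channel count; real ResNets also handle strides and channel changes on the shortcut:

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: output = relu(x + F(x))."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        # The branch learns what to *add* to x, not how to replace it
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(x + residual)   # identity shortcut, then nonlinearity
```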
Very deep networks: ResNet architecture
- Compare to VGG-19 and to the corresponding plain (no-shortcut) architecture
Very deep networks: ResNet with bottleneck blocks
- Bottleneck blocks: a 1 × 1 conv reduces the channel count, a 3 × 3 conv operates on the thin representation, and another 1 × 1 conv restores the width
- Can train hundreds of layers! State of the art on ImageNet/COCO uses ResNets with 150-250 layers
- Similar in spirit to the Inception blocks in GoogLeNet
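A sketch of the bottleneck variant in the same style (batch normalization is omitted here for brevity; real bottleneck blocks apply it after every convolution):

```python
import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """1x1 reduce -> 3x3 conv -> 1x1 expand, with an identity shortcut."""

    def __init__(self, channels, bottleneck_channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck_channels, 1, bias=False)
        self.conv = nn.Conv2d(bottleneck_channels, bottleneck_channels, 3,
                              padding=1, bias=False)
        self.expand = nn.Conv2d(bottleneck_channels, channels, 1, bias=False)

    def forward(self, x):
        r = F.relu(self.reduce(x))         # shrink the channel count
        r = F.relu(self.conv(r))           # cheap 3x3 conv on the thin tensor
        return F.relu(x + self.expand(r))  # restore width, add the shortcut
```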
Very deep networks: Stochastic depth
- From dropping units to dropping layers: regularization by stochastic depth at training time
- Drop entire "ResNet blocks" (residual branches) with some probability [Huang et al., 2016]
- Made possible by the residual trick: a dropped block reduces to the identity mapping
- State of the art on recognition tasks
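A sketch of a residual block with stochastic depth. The fixed survival probability is an assumption for illustration; the paper decays it linearly with depth:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticDepthBlock(nn.Module):
    """Residual block whose branch is dropped at random during training."""

    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.survival_prob:
                return F.relu(x + self.branch(x))
            return x  # block dropped: only the identity shortcut remains
        # Test time: keep the branch, scaled by its survival probability
        return F.relu(x + self.survival_prob * self.branch(x))
```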
Convnets for localization: Transfer learning with convnets
- Advice from Andrej Karpathy (CS231n): the slide summarizes when to use a pretrained convnet as a fixed feature extractor versus when to fine-tune some or all of its layers, depending on how much data you have and how similar it is to the pretraining data
Convnets for localization: Localization with convnets [A. Karpathy]
- Take a classification net; discard the top (fully connected) layers
- Attach a new sub-net for bounding-box regression; train it
- At test time: use both the classification and the regression outputs
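A PyTorch sketch of this recipe. The `backbone`, `feat_dim`, and the regression head sizes are placeholders for illustration, not the architecture of any specific paper:

```python
import torch.nn as nn

class LocalizationHead(nn.Module):
    """Shared backbone features feed both a classifier and a box regressor."""

    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone                 # pretrained net, FC top discarded
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.box_regressor = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 4),                   # (x, y, w, h)
        )

    def forward(self, images):
        feats = self.backbone(images)            # (N, feat_dim) features
        return self.classifier(feats), self.box_regressor(feats)

# Hypothetical usage:
# model = LocalizationHead(backbone=some_pretrained_net, feat_dim=4096,
#                          num_classes=1000)
```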
Convnets for localization: Overfeat for detection
- Idea: reuse computation across overlapping sliding windows
- Key innovation: convert "fully connected" layers to convolutional layers [Sermanet et al.]
Convnets for localization: Fully convolutional networks (Overfeat) [Sermanet et al.]
- A network with only convolutional (and pooling) layers can be applied to inputs of any size, producing a spatial map of outputs instead of a single prediction
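A small PyTorch sketch of the conversion: a fully connected layer acting on a 512 × 7 × 7 feature map computes the same linear map as a 7 × 7 convolution with reshaped weights, so the converted net can slide over larger images and emit a grid of predictions. The layer sizes are VGG/Overfeat-like assumptions:

```python
import torch
import torch.nn as nn

# "Fully connected" layer over a 512 x 7 x 7 feature map
fc = nn.Linear(512 * 7 * 7, 4096)

# Equivalent 7x7 convolution with the same weights
conv = nn.Conv2d(512, 4096, kernel_size=7)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 7, 7)
# On a 7x7 input the two give the same answer (up to numerics) ...
assert torch.allclose(conv(x).view(1, -1), fc(x.view(1, -1)), atol=1e-4)
# ... but the conv version also accepts larger feature maps, producing one
# "fully connected" output per sliding window position:
y = conv(torch.randn(1, 512, 13, 13))   # shape (1, 4096, 7, 7)
```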
Convnets for localization: R-CNN
- R-CNN [Girshick et al.]: generate region proposals (e.g., selective search), warp each to a fixed size, run each through a convnet, and classify the resulting features
Convnets for localization: R-CNN: results
Convnets for localization: Fast R-CNN
- Fast R-CNN [Girshick, 2015]: run the convnet once over the whole image, then pool features for each region proposal from the shared feature map and classify/regress from those pooled features
Convnets for localization: Fast R-CNN: ROI pooling
- Project each region proposal onto the conv feature map
- Divide the projected region into a fixed grid and max-pool within each grid cell
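A minimal sketch of the ROI pooling operation for a single region and a single image, using adaptive max pooling for the grid cells. The 1/16 spatial scale is an assumption matching a VGG-style backbone:

```python
import torch.nn.functional as F

def roi_pool(feature_map, roi, output_size=7, spatial_scale=1.0 / 16):
    """Pool a region proposal into a fixed-size grid of features.

    feature_map: tensor of shape (C, H, W) from the shared conv layers
    roi: (x1, y1, x2, y2) box in image coordinates
    spatial_scale: image-to-feature-map scale
    """
    # Project the proposal onto the feature map
    x1, y1, x2, y2 = [int(round(c * spatial_scale)) for c in roi]
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)   # keep at least one cell
    region = feature_map[:, y1:y2, x1:x2]
    # Max-pool within each cell of an output_size x output_size grid
    return F.adaptive_max_pool2d(region, output_size)   # (C, 7, 7)
```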
Convnets for localization: Fast R-CNN: results
Convnets for localization: Faster R-CNN
- Faster R-CNN [Ren et al., 2015]: replace external region proposals with a region proposal network (RPN) that shares convolutional features with the detection network
Convnets for localization: Region proposal network
- Learn to choose and refine coarse proposals
- Use a few "anchors" with different aspect ratios (and scales) at each feature-map location
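A sketch of generating the anchor boxes that the RPN scores and refines. The particular base size, scales, and ratios are common defaults, used here as assumptions:

```python
import numpy as np

def make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Reference boxes centered at the origin, one per (scale, ratio) pair.

    At each feature-map location these anchors are shifted to that location;
    the RPN then classifies each anchor as object/background and regresses
    a refinement of its coordinates.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)      # width for this aspect ratio
            h = w * ratio                  # height, so that h / w == ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)               # (len(scales) * len(ratios), 4)
```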
Convnets for localization: Hypercolumns for image labeling
- Hypercolumns: skip-layer connections from lower layers directly to the classification network [Mostajabi et al., 2015]
- Figure (built up over several slides): input image → VGG-16 layers (conv1_1 through conv5_3, conv6/fc6, conv7/fc7) → per-location hypercolumn → small fully connected layers (h_fc1, h_fc2) → classifier (cls) → low-resolution output map, upsampled (↑2) to the final labeling
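A minimal PyTorch sketch of assembling hypercolumn features from several layers' activations. The list of feature maps and the downstream per-location classifier are left abstract; this is an illustration of the idea, not the exact pipeline of the cited paper:

```python
import torch
import torch.nn.functional as F

def hypercolumns(features, out_size):
    """Build per-location hypercolumn features.

    features: list of tensors of shape (N, C_i, H_i, W_i) taken from
              different layers of the network
    out_size: (H, W) target resolution for the hypercolumn map
    """
    upsampled = [
        F.interpolate(f, size=out_size, mode='bilinear', align_corners=False)
        for f in features
    ]
    # Every output location now sees low-level and high-level information
    return torch.cat(upsampled, dim=1)   # (N, sum_i C_i, H, W)

# Hypothetical usage: feed the concatenated hypercolumns to a small
# per-location classifier (e.g., 1x1 convolutions) to produce the label map.
```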