
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks - PowerPoint PPT Presentation



  1. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) CMSC 678 UMBC

  2. Recap from last time…

  3. Feed-Forward Neural Network: Multilayer Perceptron. Hidden layer: h_j = F(x^T u_j + c_0); output layer: z_k = G(h^T γ_k + c_1). F, G: (non-linear) activation functions; for classification G is a softmax, for regression the identity. Information/computation flows forward only: no self-loops (no recurrence/reuse of weights).
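The forward pass on this slide fits in a few lines of numpy. The sketch below is illustrative only: it assumes tanh for the hidden activation F and softmax for the output G, and the names U, c0, Gamma, c1 simply mirror the slide's symbols.

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, U, c0, Gamma, c1):
    """One hidden layer: h = F(x U + c0), z = G(h Gamma + c1)."""
    h = np.tanh(x @ U + c0)        # F: tanh hidden activation (an illustrative choice)
    z = softmax(h @ Gamma + c1)    # G: softmax output for classification
    return h, z

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 5))                          # one example with 5 features
U, c0 = rng.normal(size=(5, 4)), np.zeros(4)         # input-to-hidden weights and biases
Gamma, c1 = rng.normal(size=(4, 2)), np.zeros(2)     # hidden-to-output weights and biases
h, z = mlp_forward(x, U, c0, Gamma, c1)
print(z, z.sum())                                    # class probabilities sum to 1
```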

  4. Flavors of Gradient Descent
     "Online":
       Set t = 0; pick a starting value θ_t.
       Until converged: for each example i in the full data:
         1. Compute loss l on x_i
         2. Get gradient g_t = l'(x_i)
         3. Get scaling factor ρ_t
         4. Set θ_{t+1} = θ_t - ρ_t * g_t
         5. Set t += 1
     "Minibatch":
       Set t = 0; pick a starting value θ_t.
       Until converged:
         get batch B ⊂ full data set; set g_t = 0
         for example(s) i in B: compute loss l on x_i; accumulate gradient g_t += l'(x_i)
         get scaling factor ρ_t; set θ_{t+1} = θ_t - ρ_t * g_t; set t += 1
     "Batch":
       Set t = 0; pick a starting value θ_t.
       Until converged:
         set g_t = 0
         for example(s) i in the full data: compute loss l on x_i; accumulate gradient g_t += l'(x_i)
         get scaling factor ρ_t; set θ_{t+1} = θ_t - ρ_t * g_t; set t += 1
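As a concrete illustration of the minibatch column, here is a minimal numpy sketch. `loss_grad` is a hypothetical helper (not from the slides) that returns the gradient accumulated over a batch, and the decaying schedule for ρ_t is just one possible choice of scaling factor.

```python
import numpy as np

def minibatch_sgd(theta, data, loss_grad, rho=0.1, batch_size=32, n_epochs=10):
    """Minibatch gradient descent: theta_{t+1} = theta_t - rho_t * g_t."""
    rng = np.random.default_rng(0)
    t = 0
    for _ in range(n_epochs):                      # stand-in for "until converged"
        rng.shuffle(data)                          # visit batches in a random order
        for start in range(0, len(data), batch_size):
            B = data[start:start + batch_size]     # batch B, a subset of the full data
            g_t = loss_grad(theta, B)              # gradient accumulated over B
            rho_t = rho / (1 + 0.01 * t)           # one possible scaling-factor schedule
            theta = theta - rho_t * g_t
            t += 1
    return theta
```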

  5. Dropout: Regularization in Neural Networks. Randomly ignore "neurons" (hidden units h_i) during training.
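In code, dropout amounts to sampling a random binary mask over the hidden units. The sketch below uses the common "inverted dropout" rescaling; that rescaling is an implementation choice, not something stated on the slide.

```python
import numpy as np

def dropout(h, p_drop=0.5, training=True, rng=None):
    """Randomly ignore hidden units h_i during training (inverted dropout)."""
    if not training:
        return h                                   # keep every unit at test time
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p_drop           # keep each unit with prob 1 - p_drop
    return h * mask / (1.0 - p_drop)               # rescale so expected activations match
```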

  6. tanh Activation: tanh_s(x) = 2 / (1 + exp(-2 * s * x)) - 1 = 2 σ_s(x) - 1 (a rescaled sigmoid). Plotted for slopes s = 0.5, 1, 10.

  7. Rectifier Activations: relu(x) = max(0, x); softplus(x) = log(1 + exp(x)); leaky_relu(x) = 0.01x if x < 0, x if x ≥ 0.
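These activations (and the slope-s tanh from the previous slide) are one-liners in numpy; the definitions below simply transcribe the formulas above.

```python
import numpy as np

def tanh_s(x, s=1.0):
    """Slope-s tanh from the previous slide: 2 / (1 + exp(-2*s*x)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * s * x)) - 1.0

def relu(x):
    return np.maximum(0.0, x)

def softplus(x):
    return np.log1p(np.exp(x))          # log(1 + exp(x)); may overflow for very large x

def leaky_relu(x, alpha=0.01):
    return np.where(x < 0, alpha * x, x)

x = np.linspace(-3.0, 3.0, 7)
print(relu(x))
print(leaky_relu(x))
```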

  8. Outline. Convolutional Neural Networks: What is a convolution? Multidimensional Convolutions; Typical Convnet Operations; Deep convnets. Recurrent Neural Networks: Types of recurrence; A basic recurrent cell; BPTT: Backpropagation through time.

  9. Dot Product: x^T y = Σ_k x_k y_k

  10. Convolution: Modified Dot Product Around a Point. (x ⋆ y)_i = Σ_{k < L} x_{k+i} y_k (convolution/cross-correlation).

  15. Convolution: Modified Dot Product Around a Point. 1-D convolution: (x ⋆ y)_i = Σ_k x_{k+i} y_k, where x is the input ("image"), y is the kernel, and the result is the feature map (convolution/cross-correlation).
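A direct transcription of this 1-D feature-map computation (really a cross-correlation, as the slide notes) might look like the following minimal sketch; it evaluates "valid" positions only, i.e. no padding.

```python
import numpy as np

def conv1d(x, kernel):
    """1-D 'valid' cross-correlation: out[i] = sum_k x[i + k] * kernel[k]."""
    L = len(kernel)
    n_out = len(x) - L + 1
    return np.array([np.dot(x[i:i + L], kernel) for i in range(n_out)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input ("image")
k = np.array([1.0, 0.0, -1.0])            # kernel
print(conv1d(x, k))                       # feature map: [-2. -2. -2.]
```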

  16. Outline. Convolutional Neural Networks: What is a convolution? Multidimensional Convolutions; Typical Convnet Operations; Deep convnets. Recurrent Neural Networks: Types of recurrence; A basic recurrent cell; BPTT: Backpropagation through time.

  17. 2-D Convolution. width: shape of the kernel (often square); applied to an input ("image").

  18. 2-D Convolution. stride(s): how many spaces to move the kernel. width: shape of the kernel (often square); applied to an input ("image").

  19. 2-D Convolution, stride = 1. stride(s): how many spaces to move the kernel. width: shape of the kernel (often square); applied to an input ("image").

  22. 2-D Convolution, stride = 2. stride(s): how many spaces to move the kernel. width: shape of the kernel (often square); applied to an input ("image").

  23. 2-D Convolution, stride = 2: the kernel skips positions ("skip, starting here"). stride(s): how many spaces to move the kernel. width: shape of the kernel (often square); applied to an input ("image").

  26. 2-D Convolution. padding: how to handle input/kernel shape mismatches; pad with 0s (one option). stride(s): how many spaces to move the kernel. width: shape of the kernel (often square); applied to an input ("image"). "same": input.shape == output.shape; "different": input.shape ≠ output.shape.

  27. 2-D Convolution. pad with 0s (another option). stride(s): how many spaces to move the kernel. padding: how to handle input/kernel shape mismatches. width: shape of the kernel (often square); applied to an input ("image"). "same": input.shape == output.shape; "different": input.shape ≠ output.shape.

  28. 2-D Convolution. stride(s): how many spaces to move the kernel. padding: how to handle input/kernel shape mismatches. width: shape of the kernel (often square); applied to an input ("image"). "same": input.shape == output.shape; "different": input.shape ≠ output.shape.
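Putting the last few slides together, a minimal (deliberately loop-based) sketch of a 2-D convolution with a stride and zero padding could look like this; with pad = 1 and stride = 1 a 3x3 kernel gives "same"-shaped output, while other settings give a "different" shape.

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """2-D cross-correlation after zero-padding, moving the kernel by `stride`."""
    if pad:
        image = np.pad(image, pad, mode="constant")          # pad with 0s (one option)
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1                             # output height
    ow = (iw - kw) // stride + 1                             # output width
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            patch = image[r * stride:r * stride + kh, c * stride:c * stride + kw]
            out[r, c] = np.sum(patch * kernel)               # dot product around this point
    return out

img = np.arange(25, dtype=float).reshape(5, 5)               # toy 5x5 input ("image")
k = np.ones((3, 3)) / 9.0                                    # 3x3 averaging kernel
print(conv2d(img, k, stride=1, pad=1).shape)                 # "same": (5, 5)
print(conv2d(img, k, stride=2, pad=0).shape)                 # "different": (2, 2)
```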

  29. From fully connected to convolutional networks: an image feeding a fully connected layer. Slide credit: Svetlana Lazebnik

  30. From fully connected to convolutional networks: an image feeding a convolutional layer (learned weights) that produces a feature map. Slide credit: Svetlana Lazebnik

  32. Convolution as feature extraction: input, filters/kernels, feature maps. Slide credit: Svetlana Lazebnik

  34. From fully connected to convolutional networks: image, convolutional layer, non-linearity and/or pooling, next layer. Slide adapted from: Svetlana Lazebnik

  35. Outline. Convolutional Neural Networks: What is a convolution? Multidimensional Convolutions; Typical Convnet Operations; Deep convnets. Recurrent Neural Networks: Types of recurrence; A basic recurrent cell; BPTT: Backpropagation through time; Solving the vanishing gradients problem.

  36. Key operations in a CNN: convolution (learned), non-linearity, spatial pooling; the input image is turned into feature maps. Slide credit: Svetlana Lazebnik, R. Fergus, Y. LeCun

  37. Key operations: non-linearity. Example: Rectified Linear Unit (ReLU), applied after the (learned) convolution; spatial pooling follows. Slide credit: Svetlana Lazebnik, R. Fergus, Y. LeCun

  38. Key operations: spatial pooling, e.g. max pooling, applied after the (learned) convolution and non-linearity to produce the feature maps. Slide credit: Svetlana Lazebnik, R. Fergus, Y. LeCun
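Max pooling itself is just a windowed maximum over each feature map; a small numpy sketch (window size and stride of 2 are the usual defaults, but an arbitrary choice here):

```python
import numpy as np

def max_pool2d(fmap, size=2, stride=2):
    """Max spatial pooling: the maximum over each size x size window of a feature map."""
    h, w = fmap.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            window = fmap[r * stride:r * stride + size, c * stride:c * stride + size]
            out[r, c] = window.max()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool2d(fmap))          # [[ 5.  7.] [13. 15.]]
```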

  39. Design principles. Reduce filter sizes (except possibly at the lowest layer); factorize filters aggressively. Use 1x1 convolutions to reduce and expand the number of feature maps judiciously. Use skip connections and/or create multiple paths through the network. Slide credit: Svetlana Lazebnik
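Two of these principles are easy to show in isolation: a 1x1 convolution is a per-position linear map that changes only the number of feature maps, and a skip connection adds a block's input back to its output. The sketch below is illustrative only; conv1x1, residual_block, and w_proj are made-up names, not from the slides.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a per-position linear map across channels.
    x: (H, W, C_in) feature maps; w: (C_in, C_out) learned weights."""
    return x @ w                                     # changes the number of feature maps only

def residual_block(x, f, w_proj=None):
    """Skip connection: output = f(x) + x, projecting x with a 1x1 conv if shapes differ."""
    shortcut = x if w_proj is None else conv1x1(x, w_proj)
    return f(x) + shortcut

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))                      # 16 feature maps
w = rng.normal(size=(16, 4))                         # 1x1 conv reducing 16 maps to 4
print(conv1x1(x, w).shape)                           # (8, 8, 4)
print(residual_block(x, lambda t: 0.5 * t).shape)    # (8, 8, 16)
```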

  40. LeNet-5 Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278 – 2324, 1998. Slide credit: Svetlana Lazebnik

  41. ImageNet: ~14 million labeled images, 20k classes; images gathered from the Internet; human labels via Amazon MTurk. ImageNet Large-Scale Visual Recognition Challenge (ILSVRC): 1.2 million training images, 1000 classes. www.image-net.org/challenges/LSVRC/ Slide credit: Svetlana Lazebnik

  42. Slide credit: Svetlana Lazebnik http://www.inference.vc/deep-learning-is-easy/

  43. Outline. Convolutional Neural Networks: What is a convolution? Multidimensional Convolutions; Typical Convnet Operations; Deep convnets. Recurrent Neural Networks: Types of recurrence; A basic recurrent cell; BPTT: Backpropagation through time; Solving the vanishing gradients problem.

  44. AlexNet: ILSVRC 2012 winner. Similar framework to LeNet but: max pooling, ReLU nonlinearity; more data and a bigger model (7 hidden layers, 650K units, 60M params); GPU implementation (50x speedup over CPU): two GPUs for a week; dropout regularization. A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012. Slide credit: Svetlana Lazebnik

  45. GoogLeNet Szegedy et al., 2015 Slide credit: Svetlana Lazebnik

  47. GoogLeNet: Auxiliary Classifiers at Sub-levels. Idea: try to make each sub-layer good (in its own way) at the prediction task. Szegedy et al., 2015. Slide credit: Svetlana Lazebnik

  48. GoogLeNet. An alternative view. Szegedy et al., 2015. Slide credit: Svetlana Lazebnik
