CS7015 (Deep Learning) : Lecture 11 Convolutional Neural Networks, LeNet, AlexNet, ZF-Net, VGGNet, GoogLeNet and ResNet


  1. CS7015 (Deep Learning) : Lecture 11 Convolutional Neural Networks, LeNet, AlexNet, ZF-Net, VGGNet, GoogLeNet and ResNet. Mitesh M. Khapra, Department of Computer Science and Engineering, Indian Institute of Technology Madras

  2. Module 11.1 : The convolution operation

  3. Suppose we are tracking the position of an aeroplane using a laser sensor at discrete time intervals, and suppose our sensor is noisy. To obtain a less noisy estimate we would like to average several measurements. More recent measurements are more important, so we take a weighted average:

$s_t = \sum_{a=0}^{\infty} x_{t-a} \, w_{-a} = (x * w)_t$

Here $x$ is the input, $w$ is the filter, and the operation is called convolution.

  4. In practice, we would only sum over a small window:

$s_t = \sum_{a=0}^{6} x_{t-a} \, w_{-a}$

The weight array $w$ is known as the filter. We just slide the filter over the input and compute the value of $s_t$ based on a window around $x_t$. For example,

$s_6 = x_6 w_0 + x_5 w_{-1} + x_4 w_{-2} + x_3 w_{-3} + x_2 w_{-4} + x_1 w_{-5} + x_0 w_{-6}$

Sliding the filter across the input fills in the output one value at a time:

W ($w_{-6}, \ldots, w_0$): 0.01 0.01 0.02 0.02 0.04 0.40 0.50
X: 1.00 1.10 1.20 1.40 1.70 1.80 1.90 2.10 2.20 2.40 2.50 2.70
S: 1.80 1.96 2.11 2.16 2.28 2.42

Here the input (and the kernel) is one dimensional. Can we use the convolution operation on a 2D input also?
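The computation on this slide is easy to reproduce. Below is a minimal NumPy sketch (not part of the original slides; the variable names are ours) that slides the 7-tap filter over the input; the printed values may differ in the last decimal place from the rounded numbers shown on the slide.

```python
import numpy as np

# Filter in display order w_{-6} ... w_0: the most recent
# measurement x_t gets the largest weight, w_0 = 0.5.
w = np.array([0.01, 0.01, 0.02, 0.02, 0.04, 0.40, 0.50])
x = np.array([1.00, 1.10, 1.20, 1.40, 1.70, 1.80, 1.90,
              2.10, 2.20, 2.40, 2.50, 2.70])

# s_t = sum_{a=0}^{6} x_{t-a} w_{-a}, defined wherever the full
# window fits, i.e. for t = 6, ..., len(x) - 1.
s = np.array([np.dot(x[t - 6:t + 1], w) for t in range(6, len(x))])
print(np.round(s, 2))

# Equivalently, using NumPy's built-in convolution (which flips
# its second argument): np.convolve(x, w[::-1], mode='valid')
```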

  9. We can think of images as 2D inputs. We would now like to use a 2D filter ($m \times n$). First let us see what the 2D formula looks like:

$S_{ij} = (I * K)_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i-a, j-b} \, K_{a,b}$

This formula looks at all the preceding neighbours $(i-a, j-b)$. In practice, we use the following formula, which looks at the succeeding neighbours instead:

$S_{ij} = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I_{i+a, j+b} \, K_{a,b}$

  10. Let us apply this idea to a toy example and see the results.

Input (3 × 4):          Kernel (2 × 2):
a b c d                 w x
e f g h                 y z
i j k ℓ

Output (2 × 3):
aw+bx+ey+fz   bw+cx+fy+gz   cw+dx+gy+hz
ew+fx+iy+jz   fw+gx+jy+kz   gw+hx+ky+ℓz
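A quick way to check this table is to run the sliding-window computation directly. The sketch below (ours, not from the slides) uses numeric stand-ins for a..ℓ and w..z and applies the succeeding-neighbours formula from the previous slide.

```python
import numpy as np

I = np.array([[1,  2,  3,  4],    # stand-ins for a b c d
              [5,  6,  7,  8],    #               e f g h
              [9, 10, 11, 12]])   #               i j k l
K = np.array([[1, 0],             # stand-ins for w x
              [0, 1]])            #               y z

m, n = K.shape
H, W = I.shape
S = np.zeros((H - m + 1, W - n + 1))
for i in range(S.shape[0]):
    for j in range(S.shape[1]):
        # elementwise product of the window with the kernel,
        # summed to a scalar; e.g. S[0, 0] = aw + bx + ey + fz
        S[i, j] = np.sum(I[i:i + m, j:j + n] * K)

print(S)  # a 2 x 3 output, one value per placement of the kernel
```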

  11. For the rest of the discussion we will use the following formula for convolution:

$S_{ij} = (I * K)_{ij} = \sum_{a=-\lfloor m/2 \rfloor}^{\lfloor m/2 \rfloor} \; \sum_{b=-\lfloor n/2 \rfloor}^{\lfloor n/2 \rfloor} I_{i-a, j-b} \, K_{\lfloor m/2 \rfloor + a, \, \lfloor n/2 \rfloor + b}$

In other words, we will assume that the kernel is centered on the pixel of interest, so we will be looking at both the preceding and the succeeding neighbours.
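For a centered kernel like this, scipy's 'same' mode is a convenient reference implementation: it keeps the output the same size as the input by zero-padding the boundary (an assumption on our part; the slides address boundary effects later).

```python
import numpy as np
from scipy.signal import convolve2d

I = np.arange(25, dtype=float).reshape(5, 5)  # a toy 5x5 "image"
K = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel

# mode='same' centers the kernel on each pixel; boundary='fill'
# pads the input with zeros where the kernel hangs off the edge.
S = convolve2d(I, K, mode='same', boundary='fill')
print(S.shape)  # (5, 5): same spatial size as the input
```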

  12. Let us see some examples of 2D convolutions applied to images.

  13. Convolving the image with the kernel

1 1 1
1 1 1
1 1 1

blurs the image. (The slide shows the original image and the blurred result.)

  14. Convolving the image with the kernel

 0 -1  0
-1  5 -1
 0 -1  0

sharpens the image. (The slide shows the original image and the sharpened result.)

  15. Convolving the image with the kernel

1  1  1
1 -8  1
1  1  1

detects the edges. (The slide shows the original image and the detected edges.)
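These three kernels are easy to try out. Below is a short sketch under our own assumptions (a random array stands in for a grayscale image; in practice you would load a real one), reusing scipy's 2D convolution. Note that the blur kernel is usually normalized by 1/9 so that intensities stay in range; the slide shows the unnormalized version.

```python
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(64, 64)  # stand-in for a grayscale image

blur    = np.ones((3, 3)) / 9.0
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)
edges   = np.array([[ 1,  1,  1],
                    [ 1, -8,  1],
                    [ 1,  1,  1]], dtype=float)

for name, k in [("blur", blur), ("sharpen", sharpen), ("edges", edges)]:
    out = convolve2d(img, k, mode='same', boundary='symm')
    print(name, out.shape, out.min(), out.max())
```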

  16. We will now see a working example of 2D convolution.

  17. We just slide the kernel over the input image. Each time we slide the kernel we get one value in the output. The resulting output is called a feature map. We can use multiple filters to get multiple feature maps.

  18. Question: In the 1D case, we slide a one dimensional filter over a one dimensional input. In the 2D case, we slide a two dimensional filter over a two dimensional input. What would happen in the 3D case?

  19. What would a 3D filter look like? It will be 3D and we will refer to it as a volume. Once again we will slide the volume over the 3D input and compute the convolution operation. Note that in this lecture we will assume that the filter always extends to the full depth of the image (e.g., all of the R, G and B channels). In effect, we are doing a 2D convolution operation on a 3D input, because the filter moves along the height and the width but not along the depth. As a result the output will be 2D (only width and height, no depth). Once again we can apply multiple filters to get multiple feature maps, as sketched below.
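A minimal sketch of this idea (ours; the shapes and names are illustrative): each filter spans the full depth of the input, slides only along height and width, and therefore produces a 2D feature map; K filters give K feature maps.

```python
import numpy as np

H, W, D = 32, 32, 3   # input height, width, depth (e.g. an RGB image)
F, K = 5, 4           # filter spatial extent, number of filters

x = np.random.rand(H, W, D)
filters = np.random.rand(K, F, F, D)  # each filter is F x F x D

feature_maps = np.zeros((H - F + 1, W - F + 1, K))
for k in range(K):
    for i in range(H - F + 1):
        for j in range(W - F + 1):
            # sum over height, width AND depth -> a single scalar,
            # so each filter yields a 2D map (no depth dimension)
            feature_maps[i, j, k] = np.sum(x[i:i + F, j:j + F, :] * filters[k])

print(feature_maps.shape)  # (28, 28, 4): one 2D feature map per filter
```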

  20. Module 11.2 : Relation between input size, output size and filter size

  21. So far we have not said anything explicit about the dimensions of (1) the inputs, (2) the filters, (3) the outputs, and the relations between them. We will see how they are related, but before that we will define a few quantities.

  22. We first define the following quantities:
the Width ($W_1$), Height ($H_1$) and Depth ($D_1$) of the original input;
the stride $S$ (we will come back to this later);
the number of filters $K$;
the spatial extent ($F$) of each filter (the depth of each filter is the same as the depth of the input).
The output is $W_2 \times H_2 \times D_2$ (we will soon see a formula for computing $W_2$, $H_2$ and $D_2$).

  23. Let us compute the dimensions ($W_2$, $H_2$) of the output. Notice that we cannot place the kernel at the corners, as it would cross the input boundary. This is true for all the shaded points in the slide's figure (the kernel crosses the input boundary). This results in an output of smaller dimensions than the input.

  24. As the size of the kernel increases, this becomes true for even more pixels: for example, with a 5 × 5 kernel we get an even smaller output. In general,

$W_2 = W_1 - F + 1$
$H_2 = H_1 - F + 1$

We will refine this formula further.
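The formula is easy to sanity-check in code. A tiny helper (ours; stride is implicitly 1 and there is no padding, which is exactly the refinement the lecture adds later):

```python
# Output spatial size for a W x H input convolved with an F x F kernel,
# stride 1, no padding: each dimension shrinks by F - 1.
def output_size(W1, H1, F):
    return W1 - F + 1, H1 - F + 1

print(output_size(7, 7, 3))  # (5, 5)
print(output_size(7, 7, 5))  # (3, 3): bigger kernel, smaller output
```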
