Lecture 9 Recap – I2DL: Prof. Niessner, Prof. Leal-Taixé

  1. Lecture 9 Recap (I2DL: Prof. Niessner, Prof. Leal-Taixé)

  2. What are Convolutions?
     (g ∗ h)(u) = ∫_{−∞}^{+∞} g(ν) h(u − ν) dν     (g = red, h = blue, g ∗ h = green)
     Convolution of two box functions; convolution of two Gaussians.
     • Application of a filter to a function
     • The ‘smaller’ one is typically called the filter kernel
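
A minimal NumPy sketch (not from the slides) of the discrete analogue of this formula; convolving two box signals yields the triangle shape sketched on the slide.

```python
# Discrete convolution of two box signals with NumPy (illustrative values).
import numpy as np

g = np.ones(5)            # box signal (red on the slide)
h = np.ones(5)            # box filter (blue on the slide)
print(np.convolve(g, h))  # (g * h)[n] = sum_k g[k] h[n - k] -> triangle: [1. 2. 3. 4. 5. 4. 3. 2. 1.]
```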

  3. What are Convolutions? Discrete case: box filter
     Input:  4  3  2  −5  3  5  2  5  5  6
     Filter: 1/3  1/3  1/3
     Output: 3  0  0  1  10/3  4  4  16/3
     What to do at boundaries?
     • 1) Shrink: keep only the fully covered positions (output above is shorter than the input)
     • 2) Pad (often with ‘0’): output becomes 7/3  3  0  0  1  10/3  4  4  16/3  11/3
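
The same 1D example can be reproduced with NumPy's convolution modes, where 'valid' corresponds to shrinking and 'same' to zero padding (a hedged sketch, not part of the original slides).

```python
import numpy as np

signal = np.array([4., 3., 2., -5., 3., 5., 2., 5., 5., 6.])
box = np.ones(3) / 3.0                          # the 1/3 1/3 1/3 box filter

print(np.convolve(signal, box, mode='valid'))   # 1) shrink: [3, 0, 0, 1, 10/3, 4, 4, 16/3]
print(np.convolve(signal, box, mode='same'))    # 2) zero-pad: adds 7/3 and 11/3 at the ends
```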

  4. Convolutions on Images
     Image 5×5, kernel 3×3 → output 3×3 (no padding).
     Kernel: [0 −1 0; −1 5 −1; 0 −1 0]
     Example for one output value: 5⋅4 + (−1)⋅3 + (−1)⋅4 + (−1)⋅9 + (−1)⋅1 = 20 − 17 = 3

  5. Image Filters
     • Each kernel gives us a different image filter. Let’s learn these filters!
     • Box mean: 1/9 [1 1 1; 1 1 1; 1 1 1]
     • Edge detection: [−1 −1 −1; −1 8 −1; −1 −1 −1]
     • Sharpen: [0 −1 0; −1 5 −1; 0 −1 0]
     • Gaussian blur: 1/16 [1 2 1; 2 4 2; 1 2 1]
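
A hedged sketch of applying these kernels with SciPy; the input image here is random noise, purely to show the mechanics.

```python
import numpy as np
from scipy.ndimage import convolve

image = np.random.rand(64, 64)                  # stand-in grayscale image

box_mean   = np.ones((3, 3)) / 9.0
edge       = np.array([[-1, -1, -1], [-1,  8, -1], [-1, -1, -1]], dtype=float)
sharpen    = np.array([[ 0, -1,  0], [-1,  5, -1], [ 0, -1,  0]], dtype=float)
gauss_blur = np.array([[ 1,  2,  1], [ 2,  4,  2], [ 1,  2,  1]], dtype=float) / 16.0

for kernel in (box_mean, edge, sharpen, gauss_blur):
    filtered = convolve(image, kernel)          # SciPy keeps the 64x64 size (reflect padding by default)
    print(filtered.shape)
```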

  6. Convolutions on RGB Images
     32×32×3 image (pixels) and a 5×5×3 filter (weights).
     Convolve: slide the filter over all spatial locations and compute one output value at each.
     Without padding there are 28×28 locations, giving a 28×28×1 activation map (also called feature map).

  7. Convolution Layer
     32×32×3 image, 5×5×3 filter → 28×28×1 activation map.
     Let’s apply a different filter with different weights: we get a second 28×28×1 activation map.

  8. Convolution Layer
     32×32×3 image, convolution “layer”: let’s apply five filters, each with different weights.
     Result: five 28×28 activation maps.
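
A minimal PyTorch sketch of this setting (an illustration, not the lecture's code): five 5×5×3 filters on a 32×32×3 input yield five 28×28 activation maps.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                      # one 32x32 RGB image, channels first
conv = nn.Conv2d(in_channels=3, out_channels=5,    # five filters ...
                 kernel_size=5)                    # ... each of size 5x5x3, stride 1, no padding
print(conv(x).shape)                               # torch.Size([1, 5, 28, 28])
```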

  9. Convolution Layers: Dimensions
     Input: N×N (input width/height N), Filter: F×F (filter width/height F), Stride: S
     Output: ((N − F)/S + 1) × ((N − F)/S + 1)
     • N = 7, F = 3, S = 1: (7 − 3)/1 + 1 = 5
     • N = 7, F = 3, S = 2: (7 − 3)/2 + 1 = 3
     • N = 7, F = 3, S = 3: (7 − 3)/3 + 1 = 2.33…  → fractions are illegal
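
A quick check of the output-size formula on the slide's three examples (a small helper written for this recap, not lecture code).

```python
def conv_output_size(N, F, S):
    """Output size of an N x N input convolved with an F x F filter at stride S (no padding)."""
    assert (N - F) % S == 0, "fractional output size: this stride is not allowed"
    return (N - F) // S + 1

print(conv_output_size(7, 3, 1))   # 5
print(conv_output_size(7, 3, 2))   # 3
# conv_output_size(7, 3, 3) raises: (7 - 3)/3 + 1 = 2.33... is not an integer
```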

  10. Convolution Layers: Padding
     Image 7×7 + zero padding. Types of convolutions:
     • Valid convolution: no padding is used
     • Same convolution: output size = input size; set the padding to P = (F − 1)/2

  11. Convolution Layers: Dimensions
     Remember: Output = ((N + 2P − F)/S + 1) × ((N + 2P − F)/S + 1)
     REMARK: in practice, integer division is typically used (i.e., apply the floor operator!)
     Example: 3×3 conv with same padding and stride 2 on a 64×64 RGB image
     → N = 64, F = 3, P = 1, S = 2
     Output: ((64 + 2⋅1 − 3)/2 + 1) × ((64 + 2⋅1 − 3)/2 + 1) = floor(32.5) × floor(32.5) = 32×32
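
The padded version of the formula, evaluated on the slide's example (the helper name is mine).

```python
import math

def conv_output_size(N, F, P, S):
    """floor((N + 2P - F)/S + 1), as on the slide."""
    return math.floor((N + 2 * P - F) / S + 1)

# 3x3 conv, same padding (P = (3 - 1)/2 = 1), stride 2, 64x64 RGB image:
print(conv_output_size(N=64, F=3, P=1, S=2))   # floor(32.5) = 32 -> output is 32 x 32
```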

  12. CNN Learned Filters

  13. CNN Prototype (slide by Karpathy)

  14. Pooling Layer: Max Pooling
     Max pool with 2×2 filters and stride 2: each output value is the maximum over a 2×2 window
     of a single depth slice of the input (the ‘pooled’ output has half the width and height).
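
A hedged PyTorch sketch of 2×2 max pooling with stride 2 (the input values are illustrative, not necessarily the slide's).

```python
import torch
import torch.nn as nn

x = torch.tensor([[3., 1., 3., 5.],
                  [6., 9., 6., 0.],
                  [7., 9., 3., 4.],
                  [0., 2., 4., 3.]]).reshape(1, 1, 4, 4)   # one depth slice

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).reshape(2, 2))   # tensor([[9., 6.],
                               #         [9., 4.]])
```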

  15. Receptive Field
     • Spatial extent of the connectivity of a convolutional filter
     • 7×7 input, 3×3 output, 5×5 receptive field on the original input:
       one output value is connected to 25 input pixels
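
One hedged way to verify the claim numerically: push a 7×7 input through a single 5×5 convolution (7×7 → 3×3) and count, via the gradient, how many input pixels influence one output value.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=5)      # a single 5x5 filter: 7x7 -> 3x3
nn.init.ones_(conv.weight)                 # deterministic weights, so no gradient is accidentally zero

x = torch.randn(1, 1, 7, 7, requires_grad=True)
out = conv(x)                              # shape (1, 1, 3, 3)
out[0, 0, 1, 1].backward()                 # pick one output value
print((x.grad != 0).sum().item())          # 25 -> a 5x5 receptive field on the original input
```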

  16. Lecture 10 – CNNs (part 2)

  17. Classic Architectures

  18. LeNet [LeCun et al. ’98]
     • Digit recognition: 10 classes
     • Input: 32×32 grayscale images (this one is labeled as class “7”)

  19. LeNet
     • Digit recognition: 10 classes
     • Valid convolution: size shrinks
     • How many conv filters are there in the first layer? 6

  20. LeNet
     • Digit recognition: 10 classes
     • At that time average pooling was used; now max pooling is much more common

  21. LeNet
     • Digit recognition: 10 classes
     • Again valid convolutions; how many filters?

  22. LeNet
     • Digit recognition: 10 classes
     • Use of tanh/sigmoid activations → not common now!

  23. LeNet
     • Digit recognition: 10 classes
     • Conv -> Pool -> Conv -> Pool -> Conv -> FC

  24. LeNet
     • Digit recognition: 10 classes; 60k parameters
     • Conv -> Pool -> Conv -> Pool -> Conv -> FC
     • As we go deeper: width and height shrink, the number of filters grows
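
A hedged PyTorch sketch of a LeNet-style network following the slides' description (valid 5×5 convolutions, average pooling, tanh, Conv -> Pool -> Conv -> Pool -> Conv -> FC); the layer widths follow the classic LeCun et al. '98 design and give roughly the quoted 60k parameters.

```python
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),     # 32x32 -> 28x28, 6 filters
    nn.AvgPool2d(2),                               # 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),    # 14x14 -> 10x10, 16 filters
    nn.AvgPool2d(2),                               # 10x10 -> 5x5
    nn.Conv2d(16, 120, kernel_size=5), nn.Tanh(),  # 5x5 -> 1x1, 120 filters
    nn.Flatten(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                             # 10 digit classes
)
print(sum(p.numel() for p in lenet.parameters()))  # ~61k parameters
```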

  25. Test Benchmarks
     Dataset: ImageNet
     ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [Russakovsky et al., IJCV’15]

  26. Common Performance Metrics
     • Top-1 score: check if a sample’s top class (i.e. the one with the highest probability) is the same as its target label
     • Top-5 score: check if the target label is among the 5 highest-probability predictions
     • → Top-5 error: percentage of test samples for which the correct class was not in the top 5 predicted classes
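
A minimal sketch of how these scores are computed in practice, assuming logits of shape (batch, classes) and integer targets (the function name and tensors are illustrative).

```python
import torch

def topk_score(logits, targets, k=5):
    """Fraction of samples whose target label is among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices                   # (batch, k) predicted classes
    hit = (topk == targets.unsqueeze(1)).any(dim=1)        # (batch,) True if the target is among them
    return hit.float().mean().item()

logits = torch.randn(8, 1000)                              # fake predictions for 1000 classes
targets = torch.randint(0, 1000, (8,))
print(topk_score(logits, targets, k=1))                    # Top-1 score
print(topk_score(logits, targets, k=5))                    # Top-5 score; Top-5 error = 1 - this
```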

  27. AlexNet
     • Cut the ImageNet error down by half (benchmark chart: non-CNN vs. CNN entries)

  28. AlexNet [Krizhevsky et al. NIPS’12]

  29. AlexNet [Krizhevsky et al. NIPS’12]
     • First filter with stride 4 to reduce the size significantly
     • 96 filters

  30. AlexNet [Krizhevsky et al. NIPS’12]
     • Use of same convolutions
     • As with LeNet: width and height shrink, the number of filters grows

  31. AlexNet [Krizhevsky et al. NIPS’12]

  32. AlexNet [Krizhevsky et al. NIPS’12]
     • Softmax for 1000 classes

  33. AlexNet [Krizhevsky et al. NIPS’12]
     • Similar to LeNet but much bigger (~1000 times)
     • Use of ReLU instead of tanh/sigmoid
     • 60M parameters
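
As a hedged sanity check of the quoted size, torchvision's reference AlexNet (not the lecture's own code) has roughly that many parameters.

```python
import torchvision

alexnet = torchvision.models.alexnet()                     # random weights, 1000 output classes
n_params = sum(p.numel() for p in alexnet.parameters())
print(f"{n_params / 1e6:.1f}M parameters")                 # ~61M, i.e. the "60M" on the slide
```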

  34. VGGNet [Simonyan and Zisserman ICLR’15]
     • Striving for simplicity
     • CONV = 3×3 filters with stride 1, same convolutions
     • MAXPOOL = 2×2 filters with stride 2

  35. VGGNet [Simonyan and Zisserman ICLR’15]   (Conv = 3×3, s = 1, same; Maxpool = 2×2, s = 2)
     • 2 consecutive convolutional layers, each one with 64 filters
     • What is the output size? (see the sketch below)
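
A hedged answer to the question, assuming the usual 224×224×3 VGG input: two same-padded 3×3, stride-1 conv layers with 64 filters keep the spatial size, so the output is 224×224×64.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                            # assumed VGG-style input size
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
)
print(block(x).shape)                                      # torch.Size([1, 64, 224, 224])
```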

  36. VGGNet [Simonyan and Zisserman ICLR’15]   (Conv = 3×3, s = 1, same; Maxpool = 2×2, s = 2)

  37. VGGNet [Simonyan and Zisserman ICLR’15]   (Conv = 3×3, s = 1, same; Maxpool = 2×2, s = 2)

  38. VGGNet [Simonyan and Zisserman ICLR’15]   (Conv = 3×3, s = 1, same; Maxpool = 2×2, s = 2)
     • The number of filters is multiplied by 2

  39. VGGNet [Simonyan and Zisserman ICLR’15]   (Conv = 3×3, s = 1, same; Maxpool = 2×2, s = 2)

  40. VGGNet [Simonyan and Zisserman ICLR’15]
     • Conv -> Pool -> Conv -> Pool -> Conv -> FC
     • As we go deeper: width and height shrink, the number of filters grows
     • Called VGG-16: 16 layers that have weights; 138M parameters
     • Large, but its simplicity makes it appealing
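
Again as a hedged check, torchvision's VGG-16 implementation (13 conv + 3 FC weight layers, all 3×3/stride-1 convs and 2×2/stride-2 max pools) matches the quoted parameter count.

```python
import torchvision

vgg16 = torchvision.models.vgg16()
print(sum(p.numel() for p in vgg16.parameters()) / 1e6)    # ~138 million parameters
```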

  41. VGGNet
     • A lot of architectures were analyzed [Simonyan and Zisserman 2014]

  42. Skip Connections
