Lecture 9 Recap
I2DL: Prof. Niessner, Prof. Leal-Taixé
What are Convolutions?
$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$
f = red, g = blue, f * g = green
Figures: convolution of two box functions; convolution of two Gaussians
A convolution is the application of a filter to a function; the 'smaller' one is typically called the filter kernel.
What are Convolutions?
Discrete case: box filter
Input:   4  3  2  -5  3  5  2  5  5  6
Kernel:  1/3  1/3  1/3
Output:  ??  3  0  0  1  10/3  4  4  16/3  ??
What to do at boundaries?
1) Shrink: 3  0  0  1  10/3  4  4  16/3
2) Pad (often with '0'): 7/3  3  0  0  1  10/3  4  4  16/3  11/3
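The two boundary strategies above can be reproduced with a short Python sketch. The helper name `box_filter_1d` and the use of exact fractions are illustrative choices, not part of the lecture. Because the box kernel is symmetric, sliding it without flipping gives the same values as a true convolution.

```python
from fractions import Fraction

def box_filter_1d(signal, width=3, pad=False):
    """Slide a box (mean) filter of odd `width` over a 1D signal."""
    half = width // 2
    if pad:
        # Zero-pad both ends so the output keeps the input length.
        signal = [0] * half + list(signal) + [0] * half
    weight = Fraction(1, width)
    # Each output value is the mean of `width` neighbouring samples.
    return [weight * sum(signal[i:i + width])
            for i in range(len(signal) - width + 1)]

signal = [4, 3, 2, -5, 3, 5, 2, 5, 5, 6]
shrunk = box_filter_1d(signal)            # 8 values: boundary outputs dropped
padded = box_filter_1d(signal, pad=True)  # 10 values: zeros assumed outside
```

With padding, the first and last outputs are 7/3 and 11/3, and the eight values in between are identical to the shrunk result.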
Convolutions on Images
Image 5x5:
-5  3  2 -5  3
 4  3  2  1 -3
 1  0  3  3  5
-2  0  1  4  4
 5  6  7  9 -1
Kernel 3x3:
 0 -1  0
-1  5 -1
 0 -1  0
Output 3x3:
 6  1  8
-7  9  2
-5 -9  3
Bottom-right output value: $5 \cdot 4 + (-1) \cdot 3 + (-1) \cdot 4 + (-1) \cdot 9 + (-1) \cdot 1 = 20 - 17 = 3$
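As a check on the numbers of this slide, here is a minimal pure-Python sketch of a valid (no padding, stride 1) 2D filter. The function name `conv2d_valid` is hypothetical, and the sketch assumes a square image and a square kernel. The kernel is not flipped, which makes no difference here because it is symmetric.

```python
def conv2d_valid(image, kernel):
    """Valid 2D sliding-window filter on nested lists (square inputs assumed)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            # Multiply the kernel with the patch underneath it and sum up.
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k) for j in range(k)))
        output.append(row)
    return output

image = [[-5, 3, 2, -5, 3],
         [4, 3, 2, 1, -3],
         [1, 0, 3, 3, 5],
         [-2, 0, 1, 4, 4],
         [5, 6, 7, 9, -1]]
kernel = [[0, -1, 0],
          [-1, 5, -1],
          [0, -1, 0]]

result = conv2d_valid(image, kernel)
print(result)  # [[6, 1, 8], [-7, 9, 2], [-5, -9, 3]]
```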
Image Filters
• Each kernel gives us a different image filter
Box mean: 1/9 · [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
Edge detection: [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
Sharpen: [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
Gaussian blur: 1/16 · [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
LET'S LEARN THESE FILTERS!
Convolutions on RGB Images
32×32×3 image (pixels $x$), 5×5×3 filter (weights $w$)
Convolve: slide over all spatial locations $x_i$ and compute all outputs $z_i$; w/o padding, there are 28×28 locations
Result: 28×28×1 activation map (also feature map)
Convolution Layer
32×32×3 image, 5×5×3 filter
Let's apply a different filter with different weights!
Convolve: 28×28×1 activation maps, one per filter
Convolution Layer
32×32×3 image
Let's apply **five** filters, each with different weights!
Convolution "Layer": convolve to obtain 28×28×5 activation maps
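The shape bookkeeping of a convolution layer can be sketched in plain Python. The function `conv_layer`, the random inputs, and the omission of bias terms are assumptions made for illustration. Each filter spans the full input depth and yields one activation map.

```python
import random

def conv_layer(image, filters):
    """Apply each F x F x C filter to an N x N x C image (no padding, stride 1)."""
    size, depth = len(image), len(image[0][0])
    maps = []
    for weights in filters:
        f = len(weights)
        out = size - f + 1
        # One activation map per filter: a dot product over the full depth.
        maps.append([[sum(weights[i][j][ch] * image[r + i][c + j][ch]
                          for i in range(f) for j in range(f)
                          for ch in range(depth))
                      for c in range(out)] for r in range(out)])
    return maps

rng = random.Random(0)
# 32 x 32 x 3 image and five 5 x 5 x 3 filters with random values.
image = [[[rng.random() for _ in range(3)] for _ in range(32)] for _ in range(32)]
filters = [[[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
            for _ in range(5)] for _ in range(5)]

activation_maps = conv_layer(image, filters)
print(len(activation_maps), len(activation_maps[0]), len(activation_maps[0][0]))  # 5 28 28
```

Stacking the five 28×28 maps along the depth axis gives the 28×28×5 output volume.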
Convolution Layers: Dimensions
Input: $N \times N$ (input width and height of $N$)
Filter: $F \times F$ (filter width and height of $F$)
Stride: $S$
Output: $\left(\frac{N - F}{S} + 1\right) \times \left(\frac{N - F}{S} + 1\right)$
$N = 7, F = 3, S = 1$: $\frac{7 - 3}{1} + 1 = 5$
$N = 7, F = 3, S = 2$: $\frac{7 - 3}{2} + 1 = 3$
$N = 7, F = 3, S = 3$: $\frac{7 - 3}{3} + 1 = 2.3333$ (fractions are illegal)
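The three stride examples can be checked with a small helper. The name `strict_output_size` and the choice to raise an error on a fractional result are assumptions for this sketch; they simply mirror the slide's remark that fractions are illegal.

```python
from fractions import Fraction

def strict_output_size(n, f, s):
    """Return (N - F) / S + 1, or raise if the stride does not fit evenly."""
    size = Fraction(n - f, s) + 1
    if size.denominator != 1:
        raise ValueError(
            f"N={n}, F={f}, S={s} gives fractional size {float(size):.4f}")
    return int(size)

print(strict_output_size(7, 3, 1))  # 5
print(strict_output_size(7, 3, 2))  # 3
# strict_output_size(7, 3, 3) raises ValueError (size would be 2.3333)
```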
Convolution Layers: Padding
Image 7x7 + zero padding
Types of convolutions:
• Valid convolution: using no padding
• Same convolution: output = input size. Set padding to $P = \frac{F - 1}{2}$
Convolution Layers: Dimensions
Remember: Output = $\left(\frac{N + 2 \cdot P - F}{S} + 1\right) \times \left(\frac{N + 2 \cdot P - F}{S} + 1\right)$
REMARK: in practice, typically integer division is used (i.e., apply the floor operator!)
Example: 3x3 conv with same padding and strides of 2 on a 64x64 RGB image -> N = 64, F = 3, P = 1, S = 2
Output: $\left(\frac{64 + 2 \cdot 1 - 3}{2} + 1\right) \times \left(\frac{64 + 2 \cdot 1 - 3}{2} + 1\right) = \lfloor 32.5 \rfloor \times \lfloor 32.5 \rfloor = 32 \times 32$
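A minimal sketch of the floored output-size formula follows, together with the same-padding rule $P = (F - 1)/2$. The function names are hypothetical. Flooring the division before adding 1 gives the same result as flooring the whole expression, since 1 is an integer.

```python
def output_size(n, f, p, s):
    """floor((N + 2P - F) / S) + 1, using integer division."""
    return (n + 2 * p - f) // s + 1

def same_padding(f):
    """Padding P = (F - 1) / 2 that preserves the size at stride 1 (odd F only)."""
    if f % 2 == 0:
        raise ValueError("same padding as (F - 1) / 2 needs an odd filter size")
    return (f - 1) // 2

print(output_size(64, 3, same_padding(3), 2))  # 32
print(output_size(7, 3, same_padding(3), 1))   # 7 (same convolution)
print(output_size(32, 5, 0, 1))                # 28 (valid convolution)
```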
CNN Learned Filters
CNN Prototype
Slide by Karpathy
Pooling Layer: Max Pooling
Single depth slice of input:
3 1 3 5
6 0 7 9
3 2 1 4
0 2 4 3
Max pool with 2×2 filters and stride 2
'Pooled' output:
6 9
3 4
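Max pooling is easy to verify by hand or with a few lines of Python. The sketch below uses a hypothetical helper `max_pool2d` and assumes the 4×4 depth slice is laid out as rows 3 1 3 5 / 6 0 7 9 / 3 2 1 4 / 0 2 4 3, which is how the slide's flattened grid reads.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window, moving by `stride`."""
    rows = range(0, len(feature_map) - size + 1, stride)
    cols = range(0, len(feature_map[0]) - size + 1, stride)
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in cols] for r in rows]

depth_slice = [[3, 1, 3, 5],
               [6, 0, 7, 9],
               [3, 2, 1, 4],
               [0, 2, 4, 3]]

pooled = max_pool2d(depth_slice)
print(pooled)  # [[6, 9], [3, 4]]
```

Pooling has no learnable weights; it only reduces width and height, here from 4×4 to 2×2.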
Receptive Field
• Spatial extent of the connectivity of a convolutional filter
7x7 input, 3x3 output
5x5 receptive field on the original input: one output value is connected to 25 input pixels
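The 5x5 figure is consistent with stacking two 3x3 convolutions at stride 1 (7x7 -> 5x5 -> 3x3); that layer configuration is an assumption, since the slide's drawing is not recoverable here. A small sketch with the hypothetical helper `receptive_field` computes the receptive field for any stack of (kernel size, stride) pairs.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, per axis) of one output unit.

    `layers` lists (kernel_size, stride) pairs from the input to the output.
    """
    field = 1  # a single input pixel sees itself
    jump = 1   # distance in input pixels between neighbouring units
    for kernel, stride in layers:
        field += (kernel - 1) * jump  # each layer widens the field
        jump *= stride                # strides spread units further apart
    return field

two_convs = receptive_field([(3, 1), (3, 1)])
print(two_convs, two_convs ** 2)  # 5 25
```

Adding a third 3x3 layer at stride 1 grows the field to 7x7.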
Lecture 10 – CNNs (part 2)
Classic Architectures
LeNet
• Digit recognition: 10 classes
Input: 32 × 32 grayscale images
This one: Labeled as class "7"
[LeCun et al. '98] LeNet
LeNet
• Valid convolution: size shrinks
• How many conv filters are there in the first layer? 6
LeNet
• At that time average pooling was used; now max pooling is much more common
LeNet
• Again valid convolutions, how many filters?
LeNet
• Use of tanh/sigmoid activations → not common now!
LeNet
• Conv -> Pool -> Conv -> Pool -> Conv -> FC
LeNet
• As we go deeper: Width, Height ↓, Number of Filters ↑
60k parameters
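The shrinking width and height can be traced through the Conv -> Pool -> Conv -> Pool -> Conv stack. The slides do not state the filter sizes; the sketch below assumes the LeNet-5 values from [LeCun et al. '98], namely 5x5 valid convolutions at stride 1 and 2x2 pooling at stride 2.

```python
def valid_size(n, f, s=1):
    """Output width/height of a valid (unpadded) conv or pooling stage."""
    return (n - f) // s + 1

# (stage, filter size, stride): assumed LeNet-5 values, not given on the slides.
lenet_stages = [("conv", 5, 1), ("pool", 2, 2),
                ("conv", 5, 1), ("pool", 2, 2),
                ("conv", 5, 1)]

sizes = [32]  # 32 x 32 grayscale input
for _stage, f, s in lenet_stages:
    sizes.append(valid_size(sizes[-1], f, s))

print(sizes)  # [32, 28, 14, 10, 5, 1]
```

Under these assumptions the last convolution reduces the map to 1x1, which is why it can feed directly into the fully connected part.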
Test Benchmarks
• ImageNet Dataset: ImageNet Large Scale Visual Recognition Competition (ILSVRC)
[Russakovsky et al., IJCV'15] "ImageNet Large Scale Visual Recognition Challenge."
Common Performance Metrics
• Top-1 score: check if a sample's top class (i.e. the one with highest probability) is the same as its target label
• Top-5 score: check if your label is in your 5 first predictions (i.e. predictions with 5 highest probabilities)
• → Top-5 error: percentage of test samples for which the correct class was not in the top 5 predicted classes
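Both metrics follow from one ranking step, as the sketch below illustrates. The function `top_k_error` and the three-sample, six-class toy data are made up for this example and are not ImageNet results.

```python
def top_k_error(probabilities, labels, k):
    """Fraction of samples whose label is NOT among the k most probable classes."""
    misses = 0
    for probs, label in zip(probabilities, labels):
        # Class indices sorted from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)

# Toy predictions over 6 classes (each row sums to 1).
probabilities = [
    [0.05, 0.50, 0.10, 0.20, 0.10, 0.05],  # label 1 is the top class
    [0.30, 0.06, 0.25, 0.20, 0.15, 0.04],  # label 2 is ranked second
    [0.40, 0.20, 0.15, 0.12, 0.10, 0.03],  # label 5 is ranked last
]
labels = [1, 2, 5]

top1_error = top_k_error(probabilities, labels, k=1)  # 2 of 3 samples missed
top5_error = top_k_error(probabilities, labels, k=5)  # 1 of 3 samples missed
```

The top-5 error can never exceed the top-1 error, because every top-1 hit is also a top-5 hit.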
AlexNet
• Cut ImageNet error down in half
Figure labels: Non-CNN, CNN
AlexNet
[Krizhevsky et al. NIPS'12] AlexNet
AlexNet
• First filter with stride 4 to reduce size significantly
• 96 filters
AlexNet
• Use of same convolutions
• As with LeNet: Width, Height ↓, Number of Filters ↑
AlexNet
• Softmax for 1000 classes
AlexNet
• Similar to LeNet but much bigger (~1000 times)
• Use of ReLU instead of tanh/sigmoid
60M parameters
VGGNet
• Striving for simplicity
• CONV = 3x3 filters with stride 1, same convolutions
• MAXPOOL = 2x2 filters with stride 2
[Simonyan and Zisserman ICLR'15] VGGNet
VGGNet
Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2
• 2 consecutive convolutional layers, each one with 64 filters
• What is the output size?
VGGNet
• Number of filters is multiplied by 2
VGGNet
• Conv -> Pool -> Conv -> Pool -> Conv -> FC
• As we go deeper: Width, Height ↓, Number of Filters ↑
• Called VGG-16: 16 layers that have weights
• Large but simplicity makes it appealing
138M parameters
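The "16 layers that have weights" and the 138M parameters can be recomputed from the layer layout. The slides do not list that layout; the sketch below assumes configuration D of the VGG paper (13 conv layers, 3 fully connected layers) and a 224x224 RGB input, and it counts biases as parameters.

```python
# Assumed VGG-16 layout (configuration D in the paper), not listed on the slides.
# Integers are the number of 3x3 filters of a conv layer, "M" is a 2x2 max pool.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]

def count_vgg16(size=224, channels=3):
    """Return (parameter count, layers with weights, final spatial size)."""
    params, weight_layers = 0, 0
    for item in VGG16_CONV:
        if item == "M":
            size //= 2  # 2x2 max pool with stride 2 halves width and height
        else:
            params += 3 * 3 * channels * item + item  # weights + biases
            channels = item  # same convolution keeps width and height
            weight_layers += 1
    features = size * size * channels  # flattened input of the first FC layer
    for units in VGG16_FC:
        params += features * units + units
        features = units
        weight_layers += 1
    return params, weight_layers, size

params, weight_layers, final_size = count_vgg16()
print(params, weight_layers, final_size)  # 138357544 16 7
```

Under these assumptions most parameters sit in the first fully connected layer (7 · 7 · 512 · 4096 weights), not in the convolutions.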
VGGNet
• A lot of architectures were analyzed
[Simonyan and Zisserman 2014]
Skip Connections