Lecture 10 recap
LeNet
• 60k parameters
• Digit recognition: 10 classes
• Conv -> Pool -> Conv -> Pool -> Conv -> FC
• As we go deeper: width and height shrink, the number of filters grows (see the sketch below)
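A minimal sketch of this Conv -> Pool -> Conv -> Pool -> Conv -> FC pattern, assuming PyTorch; the layer sizes below are the classic LeNet-5 ones, chosen so the parameter count lands near 60k:

```python
# Minimal LeNet-style sketch (assuming PyTorch); sizes follow the classic LeNet-5.
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),    # 32x32x1 -> 28x28x6
    nn.AvgPool2d(2),                              # -> 14x14x6
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # -> 10x10x16
    nn.AvgPool2d(2),                              # -> 5x5x16
    nn.Conv2d(16, 120, kernel_size=5), nn.Tanh(), # -> 1x1x120
    nn.Flatten(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                            # 10 digit classes
)

print(sum(p.numel() for p in lenet.parameters()))  # ~62k parameters
```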
AlexNet [Krizhevsky et al. 2012]
• Softmax for 1000 classes
VGGNet [Simonyan and Zisserman 2014]
• Striving for simplicity
• CONV = 3x3 filters with stride 1, same convolutions
• MAXPOOL = 2x2 filters with stride 2
VGGNet
• Conv = 3x3, s = 1, same; Maxpool = 2x2, s = 2
VGGNet
• Conv -> Pool -> Conv -> Pool -> Conv -> FC
• As we go deeper: width and height shrink, the number of filters grows
• Called VGG-16: 16 layers that have weights, 138M parameters
• Large, but its simplicity makes it appealing (see the sketch below)
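A minimal sketch of a VGG-style block under these conventions, assuming PyTorch; the channel sizes below are illustrative:

```python
# VGG-style block sketch (assuming PyTorch): every convolution is 3x3, stride 1,
# "same" padding; every max pooling is 2x2 with stride 2.
import torch
import torch.nn as nn

def vgg_block(in_channels, out_channels, num_convs):
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3,
                             stride=1, padding=1),   # "same" convolution
                   nn.ReLU(inplace=True)]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves width/height
    return nn.Sequential(*layers)

# Width/height shrink, number of filters grows as we go deeper:
x = torch.randn(1, 3, 224, 224)
x = vgg_block(3, 64, 2)(x)     # -> 1 x 64 x 112 x 112
x = vgg_block(64, 128, 2)(x)   # -> 1 x 128 x 56 x 56
print(x.shape)
```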
The problem of depth
• As we add more and more layers, training becomes harder
• Vanishing and exploding gradients
• How can we train very deep nets?
Residual block
• Two layers: the input $x_{L-1}$ goes through a linear layer $W_L x_{L-1} + b_L$ and a non-linearity $f$ to give $x_L = f(W_L x_{L-1} + b_L)$, and then $x_{L+1} = f(W_{L+1} x_L + b_{L+1})$
Residual block
• Two linear layers on the main path, plus a skip connection from the input $x_{L-1}$
Residual block
• Two layers: instead of $x_{L+1} = f(W_{L+1} x_L + b_{L+1})$, the skip connection gives $x_{L+1} = f(W_{L+1} x_L + b_{L+1} + x_{L-1})$
Residual block
• Usually use a same convolution, since we need the same dimensions to add $x_{L-1}$
• Otherwise we need to convert the dimensions with a matrix of learned weights or with zero padding
Why do ResNets work?
• The identity is easy for the residual block to learn
• Guaranteed it will not hurt performance, it can only improve (see the sketch below)
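A minimal sketch of the residual block described above, assuming PyTorch; a fixed channel count and plain 3x3 "same" convolutions are assumed (the real ResNet blocks also use batch normalization):

```python
# Residual block sketch (assuming PyTorch): two "same" convolutions on the main
# path, and the input x_{L-1} added back before the final non-linearity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # "same" convolutions so the skip connection dimensions match
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))        # x_L = f(W_L x_{L-1} + b_L)
        out = F.relu(self.conv2(out) + x)  # x_{L+1} = f(W_{L+1} x_L + b_{L+1} + x_{L-1})
        return out

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # same shape as the input
```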
1x1 convolution
• Image 5x5:
  -5  3  2 -5  3
   4  3  2  1 -3
   1  0  3  3  5
  -2  0  1  4  4
   5  6  7  9 -1
• Kernel 1x1: 2
• What is the output size?
1x1 convolution
• First output value: -5 · 2 = -10
1x1 convolution
• Output 5x5 (each value of the image multiplied by the kernel value 2):
  -10  6  4 -10  6
    8  6  4   2 -6
    2  0  6   6 10
   -4  0  2   8  8
   10 12 14  18 -2
• Last output value: -1 · 2 = -2
1x1 convolution
• For 1 kernel (filter), it keeps the spatial dimensions and just scales the input by a number (see the check below)
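A tiny check of this, assuming PyTorch: a single 1x1 kernel with weight 2 just scales the 5x5 image from the example while keeping its size:

```python
# Single 1x1 kernel = per-pixel scaling, spatial size unchanged (assuming PyTorch).
import torch
import torch.nn as nn

image = torch.tensor([[-5., 3., 2., -5.,  3.],
                      [ 4., 3., 2.,  1., -3.],
                      [ 1., 0., 3.,  3.,  5.],
                      [-2., 0., 1.,  4.,  4.],
                      [ 5., 6., 7.,  9., -1.]]).reshape(1, 1, 5, 5)

conv = nn.Conv2d(1, 1, kernel_size=1, bias=False)
with torch.no_grad():
    conv.weight.fill_(2.0)        # the 1x1 kernel from the slide

print(conv(image).squeeze())      # = 2 * image, e.g. -5 * 2 = -10 in the corner
```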
Using 1x1 convolutions
• Use it to shrink the number of channels
• Further adds a non-linearity -> one can learn more complex functions
• Example: input 32x32x200 -> Conv 1x1x200 (32 filters) + ReLU -> output 32x32x32 (see the sketch below)
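A short sketch of that channel-shrinking step, assuming PyTorch and the 32x32x200 input from the example:

```python
# 1x1 convolution to shrink channels (assuming PyTorch): 200 -> 32 channels,
# spatial size 32x32 unchanged, followed by a ReLU non-linearity.
import torch
import torch.nn as nn

x = torch.randn(1, 200, 32, 32)                           # 32x32 spatial, 200 channels
shrink = nn.Sequential(nn.Conv2d(200, 32, kernel_size=1), nn.ReLU())
print(shrink(x).shape)                                    # torch.Size([1, 32, 32, 32])
```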
Inception layer
• Tired of choosing filter sizes?
• Use them all!
• All same convolutions
• 3x3 max pooling is with stride 1
Inception layer: computational cost
• Example: input 32x32x200 -> Conv 1x1 (16 filters) + ReLU -> 32x32x16 -> Conv 5x5 (92 filters) + ReLU -> 32x32x92
• Multiplications: 1x1x200x32x32x16 + 5x5x16x32x32x92 ~ 40 million
• Reduction of multiplications by ~1/10 compared to applying the 5x5 convolution directly (see the check below)
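A back-of-the-envelope check of these counts in plain Python, using the layer sizes from the example:

```python
# Multiplication counts for the 1x1 bottleneck vs. a direct 5x5 convolution,
# using the layer sizes from the slide.
direct_5x5 = 5 * 5 * 200 * 32 * 32 * 92        # 5x5 conv straight on 32x32x200
bottleneck = (1 * 1 * 200 * 32 * 32 * 16       # 1x1 conv down to 16 channels
              + 5 * 5 * 16 * 32 * 32 * 92)     # 5x5 conv on the reduced volume

print(f"{direct_5x5:,}")   # 471,040,000  (~470 million)
print(f"{bottleneck:,}")   # 40,960,000   (~40 million) -> roughly a 1/10 reduction
```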
Inception layer
Semantic Segmentation (FCN)
• Fully Convolutional Networks for Semantic Segmentation (FCN) [Long et al. 2015]
Transfer learning [Donahue 2014, Razavian 2014]
• Network trained on ImageNet: the early layers are kept FROZEN, and the later layers are re-TRAINed on a new dataset with C classes (see the sketch below)
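A minimal transfer-learning sketch, assuming PyTorch and a recent torchvision with a ResNet-18 backbone; the backbone choice and the number of classes C are illustrative:

```python
# Transfer-learning sketch (assuming PyTorch/torchvision): freeze the
# ImageNet-pretrained backbone and train only a new head with C classes.
import torch.nn as nn
from torchvision import models

C = 10                                         # classes in the new dataset (assumed)
model = models.resnet18(weights="IMAGENET1K_V1")

for param in model.parameters():               # FROZEN: keep the pretrained features
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, C)  # TRAIN: only the new classifier head
```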
Now you are:
• Ready to perform image classification on any dataset
• Ready to design your own architecture
• Ready to deal with other problems such as semantic segmentation (Fully Convolutional Network)
Recurrent Neural Networks
RNNs are flexible
• Classic neural networks for image classification
RNNs are flexible
• Image captioning
RNNs are flexible
• Language recognition
RNNs are flexible
• Machine translation
RNNs are flexible
• Event classification
Basic structure of an RNN
• Multi-layer RNN: inputs -> hidden states -> outputs
Basic structure of an RNN
• Multi-layer RNN: the hidden states have their own internal dynamics
• More expressive model!
Basic structure of an RNN
• We want to have a notion of "time" or "sequence"
• The hidden state depends on the previous hidden state and the current input [Christopher Olah, Understanding LSTMs]
Basic structure of an RNN
• We want to have a notion of "time" or "sequence"
• The hidden state update has parameters to be learned
Basic structure of an RNN
• We want to have a notion of "time" or "sequence"
• The output is computed from the hidden state (note: non-linearities ignored for now)
Basic structure of an RNN
• We want to have a notion of "time" or "sequence"
• Same parameters for each time step = generalization! (see the sketch below)
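A minimal sketch of this recurrence in NumPy, with non-linearities ignored as above; the parameter names theta_A, theta_x, theta_h are illustrative, and the same parameters are reused at every time step:

```python
# Minimal linear RNN sketch (NumPy): the hidden state A_t depends on the previous
# hidden state and the current input; the SAME parameters are used at every step.
import numpy as np

hidden_dim, input_dim, output_dim, T = 4, 3, 2, 5
rng = np.random.default_rng(0)
theta_A = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
theta_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
theta_h = rng.normal(size=(output_dim, hidden_dim))  # hidden-to-output weights

A = np.zeros(hidden_dim)                             # initial hidden state A_0
for t in range(T):
    x_t = rng.normal(size=input_dim)                 # current input
    A = theta_A @ A + theta_x @ x_t                  # new hidden state A_t
    h_t = theta_h @ A                                # output at time t
    print(t, h_t)
```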
Basic structure of an RNN
• Unrolling RNNs: the hidden-state module is the same at every time step [Christopher Olah, Understanding LSTMs]
Basic structure of an RNN
• Unrolling RNNs [Christopher Olah, Understanding LSTMs]
Basic structure of an RNN
• Unrolling RNNs as feedforward nets: inputs $x_t, x_{t+1}, x_{t+2}$ feed a deep feedforward net in which the weights $w_1, w_2, w_3, w_4$ are the same at every time step!
Backprop through an RNN
• Unrolling RNNs as feedforward nets: apply the chain rule all the way back to $t = 0$
• Add the derivatives at different times for each weight (see the check below)
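A small check of this with PyTorch autograd: one shared scalar weight is reused at every unrolled step, and backpropagation sums its gradient contributions across time:

```python
# "Add the derivatives at different times for each weight": a single shared weight
# is reused at every time step, and .backward() sums its per-step contributions.
import torch

w = torch.tensor(0.5, requires_grad=True)   # shared recurrent weight
x = torch.tensor([1.0, 2.0, 3.0])           # a toy input sequence

A = torch.tensor(0.0)
for t in range(3):
    A = w * A + x[t]                        # same weight w applied at each step

A.backward()                                # chain rule through all time steps
print(w.grad)                               # tensor(3.) = sum of per-step derivatives
```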
Long-term dependencies
• "I moved to Germany … so I speak German fluently"
Long-term dependencies
• Simple recurrence: $A_t = \theta^t A_0$
• Let us forget the input: the same weights are multiplied over and over again
Long-term dependencies
• Simple recurrence: $A_t = \theta^t A_0$
• What happens to small weights? Vanishing gradient
• What happens to large weights? Exploding gradient
Long-term dependencies
• Simple recurrence: $A_t = \theta^t A_0$
• If $\theta$ admits an eigendecomposition $\theta = Q \Lambda Q^\top$: $Q$ is the matrix of eigenvectors, and the diagonal of $\Lambda$ holds the eigenvalues
Long-term dependencies
• Simple recurrence: $A_t = \theta^t A_0$
• If $\theta$ admits an eigendecomposition with orthogonal $Q$, this simplifies the recurrence to $A_t = Q \Lambda^t Q^\top A_0$
Long-term dependencies
• Simple recurrence: $A_t = Q \Lambda^t Q^\top A_0$
• What happens to eigenvalues with magnitude less than one? Vanishing gradient
• What happens to eigenvalues with magnitude larger than one? Exploding gradient -> gradient clipping (see the demo below)
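A small NumPy demo of the recurrence with eigenvalues slightly below and above one, plus the usual clipping call for the exploding case; the specific values are illustrative:

```python
# Powers of theta in A_t = theta^t A_0: eigenvalues below 1 make the state (and
# gradients) vanish, eigenvalues above 1 make them explode.
import numpy as np

A_0 = np.ones(2)
for eigenvalue in (0.9, 1.1):
    theta = np.diag([eigenvalue, eigenvalue])        # simple diagonal theta
    A_t = np.linalg.matrix_power(theta, 100) @ A_0   # 100 time steps
    print(eigenvalue, A_t[0])                        # 0.9 -> ~3e-5, 1.1 -> ~1.4e4

# Exploding gradients are usually handled by clipping, e.g. in PyTorch:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```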
Long-term dependencies
• Simple recurrence: $A_t = \theta^t A_0$
• Let us just make a matrix with eigenvalues = 1
• Allow the cell to maintain its "state"
Vanishing gradient: $A_t = \theta^t A_0$
• 1. From the weights
• 2. From the activation functions (tanh)
Vanishing gradient: $A_t = \theta^t A_0$
• 1. From the weights
• 2. From the activation functions (tanh): the derivative of tanh is at most 1 (see the check below)
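A small NumPy check that the tanh derivative never exceeds 1, so backprop through many time steps multiplies factors that are at most 1 and the gradient shrinks:

```python
# The derivative of tanh is bounded by 1, so chained factors can only shrink.
import numpy as np

z = np.linspace(-5, 5, 1001)
dtanh = 1 - np.tanh(z) ** 2        # derivative of tanh
print(dtanh.max())                 # 1.0, attained only at z = 0

print(np.prod(np.full(20, 0.5)))   # e.g. 20 factors of 0.5 -> ~1e-6
```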
Long Short-Term Memory [Hochreiter and Schmidhuber 1997]