  1. Lecture 10 recap • Prof. Leal-Taixé and Prof. Niessner

  2. LeNet • 60k parameters • Digit recognition: 10 classes • Conv -> Pool -> Conv -> Pool -> Conv -> FC • As we go deeper: width and height shrink, the number of filters grows
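
As a concrete reference, here is a minimal PyTorch sketch of a LeNet-style network for 10-class digit recognition; the exact filter counts (6, 16, 120) and the tanh/average-pool choices are assumptions in the spirit of LeNet-5, chosen so that the parameter count lands near the 60k mentioned above.

```python
import torch.nn as nn

# LeNet-style CNN for 10-class digit recognition (a sketch; the filter
# counts below are assumptions chosen to land near ~60k parameters).
class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),    # 32x32x1 -> 28x28x6
            nn.AvgPool2d(2),                              # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),   # -> 10x10x16
            nn.AvgPool2d(2),                              # -> 5x5x16
            nn.Conv2d(16, 120, kernel_size=5), nn.Tanh()  # -> 1x1x120
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(sum(p.numel() for p in LeNet().parameters()))  # ~61.7k parameters
```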

  3. AlexNet [Krizhevsky et al. 2012] • Softmax for 1000 classes

  4. VGGNet [Simonyan and Zisserman 2014] • Striving for simplicity • CONV = 3x3 filters with stride 1, same convolutions • MAXPOOL = 2x2 filters with stride 2

  5. VGGNet • Conv = 3x3, s = 1, same • Maxpool = 2x2, s = 2

  6. VGGNet • Conv -> Pool -> Conv -> Pool -> Conv -> FC • As we go deeper: width and height shrink, the number of filters grows • Called VGG-16: 16 layers that have weights, 138M parameters • Large, but its simplicity makes it appealing
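
Under the stated VGG rules (3x3 same convolutions with stride 1, 2x2 max pooling with stride 2), a block can be sketched as below; this is an illustrative fragment of the early VGG-16 stages under those assumptions, not the full 138M-parameter network.

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    # VGG rule: 3x3 convs, stride 1, padding 1 ("same"), then 2x2 max pool, stride 2.
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# First stages of a VGG-16-style feature extractor (sketch only):
features = nn.Sequential(
    vgg_block(3, 64, 2),     # 224x224x3   -> 112x112x64
    vgg_block(64, 128, 2),   # 112x112x64  -> 56x56x128
    vgg_block(128, 256, 3),  # 56x56x128   -> 28x28x256
)
```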

  7. The problem of depth • As we add more and more layers, training becomes harder • Vanishing and exploding gradients • How can we train very deep nets?

  8. Residual block • Two layers: input x_{L-1}, linear step W_L x_{L-1} + b_L, non-linearity x_L = f(W_L x_{L-1} + b_L), then x_{L+1} = f(W_{L+1} x_L + b_{L+1})

  9. Residual block • Two layers x_{L-1} -> x_L -> x_{L+1} • Main path: Linear -> Linear • Skip connection from the input x_{L-1} around the two layers

  10. Residual block • Two layers • Without the skip connection: x_{L+1} = f(W_{L+1} x_L + b_{L+1}) • With the skip connection: x_{L+1} = f(W_{L+1} x_L + b_{L+1} + x_{L-1})

  11. Residual block • Usually use a same convolution, since we need the same dimensions for the addition • Otherwise we need to convert the dimensions with a matrix of learned weights or with zero padding

  12. Why do ResNets work? • The identity is easy for the residual block to learn • Guaranteed it will not hurt performance, it can only improve
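
A minimal PyTorch sketch of the residual block described above, computing f(W_{L+1} x_L + b_{L+1} + x_{L-1}); the optional 1x1 projection corresponds to the "matrix of learned weights" option for mismatched dimensions, and its exact form here is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # Two conv layers plus a skip connection: out = f(conv2(f(conv1(x))) + x).
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
        # If dimensions change, convert the skip path with learned 1x1 weights.
        self.proj = None
        if stride != 1 or in_ch != out_ch:
            self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=stride)

    def forward(self, x):
        identity = x if self.proj is None else self.proj(x)
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        return F.relu(out + identity)  # add the skip, then the non-linearity

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```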

  13. 1x1 convolution • Image (5x5):
      -5  3  2 -5  3
       4  3  2  1 -3
       1  0  3  3  5
      -2  0  1  4  4
       5  6  7  9 -1
      Kernel (1x1): 2 • What is the output size?

  14. 1x1 convolution • Applying the 1x1 kernel (value 2) to the top-left pixel: -5 * 2 = -10

  15. 1x1 convolution • Output (5x5), every pixel multiplied by the kernel value 2:
      -10   6   4 -10   6
        8   6   4   2  -6
        2   0   6   6  10
       -4   0   2   8   8
       10  12  14  18  -2
      e.g. the bottom-right pixel: -1 * 2 = -2

  16. 1x1 convolution • For 1 kernel or filter, it keeps the spatial dimensions and just scales the input by a number

  17. Using 1x1 convolutions • Use it to shrink the number of channels • Further adds a non-linearity, so one can learn more complex functions • Example: a 32x32x200 input passed through a Conv 1x1x200 with 32 filters + ReLU becomes 32x32x32
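
The channel-shrinking example can be checked directly; a small sketch using the tensor sizes from the slide (32x32x200 input, 32 filters of size 1x1x200, ReLU):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 200, 32, 32)              # 32x32 input with 200 channels
conv1x1 = nn.Conv2d(200, 32, kernel_size=1)  # 32 filters of size 1x1x200
y = torch.relu(conv1x1(x))
print(y.shape)  # torch.Size([1, 32, 32, 32]): spatial size kept, channels shrunk
```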

  18. Inception layer • Tired of choosing filter sizes? Use them all! • All same convolutions • The 3x3 max pooling is with stride 1

  19. Inception layer: computational cost • Input 32x32x200 • Conv 1x1x200, 16 filters + ReLU -> 32x32x16 • Conv 5x5x16, 92 filters + ReLU -> 32x32x92 • Multiplications: 1x1x200x32x32x16 + 5x5x16x32x32x92 ~ 40 million • Reduction of multiplications by ~1/10 compared to a direct 5x5 convolution
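
The multiplication counts can be reproduced with plain arithmetic; the ~470M figure for a direct 5x5 convolution is computed here for comparison and is implied by the "1/10" reduction rather than stated on the slide.

```python
# Bottleneck path: 1x1 conv down to 16 channels, then 5x5 conv up to 92 channels.
bottleneck = 1 * 1 * 200 * 32 * 32 * 16 + 5 * 5 * 16 * 32 * 32 * 92
# Direct path: a single 5x5 convolution from 200 to 92 channels.
direct = 5 * 5 * 200 * 32 * 32 * 92

print(bottleneck)           # 40,960,000   (~40 million)
print(direct)               # 471,040,000  (~470 million)
print(direct / bottleneck)  # ~11.5x, i.e. roughly a 1/10 reduction
```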

  20. Inception layer

  21. Semantic Segmentation (FCN) • Fully Convolutional Networks for Semantic Segmentation [Long et al. 15]

  22. Transfer learning • Take a network trained on ImageNet • FROZEN: keep the pretrained layers fixed • TRAIN: a new final layer on the new dataset with C classes [Donahue 2014, Razavian 2014]
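
A common way to set this up in PyTorch is sketched below: freeze an ImageNet-pretrained backbone and train only a new final layer for C classes. The choice of torchvision's ResNet-18 and C = 5 are illustrative assumptions, not from the slide.

```python
import torch.nn as nn
from torchvision import models

C = 5  # number of classes in the new dataset (example value)

# Load an ImageNet-pretrained backbone (ResNet-18 chosen for illustration).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# FROZEN: keep the pretrained features fixed.
for param in model.parameters():
    param.requires_grad = False

# TRAIN: replace the last fully connected layer with a fresh one for C classes.
model.fc = nn.Linear(model.fc.in_features, C)
```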

  23. Now you are: • Ready to perform image classification on any dataset • Ready to design your own architecture • Ready to deal with other problems such as semantic segmentation (Fully Convolutional Network)

  24. Recurrent Neural Networks

  25. RNNs are flexible • Classic Neural Networks for Image Classification

  26. RNNs are flexible • Image captioning

  27. RNNs are flexible • Language recognition

  28. RNNs are flexible • Machine translation

  29. RNNs are flexible • Event classification

  30. Basic structure of an RNN • Multi-layer RNN: inputs, hidden states, outputs

  31. Basic structure of an RNN • Multi-layer RNN • The hidden state will have its own internal dynamics • A more expressive model!

  32. Basic structure of an RNN • We want to have a notion of “time” or “sequence” • The hidden state is computed from the previous hidden state and the current input [Christopher Olah] Understanding LSTMs

  33. Basic structure of an RNN • We want to have a notion of “time” or “sequence” • The hidden-state update has parameters (weights) to be learned

  34. Basic structure of an RNN • We want to have a notion of “time” or “sequence” • The output is computed from the hidden state • Note: non-linearities ignored for now

  35. Basic structure of an RNN • We want to have a notion of “time” or “sequence” • The same parameters are used for each time step = generalization!
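
A bare-bones sketch of one recurrent step, with the same weights reused at every time step; the weight names (W_xh, W_hh, W_hy) and the tanh non-linearity are assumptions, since the slide equations are not legible in this transcript.

```python
import torch

# Dimensions chosen for illustration.
input_size, hidden_size, output_size = 8, 16, 4

W_xh = torch.randn(hidden_size, input_size) * 0.1   # input  -> hidden
W_hh = torch.randn(hidden_size, hidden_size) * 0.1  # hidden -> hidden
W_hy = torch.randn(output_size, hidden_size) * 0.1  # hidden -> output

def rnn_step(x_t, h_prev):
    # New hidden state from the previous hidden state and the current input;
    # the same weights are applied at every time step.
    h_t = torch.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = torch.zeros(hidden_size)
for x_t in torch.randn(5, input_size):  # a sequence of 5 inputs
    h, y = rnn_step(x_t, h)
```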

  36. Basic structure of an RNN • Unrolling RNNs • The hidden state is the same [Christopher Olah] Understanding LSTMs

  37. Basic structure of an RNN • Unrolling RNNs [Christopher Olah] Understanding LSTMs

  38. Basic structure of an RNN • Unrolling RNNs as feedforward nets • The weights w1, w2, w3, w4 are the same at every time step (inputs x_t, x_{t+1}, x_{t+2})

  39. Backprop through an RNN • Unrolling RNNs as feedforward nets • Chain rule, all the way back to t = 0 • Add the derivatives at the different time steps for each weight
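
Written out, "add the derivatives at different times" means summing the per-time-step contributions; a sketch of the total gradient for a shared parameter θ with hidden states A_t (notation assumed to match the recurrence used on the following slides):

```latex
\frac{\partial L}{\partial \theta}
  = \sum_{t} \frac{\partial L_t}{\partial \theta}
  = \sum_{t} \sum_{k \le t}
      \frac{\partial L_t}{\partial A_t}
      \left( \prod_{j=k+1}^{t} \frac{\partial A_j}{\partial A_{j-1}} \right)
      \frac{\partial A_k}{\partial \theta}
```

The product of Jacobians over time steps is exactly where the repeated multiplication by the same weights enters.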

  40. Long-term dependencies • “I moved to Germany … so I speak German fluently”

  41. Long-term dependencies • Simple recurrence: if we forget the input, A_t = θ^t A_0 • The same weights are multiplied over and over again

  42. Long-term dependencies • Simple recurrence A_t = θ^t A_0 • What happens with small weights? Vanishing gradient • What happens with large weights? Exploding gradient

  43. Long-term dependencies • Simple recurrence A_t = θ^t A_0 • If θ admits an eigendecomposition θ = Q Λ Q^{-1}, where Q is the matrix of eigenvectors and Λ is a diagonal matrix whose entries are the eigenvalues

  44. Long-term dependencies • Simple recurrence A_t = θ^t A_0 • If θ admits an eigendecomposition and Q is orthogonal, the recurrence simplifies to A_t = Q Λ^t Q^T A_0

  45. Long-term dependencies • Simple recurrence A_t = Q Λ^t Q^T A_0 • What happens to eigenvalues with magnitude less than one? Vanishing gradient • What happens to eigenvalues with magnitude larger than one? Exploding gradient -> gradient clipping
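
A quick numerical check of why the eigenvalue magnitudes matter in A_t = Q Λ^t Q^T A_0 (illustrative values):

```python
import numpy as np

eigvals = np.array([0.9, 1.0, 1.1])  # illustrative eigenvalue magnitudes
for t in [1, 10, 50, 100]:
    print(t, eigvals ** t)
# 0.9**100 ~ 2.7e-5 (vanishing), 1.0**100 = 1, 1.1**100 ~ 1.4e4 (exploding)
```

For the exploding case, gradient clipping (e.g. torch.nn.utils.clip_grad_norm_ in PyTorch) caps the gradient norm before each update.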

  46. Long-term dependencies • Simple recurrence A_t = θ^t A_0 • Let us just make a matrix with eigenvalues = 1 • Allow the cell to maintain its “state”

  47. Vanishing gradient • A_t = θ^t A_0 • 1. From the weights • 2. From the activation functions (tanh)

  48. Vanishing gradient • A_t = θ^t A_0 • 1. From the weights • 2. From the activation functions (tanh), whose derivative is at most 1
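
The tanh contribution can be seen from its derivative, tanh'(x) = 1 - tanh(x)^2, which is at most 1; multiplying many such factors while backpropagating through time shrinks the gradient (a small illustrative check):

```python
import numpy as np

x = np.random.randn(50)        # 50 pre-activations along the unrolled chain
dtanh = 1.0 - np.tanh(x) ** 2  # each factor is at most 1
print(dtanh.max())             # <= 1.0
print(np.prod(dtanh))          # product over 50 steps: very close to 0
```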

  49. Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber 1997]
