Lecture 7 Recap
Naïve Losses: L2 vs L1
• L2 Loss:
  – $L_2 = \sum_i \left(y_i - f(x_i)\right)^2$
  – Sum of squared differences (SSD)
  – Prone to outliers
  – Compute-efficient (optimization)
  – Optimum is the mean
• L1 Loss:
  – $L_1 = \sum_i \left|y_i - f(x_i)\right|$
  – Sum of absolute differences
  – Robust to outliers
  – Costly to compute
  – Optimum is the median
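A minimal NumPy sketch of the two losses above; the variable names (`y` for targets, `y_pred` for model outputs) are illustrative, not from the slides:

```python
import numpy as np

def l1_loss(y, y_pred):
    # Sum of absolute differences: robust to outliers, optimum is the median.
    return np.sum(np.abs(y - y_pred))

def l2_loss(y, y_pred):
    # Sum of squared differences (SSD): prone to outliers, optimum is the mean.
    return np.sum((y - y_pred) ** 2)

y = np.array([1.0, 2.0, 3.0, 100.0])      # the outlier (100) dominates L2
y_pred = np.array([1.1, 1.9, 3.2, 3.0])
print(l1_loss(y, y_pred), l2_loss(y, y_pred))
```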
Binary Classification: Sigmoid
$\sigma(\boldsymbol{x}, \boldsymbol{\theta}) = \frac{1}{1 + e^{-\sum_i \theta_i x_i}}$, i.e., $\sigma(s) = \frac{1}{1 + e^{-s}}$ for the score $s = \sum_i \theta_i x_i$.
The output can be interpreted as a probability: $p(y = 1 \mid \boldsymbol{x}, \boldsymbol{\theta})$.
Softmax Formulation
• What if we have multiple classes?
• Softmax maps the scores $s_k$ for each class to probabilities for each class:
  $p(y = k \mid \boldsymbol{x}, \boldsymbol{\theta}) = \frac{e^{s_k}}{\sum_j e^{s_j}}$
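A short sketch of this mapping from scores to probabilities; subtracting the maximum score before exponentiating is a common numerical-stability trick (an assumption here, not shown on the slide):

```python
import numpy as np

def softmax(s):
    # Shifting by the max does not change the result but avoids overflow in exp.
    e = np.exp(s - np.max(s))
    return e / np.sum(e)

scores = np.array([2.0, 1.0, -1.0])   # one score per class
print(softmax(scores))                # probabilities, sum to 1
```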
Example: Hinge vs Cross-Entropy
Hinge loss: $L_i = \sum_{j \neq y_i} \max\!\left(0, s_j - s_{y_i} + 1\right)$
Cross-entropy loss: $L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_k e^{s_k}}\right)$
Given the following scores for $x_i$ with ground-truth class $y_i = 0$:
• Model 1: $s = [5, -3, 2]$
  – Hinge: $\max(0, -3 - 5 + 1) + \max(0, 2 - 5 + 1) = 0$
  – Cross-entropy: $-\ln\frac{e^5}{e^5 + e^{-3} + e^2} \approx 0.05$
• Model 2: $s = [5, 10, 10]$
  – Hinge: $\max(0, 10 - 5 + 1) + \max(0, 10 - 5 + 1) = 12$
  – Cross-entropy: $-\ln\frac{e^5}{e^5 + e^{10} + e^{10}} \approx 5.70$
• Model 3: $s = [5, -20, -20]$
  – Hinge: $\max(0, -20 - 5 + 1) + \max(0, -20 - 5 + 1) = 0$
  – Cross-entropy: $-\ln\frac{e^5}{e^5 + e^{-20} + e^{-20}} \approx 2 \cdot 10^{-11}$
– Cross-entropy *always* wants to improve! (loss never 0)
– Hinge loss saturates.
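The numbers in this worked example can be reproduced with a few lines of NumPy; this is a sketch assuming ground-truth class index 0 and a margin of 1, as above:

```python
import numpy as np

def hinge_loss(s, y):
    # Multiclass hinge (SVM) loss with margin 1; the true class is excluded from the sum.
    margins = np.maximum(0, s - s[y] + 1)
    margins[y] = 0
    return np.sum(margins)

def cross_entropy_loss(s, y):
    # Negative log of the softmax probability of the true class.
    e = np.exp(s - np.max(s))
    return -np.log(e[y] / np.sum(e))

for s in ([5, -3, 2], [5, 10, 10], [5, -20, -20]):
    s = np.array(s, dtype=float)
    print(hinge_loss(s, 0), cross_entropy_loss(s, 0))
# -> 0 and ~0.05;  12 and ~5.70;  0 and a tiny but nonzero cross-entropy
```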
Sigmoid Activation
Forward: $\sigma(s) = \frac{1}{1 + e^{-s}}$
Backward (chain rule): $\frac{\partial L}{\partial x} = \frac{\partial s}{\partial x} \frac{\partial L}{\partial s}$, with $\frac{\partial L}{\partial s} = \frac{\partial \sigma}{\partial s} \frac{\partial L}{\partial \sigma}$
Saturated neurons kill the gradient flow.
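A quick numerical sketch of why saturation kills the gradient: the local derivative is $\sigma'(s) = \sigma(s)(1 - \sigma(s))$, which is at most 0.25 and nearly zero for large $|s|$, so any upstream gradient gets multiplied by almost zero.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def sigmoid_grad(s):
    # Local gradient dsigma/ds = sigma(s) * (1 - sigma(s)).
    sig = sigmoid(s)
    return sig * (1.0 - sig)

for s in (0.0, 2.0, 10.0):
    print(s, sigmoid_grad(s))   # 0.25, ~0.105, ~4.5e-05 (saturated)
```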
TanH Activation
• Still saturates
• Zero-centered
[LeCun et al. 1991] Improving Generalization Performance in Character Recognition
Rectified Linear Units (ReLU)
• Large and consistent gradients
• Fast convergence
• Does not saturate
• Dead ReLU: what happens if a ReLU outputs zero?
[Krizhevsky et al., NeurIPS 2012] ImageNet Classification with Deep Convolutional Neural Networks
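A minimal sketch of ReLU and its local gradient: where the pre-activation is negative, the output is zero and the gradient is blocked, which is what the "dead ReLU" question above refers to.

```python
import numpy as np

def relu(s):
    return np.maximum(0.0, s)

def relu_grad(s, upstream):
    # Gradient passes through unchanged where s > 0 and is blocked (0) elsewhere.
    return upstream * (s > 0)

s = np.array([-3.0, -0.5, 0.7, 4.0])
print(relu(s), relu_grad(s, np.ones_like(s)))
```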
Quick Guide
• Sigmoid is not really used anymore.
• ReLU is the standard choice.
• Second choices are the variants of ReLU or Maxout.
• Recurrent nets will require TanH or similar.
Initialization is Extremely Important
$x^* = \arg\min_x f(x)$
(Figure: loss landscape with the optimum and an initialization point.)
• Depending on the initialization, we are not guaranteed to reach the optimum.
Xavier Initialization
• How do we ensure that the variance of the output is the same as the variance of the input?
• Answer: set $\operatorname{Var}(w) = \frac{1}{n}$, where $n$ is the number of inputs to the neuron.
ReLU Kills Half of the Data
$\operatorname{Var}(w) = \frac{2}{n}$
• It makes a huge difference!
[He et al., ICCV'15] He Initialization
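A hedged sketch of both initializations for a fully connected layer with `fan_in` inputs: Xavier scales the weight variance by 1/fan_in, He by 2/fan_in to compensate for ReLU zeroing half of the activations (function and variable names are illustrative):

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Var(w) = 1 / fan_in keeps the activation variance roughly constant (tanh/sigmoid).
    return np.random.randn(fan_in, fan_out) * np.sqrt(1.0 / fan_in)

def he_init(fan_in, fan_out):
    # Var(w) = 2 / fan_in compensates for ReLU killing half of the activations.
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

W = he_init(512, 256)
print(W.std())   # ~ sqrt(2/512) = 0.0625
```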
Lecture 8
Data Augmentation
Data Augmentation
• A classifier has to be invariant to a wide variety of transformations.
(Figure: examples of variation in pose, appearance, and illumination.)
Data Augmentation
• A classifier has to be invariant to a wide variety of transformations.
• Helping the classifier: synthesize data simulating plausible transformations.
Data Augmentation
(Figure: augmented ImageNet examples.)
[Krizhevsky et al., NIPS'12] ImageNet
Data Augmentation: Brightness
• Random brightness and contrast changes
[Krizhevsky et al., NIPS'12] ImageNet
Data Augmentation: Random Crops
• Training: random crops (see the sketch below)
  – Pick a random L in [256, 480]
  – Resize the training image so that its short side is L
  – Randomly sample crops of 224×224
• Testing: fixed set of crops
  – Resize the image at N scales
  – 10 fixed crops of 224×224: (4 corners + 1 center) × 2 flips
[Krizhevsky et al., NIPS'12] ImageNet
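A rough sketch of the training-time procedure above, operating on an H×W×3 NumPy array; the crop size and scale range come from the slide, while the function name and the nearest-neighbor resize are assumptions made to keep the example dependency-free:

```python
import numpy as np

def random_resized_crop(img, crop=224, scale_range=(256, 480)):
    # Pick a random short-side length L in [256, 480], resize, then take a random 224x224 crop.
    L = np.random.randint(scale_range[0], scale_range[1] + 1)
    h, w = img.shape[:2]
    s = L / min(h, w)
    new_h, new_w = int(round(h * s)), int(round(w * s))
    # Nearest-neighbor resize via index sampling.
    rows = (np.arange(new_h) / s).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / s).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    top = np.random.randint(0, new_h - crop + 1)
    left = np.random.randint(0, new_w - crop + 1)
    return resized[top:top + crop, left:left + crop]

img = np.random.rand(300, 400, 3)        # stand-in for a training image
print(random_resized_crop(img).shape)    # (224, 224, 3)
```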
Data Augmentation
• When comparing two networks, make sure to use the same data augmentation!
• Consider data augmentation a part of your network design.
Advanced Regularization
Weight Decay
• L2 regularization:
  $\theta^{k+1} = \theta^k - \varepsilon \, \nabla_\theta L(\theta^k, x, y) - \lambda \theta^k$
  (learning rate $\varepsilon$, gradient of the loss, gradient of the L2 regularizer $\lambda \theta^k$)
• Penalizes large weights
• Improves generalization
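A minimal sketch of one SGD step with L2 weight decay, matching the update rule above; `lr` and `lam` are illustrative names for the learning rate and the decay coefficient:

```python
import numpy as np

def sgd_step_with_weight_decay(theta, grad, lr=0.1, lam=1e-4):
    # theta_{k+1} = theta_k - lr * grad(L) - lam * theta_k
    # The extra -lam * theta_k term shrinks (decays) the weights at every step.
    return theta - lr * grad - lam * theta

theta = np.array([1.0, -2.0, 0.5])
grad = np.array([0.3, -0.1, 0.0])
print(sgd_step_with_weight_decay(theta, grad))
```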
Early Stopping
(Figure: error over the course of training; the overfitting region is marked.)
Early Stopping
• An easy form of regularization: follow the sequence of iterates $\theta^1, \theta^2, \dots$ and stop at $\theta^*$, before overfitting sets in.
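An illustrative early-stopping loop: keep the parameters with the best validation error and stop once it has not improved for `patience` epochs. All function and variable names here (`train_epoch`, `val_error`, `patience`) are assumptions, not from the slides.

```python
import copy

def train_with_early_stopping(model, train_epoch, val_error, max_epochs=100, patience=5):
    best_err, best_model, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch(model)              # one pass over the training set
        err = val_error(model)          # error on the held-out validation set
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                       # validation error stopped improving -> overfitting
    return best_model, best_err
```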
Bagging and Ensemble Methods
• Train multiple models and average their results.
• E.g., use a different algorithm for optimization or change the objective/loss function.
• If the errors are uncorrelated, the expected combined error will decrease linearly with the ensemble size.
Bagging and Ensemble Methods
• Bagging: uses k different datasets
(Figure: Training Set 1, Training Set 2, Training Set 3.)
Image source: [Srivastava et al., JMLR'14] Dropout
Dropout
Dropout
• Disable a random set of neurons (typically 50%) in the forward pass.
[Srivastava et al., JMLR'14] Dropout
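A minimal sketch of the dropout forward pass at training time: each activation is kept independently with probability p (here 0.5) and zeroed otherwise.

```python
import numpy as np

def dropout_forward_train(x, p=0.5):
    # Sample a binary mask; each neuron is kept with probability p.
    mask = (np.random.rand(*x.shape) < p).astype(x.dtype)
    return x * mask, mask   # the mask is reused in the backward pass

x = np.random.randn(4, 8)
out, mask = dropout_forward_train(x)
```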
Dropout: Intuition
• Using half the network = half capacity
  – Redundant representations
(Figure: redundant features such as "furry", "has two eyes", "has a tail", "has paws", "has two ears".)
[Srivastava et al., JMLR'14] Dropout
Dropout: Intuition
• Using half the network = half capacity
  – Redundant representations
  – Base your scores on more features
• Consider it as a model ensemble
[Srivastava et al., JMLR'14] Dropout
Dropout: Intuition
• Two models in one
(Figure: Model 1 and Model 2.)
[Srivastava et al., JMLR'14] Dropout
Dropout: Intuition
• Using half the network = half capacity
  – Redundant representations
  – Base your scores on more features
• Consider it as two models in one
  – Training a large ensemble of models, each on a different set of data (mini-batch) and with SHARED parameters
  – Reduces co-adaptation between neurons
[Srivastava et al., JMLR'14] Dropout
Dropout: Test Time
• All neurons are "turned on" – no dropout
• Conditions at train and test time are not the same
[Srivastava et al., JMLR'14] Dropout
Dropout: Test Time
Dropout probability $p = 0.5$
• Train (expectation over the four equally likely dropout masks):
  $E[z] = \frac{1}{4}\left(\theta_1 \cdot 0 + \theta_2 \cdot 0 \;+\; \theta_1 x_1 + \theta_2 \cdot 0 \;+\; \theta_1 \cdot 0 + \theta_2 x_2 \;+\; \theta_1 x_1 + \theta_2 x_2\right) = \frac{1}{2}(\theta_1 x_1 + \theta_2 x_2)$
• Test (all neurons on, so scale by $p$): $z = p \,(\theta_1 x_1 + \theta_2 x_2)$
• Weight scaling inference rule
[Srivastava et al., JMLR'14] Dropout
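The weight-scaling inference rule fits in a few lines: at test time all neurons are active, so activations are multiplied by p to match the training-time expectation. The "inverted dropout" variant shown alongside is a common alternative (an addition here, not from the slide): it divides by p at training time so that test time needs no change.

```python
import numpy as np

p = 0.5  # keep probability

# Classic dropout (as on the slide): no scaling at train time, scale by p at test time.
def classic_train(x):
    return x * (np.random.rand(*x.shape) < p)

def classic_test(x):
    return x * p   # weight scaling inference rule: matches the training-time expectation

# Inverted dropout (common in practice): scale by 1/p at train time, do nothing at test time.
def inverted_train(x):
    return x * (np.random.rand(*x.shape) < p) / p

def inverted_test(x):
    return x
```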
Dropout: Verdict
• Efficient bagging method with parameter sharing
• Try it!
• Dropout reduces the effective capacity of a model → larger models, more training time
[Srivastava et al., JMLR'14] Dropout
Batch Normalization
Our Goal
• All we want is that our activations do not die out.