Lecture 6 recap


  1. Lecture 6 recap (Prof. Leal-Taixé and Prof. Niessner)

  2. Neural Network: width and depth

  3. Gradient Descent for Neural Networks. Forward pass of a simple two-layer network: $h_j = A\big(b_{1,j} + \sum_k x_k w_{1,j,k}\big)$, $\hat{y}_i = A\big(b_{2,i} + \sum_j h_j w_{2,i,j}\big)$, with per-sample loss $L_i = (\hat{y}_i - y_i)^2$ and, just simple, $A(x) = \max(0, x)$. Gradient descent updates the weights using $\partial L / \partial w$, computed layer by layer via the chain rule.
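To make the slide's notation concrete, here is a minimal NumPy sketch of such a two-layer forward pass, assuming the ReLU activation $A(x) = \max(0, x)$ and a per-sample squared-error loss; the dimensions, weights, and variable names are purely illustrative.

```python
import numpy as np

def relu(x):
    # A(x) = max(0, x), the simple activation from the slide
    return np.maximum(0.0, x)

def forward(x, W1, b1, W2, b2):
    # hidden layer: h_j = A(b1_j + sum_k x_k * W1[j, k])
    h = relu(b1 + W1 @ x)
    # output layer: y_hat_i = A(b2_i + sum_j h_j * W2[i, j])
    y_hat = relu(b2 + W2 @ h)
    return y_hat

# toy dimensions: 4 inputs, 3 hidden units, 2 outputs (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

y_hat = forward(x, W1, b1, W2, b2)
y = np.array([1.0, 0.0])            # target (illustrative)
loss = np.sum((y_hat - y) ** 2)     # per-sample squared error, L = sum_i (y_hat_i - y_i)^2
```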

  4. Stochastic Gradient Descent (SGD): $\theta^{k+1} = \theta^k - \alpha \, \nabla_\theta L(\theta^k, x_{\{1..m\}}, y_{\{1..m\}})$, where $\nabla_\theta L = \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta L_i$. Here $k$ now refers to the $k$-th iteration, $m$ is the number of training samples in the current batch, and the gradient is computed over the $k$-th batch. All variations of SGD (momentum, RMSProp, Adam, ...) build on this update.
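A minimal NumPy sketch of this update rule, assuming a hypothetical per-sample gradient function `grad_fn(theta, x, y)`; momentum, RMSProp, and Adam differ only in how the averaged batch gradient is turned into a parameter update.

```python
import numpy as np

def sgd(theta, grad_fn, X, Y, lr=0.01, batch_size=32, iterations=1000):
    """Vanilla mini-batch SGD: theta^{k+1} = theta^k - lr * mean gradient over the batch."""
    n = len(X)
    rng = np.random.default_rng(0)
    for k in range(iterations):
        # sample the k-th mini-batch (assumes batch_size <= n)
        idx = rng.choice(n, size=batch_size, replace=False)
        grads = np.stack([grad_fn(theta, X[i], Y[i]) for i in idx])
        theta = theta - lr * grads.mean(axis=0)   # average gradient over the m batch samples
    return theta
```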

  5. Importance of Learning Rate

  6. Over- and Underfitting. [Figure: underfitted, appropriate, and overfitted model fits; extracted from Deep Learning by Adam Gibson and Josh Patterson, O'Reilly Media Inc., 2017.]

  7. Over- and Underfitting. Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html

  8. Basic recipe for machine learning. Split your data: 60% train, 20% validation, 20% test; use the validation set to find your hyperparameters.
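As an illustration of this split, a small NumPy sketch (the 60/20/20 ratios follow the slide; the function and variable names are my own):

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle and split a dataset into 60% train / 20% validation / 20% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(0.6 * len(X))
    n_val = int(0.2 * len(X))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    # hyperparameters are tuned on the validation split; the test split is only
    # touched once, for the final evaluation
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```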

  9. Basic recipe for machine learning (continued)

  10. Basically… (deep learning memes)

  11. Fun things… (deep learning memes)

  12. Fun things… (deep learning memes)

  13. Fun things… (deep learning memes)

  14. Going Deep into Neural Networks

  15. Simple Starting Points for Debugging. Start simple: first, overfit to a single training sample; second, overfit to several training samples. Always try a simple architecture first, to verify that you are learning something. Estimate timings (how long does each epoch take?).
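The "overfit to a single sample first" check might look like the following PyTorch sketch; the model, dimensions, and optimizer here are hypothetical, the point is only that the loss on that one sample should quickly approach zero if the pipeline is wired correctly.

```python
import torch
import torch.nn as nn

# Hypothetical sanity check: a tiny model should drive the loss on one training
# sample to ~0. If it cannot, something in the data pipeline, architecture,
# or loss is broken.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(1, 10)          # a single training sample (illustrative)
y = torch.tensor([2])           # its label

for step in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())               # should be close to 0 if everything is wired correctly
```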

  16. Neural Network. Problems of going deeper: vanishing gradients (the multiplication in the chain rule); the impact of small decisions (architecture, activation functions, ...); is my network training correctly?
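A rough numerical illustration of the vanishing-gradient point (assuming sigmoid activations, which is not what the earlier recap slides use, but which makes the effect easy to see): by the chain rule, the gradient reaching an early layer contains one local derivative factor per layer, and the sigmoid derivative is at most 0.25, so the product shrinks exponentially with depth.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth = 20
x = 0.5
grad = 1.0
for _ in range(depth):
    s = sigmoid(x)
    grad *= s * (1 - s)   # multiply one local sigmoid derivative per layer
print(grad)               # on the order of 1e-13: the gradient has "vanished"
```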

  17. Neural Networks: 1) output functions, 2) functions in neurons, 3) input of data

  18. Output Functions

  19. Neural Networks. The network output (prediction) is fed into a loss function (Softmax, Hinge); what is the shape of this function?

  20. Sigmoid for Binary Predictions: $\sigma(x) = \frac{1}{1 + e^{-x}}$. A single neuron computes a weighted sum of the inputs $x_0, x_1, x_2$ with weights $\theta_0, \theta_1, \theta_2$ and passes it through the sigmoid; the output can be interpreted as a probability $p(y_i = 1 \mid x_i, \theta)$.
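A minimal NumPy sketch of this binary-prediction neuron; treating $x_0 = 1$ as a bias feature weighted by $\theta_0$ is an assumption, since the slide's diagram layout is not fully recoverable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, theta):
    """p(y = 1 | x, theta): a linear score theta . x squashed by a sigmoid."""
    return sigmoid(np.dot(theta, x))

# illustrative values; x[0] = 1 plays the role of a bias feature (an assumption
# about the slide's x_0 / theta_0 wiring)
x = np.array([1.0, 0.4, -1.2])
theta = np.array([0.1, 2.0, -0.5])
p = predict_proba(x, theta)   # a value in (0, 1), interpretable as a probability
print(p)
```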

  21. Softmax formulation. What if we have multiple classes? [Diagram: inputs $x_0, x_1, x_2$ with weights $\theta_0, \theta_1, \theta_2$ feeding weighted sums, one score $\Pi_i$ per class.]

  22. Softmax formulation. What if we have multiple classes? [Diagram: the inputs feed one weighted sum per class, and the resulting scores are passed through a Softmax.]

  23. Softmax formulation. What if we have multiple classes? For two classes: $\Pi_1 = \frac{e^{x_i \theta_1}}{e^{x_i \theta_1} + e^{x_i \theta_2}}$ and $\Pi_2 = \frac{e^{x_i \theta_2}}{e^{x_i \theta_1} + e^{x_i \theta_2}}$.

  24. Softmax formulation. Softmax: exponentiate and normalize, $p(y_i \mid x, \theta) = \frac{e^{x \theta_i}}{\sum_{k=1}^{n} e^{x \theta_k}}$. Softmax loss (Maximum Likelihood Estimate): $L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_k e^{s_k}}\right)$.
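These two formulas translate directly into a short NumPy sketch (subtracting the maximum score before exponentiating is a standard numerical-stability trick, not something stated on the slide):

```python
import numpy as np

def softmax(s):
    s = s - np.max(s)       # stability shift; does not change the normalized result
    e = np.exp(s)
    return e / np.sum(e)

def softmax_loss(s, y):
    """L_i = -log( e^{s_y} / sum_k e^{s_k} ) for scores s and correct class index y."""
    return -np.log(softmax(s)[y])

s = np.array([3.2, 5.1, -1.7])   # scores for one example (values from the later worked example)
print(softmax(s))                 # ~[0.13, 0.87, 0.00]
print(softmax_loss(s, 0))         # ~2.04 if the correct class is the first one
```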

  25. Loss Functions

  26. Naïve Losses. L2 loss: $L_2 = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2$; sum of squared differences (SSD), prone to outliers, compute-efficient (optimization), optimum is the mean. L1 loss: $L_1 = \sum_{i=1}^{n} |y_i - f(x_i)|$; sum of absolute differences, robust, costly to compute, optimum is the median. [Worked example on the slide: element-wise differences between a grid of targets $y_i$ and predictions $f(x_i)$, giving $L_2(x, y) = 9 + 16 + 4 + 4 + \dots = 66$ and $L_1(x, y) = 3 + 4 + 2 + 2 + \dots = 15$.]
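A small NumPy sketch of both losses; the sample values below are taken from the first row of the slide's example grids, so the printed totals cover only those four entries and are not the slide's full-grid sums.

```python
import numpy as np

def l2_loss(y, y_pred):
    # sum of squared differences (SSD); optimum is the mean, sensitive to outliers
    return np.sum((y - y_pred) ** 2)

def l1_loss(y, y_pred):
    # sum of absolute differences; more robust, optimum is the median
    return np.sum(np.abs(y - y_pred))

y      = np.array([15.0, 20.0, 40.0, 25.0])   # targets (first row of the slide's grid)
y_pred = np.array([12.0, 24.0, 42.0, 23.0])   # predictions (first row of the slide's grid)
print(l2_loss(y, y_pred))   # 9 + 16 + 4 + 4 = 33
print(l1_loss(y, y_pred))   # 3 + 4 + 2 + 2 = 11
```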

  27. Cross-Entropy (Softmax): $L_i = -\log\!\left(\frac{e^{s_{y_i}}}{\sum_k e^{s_k}}\right)$. Given a scoring function with weights $\theta$ and training pairs $[x_i; y_i]$ (input and labels), the scores are $s = f(x_i, \theta)$, e.g. $f(x_i, \theta) = \theta \cdot [x_0, x_1, \dots, x_d]^T$. Suppose 3 training examples and 3 classes, with score matrix (rows: cat, chair, car; columns: the three examples): 3.2, 1.3, 2.2 / 5.1, 4.9, 2.5 / -1.7, 2.0, -3.1.

  28. Cross-Entropy (Softmax), continued. Take the scores of the first example (first column): [3.2, 5.1, -1.7].

  29. Cross-Entropy (Softmax), continued. Exponentiate: [3.2, 5.1, -1.7] becomes [24.5, 164.0, 0.18].

  30. Cross-Entropy (Softmax), continued. Normalize: [24.5, 164.0, 0.18] becomes [0.13, 0.87, 0.00].

  31. Cross-Entropy (Softmax), continued. Apply $-\log(x)$: [0.13, 0.87, 0.00] becomes [2.04, 0.14, 6.94]; the loss of the first example is the entry of its correct class, $L_1 = 2.04$.

  32. Cross-Entropy (Softmax), continued. Repeating this for all three examples gives $L_1 = 2.04$, $L_2 = 0.079$, $L_3 = 6.156$, so $L = \frac{1}{3}\sum_{i=1}^{3} L_i = \frac{2.04 + 0.079 + 6.156}{3} \approx 2.76$.
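The whole worked example can be reproduced in a few lines of NumPy; the assumed correct class for each example (one per row, in order) is inferred from the per-example losses on the slide.

```python
import numpy as np

# Score matrix from the slides: rows are classes, columns are the 3 training examples.
scores = np.array([[ 3.2, 1.3,  2.2],
                   [ 5.1, 4.9,  2.5],
                   [-1.7, 2.0, -3.1]])
correct = [0, 1, 2]   # assumed correct class per example, inferred from the slide's losses

losses = []
for i in range(3):
    s = scores[:, i]
    probs = np.exp(s) / np.sum(np.exp(s))       # exponentiate, then normalize
    losses.append(-np.log(probs[correct[i]]))   # -log of the correct-class probability

print(np.round(losses, 3))   # ~[2.04, 0.079, 6.156]
print(np.mean(losses))       # ~2.76, matching the slide's final loss
```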

  33. Hinge Loss (SVM Loss). Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$.

  34. Hinge Loss (SVM Loss), continued. Given a scoring function with weights $\theta$ and training pairs $[x_i; y_i]$ (input and labels), the scores are $s = f(x_i, \theta)$, e.g. $f(x_i, \theta) = \theta \cdot [x_0, x_1, \dots, x_d]^T$.

  35. Hinge Loss (SVM Loss), continued. Suppose 3 training examples and 3 classes.

  36. Hinge Loss (SVM Loss), continued. Same score matrix as in the cross-entropy example (rows: cat, chair, car; columns: the three examples): 3.2, 1.3, 2.2 / 5.1, 4.9, 2.5 / -1.7, 2.0, -3.1.
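A minimal NumPy sketch of the multiclass SVM loss from slide 33, evaluated on the first example's scores with the first class assumed correct; the slide itself does not show this computation, so the numeric result is an illustration only.

```python
import numpy as np

def multiclass_svm_loss(s, y, delta=1.0):
    """L_i = sum_{j != y} max(0, s_j - s_y + delta) for scores s and correct class index y."""
    margins = np.maximum(0.0, s - s[y] + delta)
    margins[y] = 0.0                  # the correct class is excluded from the sum
    return np.sum(margins)

s = np.array([3.2, 5.1, -1.7])        # scores of the first example (first column above)
print(multiclass_svm_loss(s, y=0))    # max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9
```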
