

  1. Deep Learning for Classification CS293S, Yang, 2017

  2. Computational Graph for Classification

  [Diagram: features f_1, f_2, f_3 are weighted by w_1, w_2, w_3, summed (Σ), and thresholded (>0?).]

  • Objective: classification accuracy

    $l_{\mathrm{acc}}(w) = \frac{1}{m}\sum_{i=1}^{m} \mathbf{1}\big(\operatorname{sign}(w^\top f(x^{(i)})) == y^{(i)}\big)$

    – Issue: How do we find these parameters?
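A minimal NumPy sketch of this objective (the function and argument names here are my own, not from the slides):

```python
import numpy as np

def classification_accuracy(w, feats, labels):
    """Fraction of examples where sign(w^T f(x_i)) matches y_i.

    w:      (d,) weight vector
    feats:  (m, d) matrix whose rows are f(x_i)
    labels: (m,) array of +1/-1 labels
    """
    preds = np.sign(feats @ w)       # sign(w^T f(x_i)) for every example
    return np.mean(preds == labels)  # indicator averaged over the m examples
```

Note that this objective is piecewise constant in w (its gradient is zero almost everywhere), which is one reason the next slide switches to a differentiable soft-max objective.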

  3. Neural Net with Soft-Max

  • Score for y = 1: $w^\top f(x)$;  score for y = -1: $-w^\top f(x)$
  • Probability of label:
    $p(y = 1 \mid f(x); w) = \dfrac{e^{w^\top f(x)}}{e^{w^\top f(x)} + e^{-w^\top f(x)}}$
    $p(y = -1 \mid f(x); w) = \dfrac{e^{-w^\top f(x)}}{e^{w^\top f(x)} + e^{-w^\top f(x)}}$
  • Objective: $l(w) = \prod_{i=1}^{m} p(y = y^{(i)} \mid f(x^{(i)}); w)$
  • Log: $ll(w) = \sum_{i=1}^{m} \log p(y = y^{(i)} \mid f(x^{(i)}); w)$
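These formulas translate almost directly into NumPy; a minimal sketch (function names are my own):

```python
import numpy as np

def binary_softmax_prob(w, f_x, y):
    """p(y | f(x); w) with scores +w.f(x) for y=+1 and -w.f(x) for y=-1."""
    s = w @ f_x                        # w^T f(x)
    scores = np.array([s, -s])
    e = np.exp(scores - scores.max())  # shift by max for numerical stability
    p_pos, p_neg = e / e.sum()
    return p_pos if y == 1 else p_neg

def log_likelihood(w, feats, labels):
    """ll(w) = sum_i log p(y_i | f(x_i); w)."""
    return sum(np.log(binary_softmax_prob(w, f, y))
               for f, y in zip(feats, labels))
```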

  4. Two-Layer Neural Network

  [Diagram: features f_1, f_2, f_3 feed three hidden units; hidden unit j computes a weighted sum $\sum_k w_k^j f_k$ followed by a nonlinearity; the hidden outputs are then weighted by $w_1, w_2, w_3$, summed (Σ), and thresholded (>0?).]

    $z \to \tanh(z) = \dfrac{e^z - e^{-z}}{e^z + e^{-z}}$
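A sketch of the forward pass this diagram describes, assuming tanh hidden units as the slide's formula suggests (names and shapes are my own):

```python
import numpy as np

def two_layer_forward(f, W1, w2):
    """Forward pass for the two-layer net above.

    f:  (3,) input features f1..f3
    W1: (3, 3) first-layer weights, one row per hidden unit
    w2: (3,) second-layer weights
    """
    h = np.tanh(W1 @ f)  # three hidden units: tanh of a weighted sum
    score = w2 @ h       # output unit: weighted sum of hidden activations
    return score > 0     # final threshold (>0?)
```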

  5. N-Layer Neural Network

  [Diagram: features f_1, f_2, f_3 pass through a stack of layers, each a weighted sum (Σ) followed by a threshold (>0?), repeated across the depth of the network.]

  6. Convolutional Network (AlexNet)

  [Figure: the AlexNet pipeline from input image through the weight layers to the loss. This and several of the following slides are adapted from Stanford CS231n, Lecture 5, 20 Jan 2016, by Fei-Fei Li, Andrej Karpathy & Justin Johnson.]

  7. Activation Functions

  • Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
  • tanh: $\tanh(x)$
  • ReLU: $\max(0, x)$
  • Leaky ReLU: $\max(0.1x, x)$
  • ELU
  • Maxout
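For concreteness, the activations above in NumPy (the ELU constant and the two-piece Maxout form are standard choices, not specified on the slide):

```python
import numpy as np

def sigmoid(x):    return 1.0 / (1.0 + np.exp(-x))
def tanh(x):       return np.tanh(x)
def relu(x):       return np.maximum(0.0, x)
def leaky_relu(x): return np.maximum(0.1 * x, x)

def elu(x, alpha=1.0):
    # ELU: identity for x > 0, alpha * (e^x - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def maxout(x, W1, b1, W2, b2):
    # Maxout: the max of two learned linear functions of x
    return np.maximum(W1 @ x + b1, W2 @ x + b2)
```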

  8. Multi-class Softmax

  • 3-class softmax
    – classes A, B, C
    – 3 weight vectors: $w_A$, $w_B$, $w_C$
  • Probability of label A (similar for B, C):
    $p(y = A \mid f(x); w) = \dfrac{e^{w_A^\top f(x)}}{e^{w_A^\top f(x)} + e^{w_B^\top f(x)} + e^{w_C^\top f(x)}}$
  • Objective: $l(w) = \prod_{i=1}^{m} p(y = y^{(i)} \mid f(x^{(i)}); w)$
  • Log: $ll(w) = \sum_{i=1}^{m} \log p(y = y^{(i)} \mid f(x^{(i)}); w)$
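A minimal sketch of the 3-class softmax (my own function name; rows of `W` play the role of $w_A, w_B, w_C$):

```python
import numpy as np

def softmax_probs(W, f_x):
    """p(y = c | f(x); w) for each class c (here A, B, C).

    W:   (3, d) matrix whose rows are w_A, w_B, w_C
    f_x: (d,) feature vector f(x)
    """
    scores = W @ f_x                   # w_c^T f(x) for each class
    e = np.exp(scores - scores.max())  # subtract max for numerical stability
    return e / e.sum()                 # normalize into probabilities

# probs[0] is p(y = A | f(x); w), probs[1] is for B, probs[2] for C
```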

  9. Multi-class Two-Layer Neural Network

  [Diagram: features f_1, f_2, f_3 feed a shared hidden layer of three weighted-sum + threshold units; three output units then compute the scores for A, B, and C from the hidden activations, using weight vectors $w^A$, $w^B$, $w^C$.]

    $z \to \tanh(z) = \dfrac{e^z - e^{-z}}{e^z + e^{-z}}$

  10. Gradient Descent Method for Optimization

  • How to find parameters that minimize an objective function?
  • Idea:
    – Start somewhere
    – Repeat: take a step in the steepest descent direction

  [Figure source: Mathworks]

  11. Generally, Steepest Direction

  • Steepest direction = direction of the gradient

    $\nabla g = \begin{bmatrix} \frac{\partial g}{\partial w_1} \\ \frac{\partial g}{\partial w_2} \\ \vdots \\ \frac{\partial g}{\partial w_n} \end{bmatrix}$

  • Gradient Descent
    – Init: $w$
    – For i = 1, 2, …:  $w \leftarrow w - \alpha \nabla g(w)$
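The update rule in a few lines of Python; the toy objective at the bottom is my own example, not from the slides:

```python
import numpy as np

def gradient_descent(grad_g, w0, alpha=0.1, steps=100):
    """Minimize g by repeatedly stepping opposite its gradient.

    grad_g: function returning the gradient of g at w
    w0:     starting point ("start somewhere")
    alpha:  step size
    """
    w = w0.copy()
    for _ in range(steps):
        w = w - alpha * grad_g(w)  # w <- w - alpha * grad g(w)
    return w

# Example: minimize g(w) = ||w - 3||^2, whose gradient is 2(w - 3).
w_star = gradient_descent(lambda w: 2 * (w - 3), np.zeros(2))
# w_star converges to [3, 3]
```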

  12. What is the Steepest Descent Direction?

  • We want: $\min_{\Delta:\, \Delta_1^2 + \Delta_2^2 \le \epsilon}\; g(w + \Delta)$
  • First-order Taylor expansion: $g(w + \Delta) \approx g(w) + \frac{\partial g}{\partial w_1}\Delta_1 + \frac{\partial g}{\partial w_2}\Delta_2$
  • Steepest descent direction: $\min_{\Delta:\, \Delta_1^2 + \Delta_2^2 \le \epsilon}\; \frac{\partial g}{\partial w_1}\Delta_1 + \frac{\partial g}{\partial w_2}\Delta_2$
  • Recall: $\min_{a:\, \|a\| \le \epsilon}\; a^\top b \;\Rightarrow\; a = -\epsilon \frac{b}{\|b\|}$
  • Hence, the solution is $-\epsilon \frac{\nabla g}{\|\nabla g\|}$, where $\nabla g = \begin{bmatrix} \frac{\partial g}{\partial w_1} \\ \frac{\partial g}{\partial w_2} \end{bmatrix}$
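A quick numerical sanity check of this claim, on a toy quadratic of my own choosing: among steps of the same small length ε, the step along $-\nabla g$ decreases g at least as much as random steps do (up to second-order terms).

```python
import numpy as np

g      = lambda w: (w[0] - 1.0) ** 2 + 3.0 * w[1] ** 2
grad_g = lambda w: np.array([2.0 * (w[0] - 1.0), 6.0 * w[1]])

w, eps = np.array([2.0, 1.0]), 1e-3
steepest = -eps * grad_g(w) / np.linalg.norm(grad_g(w))

rng = np.random.default_rng(0)
for _ in range(5):
    d = rng.normal(size=2)
    d = eps * d / np.linalg.norm(d)  # random step of the same length
    # tolerance absorbs the second-order terms the Taylor expansion drops
    assert g(w + steepest) <= g(w + d) + 1e-5
```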

  13. How to Calculate a Partial Derivative in a Computational Graph

  Given a function f(x, y, z) = (x + y)z, what is the partial derivative of f with respect to x, y, and z?

  14.-25. Worked example: f(x, y, z) = (x + y)z, evaluated at x = -2, y = 5, z = -4 (the x, y, z values are from a training example). [The original slides step through the computational graph one node at a time; the figures are not reproduced here, but the computation they show is the following.]

  • Forward pass: $q = x + y = 3$, then $f = qz = -12$.
  • Backward pass, starting from $\frac{\partial f}{\partial f} = 1$:
    – $\frac{\partial f}{\partial z} = q = 3$
    – $\frac{\partial f}{\partial q} = z = -4$
    – Chain rule: $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial x} = (-4)(1) = -4$
    – Chain rule: $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \cdot \frac{\partial q}{\partial y} = (-4)(1) = -4$
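The whole walk-through fits in a few lines of Python:

```python
# Forward and backward pass for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y  # q = 3
f = q * z  # f = -12

# Backward pass (chain rule, working from the output back)
df_df = 1.0          # base case
df_dz = q            # d(q*z)/dz = q    -> 3
df_dq = z            # d(q*z)/dq = z    -> -4
df_dx = df_dq * 1.0  # d(x+y)/dx = 1    -> -4
df_dy = df_dq * 1.0  # d(x+y)/dy = 1    -> -4

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```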

  26.-31. [Figure: a single gate f, with its input activations, its "local gradient", and the gradients flowing back through it.] During the forward pass, each gate computes its output from its input activations and can immediately compute its local gradients, i.e. the derivatives of its output with respect to its inputs. During the backward pass, the gate receives the gradient of the final output with respect to its own output and multiplies it by each local gradient (chain rule) to obtain the gradients on its inputs, which it passes further back.
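A sketch of this gate abstraction (class and method names are my own, not from the slides): each gate caches its inputs on the forward pass so that, on the backward pass, it can turn an upstream gradient into input gradients.

```python
class MultiplyGate:
    """z = x * y; local gradients are dz/dx = y and dz/dy = x."""
    def forward(self, x, y):
        self.x, self.y = x, y  # cache activations for the backward pass
        return x * y

    def backward(self, dz):
        # [local gradient] x [upstream gradient]
        return self.y * dz, self.x * dz

class AddGate:
    """z = x + y; the local gradient is 1 on both inputs."""
    def forward(self, x, y):
        return x + y

    def backward(self, dz):
        return dz, dz  # the upstream gradient passes straight through

# The f(x, y, z) = (x + y) * z example, redone with gates:
add, mul = AddGate(), MultiplyGate()
q = add.forward(-2.0, 5.0)
f = mul.forward(q, -4.0)
dq, dz = mul.backward(1.0)  # dq = -4.0, dz = 3.0
dx, dy = add.backward(dq)   # dx = dy = -4.0
```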

  32.-45. Another example: a 2D neuron with a sigmoid output, $f(w, x) = \dfrac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$. [The original slides step backward through this graph one gate at a time; at every step the rule is the same: (local gradient) × (upstream gradient). The steps still legible in the transcript:]

  • At the *(-1) gate: $(-1) \times (-0.20) = 0.20$.
  • At the + gate, the local gradient is 1 on each input, so both inputs receive $[1] \times [0.2] = 0.2$ (both inputs!).
  • At the multiply gates, [local gradient] × [its gradient]:
    – x_0: $[2] \times [0.2] = 0.4$
    – w_0: $[-1] \times [0.2] = -0.2$

  46.-47. Sigmoid function and sigmoid gate: the sigmoid $\sigma(x) = \dfrac{1}{1 + e^{-x}}$ has the derivative $\frac{d\sigma}{dx} = (1 - \sigma(x))\,\sigma(x)$, so the entire sigmoid gate can be backpropagated in a single step. Here the gate's output is 0.73, so its local gradient is $(0.73) \times (1 - 0.73) = 0.2$.
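Putting the example together in Python. The input values below (w0 = 2, x0 = -1, w1 = -3, x1 = -2, w2 = -3) are taken from the original CS231n slide this deck reproduces; the printed numbers match the ones quoted above.

```python
import numpy as np

# Sigmoid neuron f(w, x) = sigma(w0*x0 + w1*x1 + w2)
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass
s = w0 * x0 + w1 * x1 + w2    # s = 1.0
f = 1.0 / (1.0 + np.exp(-s))  # sigma(1) ~= 0.73

# Backward pass: the whole sigmoid gate in one step,
# using d sigma/ds = (1 - sigma) * sigma
ds = (1.0 - f) * f            # (0.73)(1 - 0.73) ~= 0.2
dw0, dx0 = x0 * ds, w0 * ds   # ~= -0.2, ~= 0.4
dw1, dx1 = x1 * ds, w1 * ds   # ~= -0.39, ~= -0.59
dw2 = 1.0 * ds                # ~= 0.2

print(round(f, 2), round(dx0, 1), round(dw0, 1))  # 0.73 0.4 -0.2
```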

  48. Gradients add at branches: when one value feeds into several gates, the gradients flowing back along those branches are summed (+) at the branch point.
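A toy illustration of why (my own example): if x is used twice, as in f(x) = x*a + x*b, then df/dx = a + b, the sum of the two branch gradients. Backprop implementations therefore accumulate input gradients with `+=`.

```python
# x feeds two gates: f(x) = x*a + x*b, so df/dx = a + b
a, b, x = 3.0, -1.0, 2.0

# Backward pass: each branch sends back its own gradient...
dx_branch1 = a  # from the x*a gate
dx_branch2 = b  # from the x*b gate

# ...and they accumulate at the branch point.
dx = 0.0
dx += dx_branch1
dx += dx_branch2  # dx = 2.0 = a + b
```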

  49. Summary

  • Deep learning
    – A new direction for text processing, given its success in image/audio processing
    – Frameworks and software:
      • TensorFlow (Google)
      • Others: Theano, Torch, Caffe, Computation Graph Toolkit (CGT)
