All You Want To Know About CNNs - Yukun Zhu


  1. All You Want To Know About CNNs Yukun Zhu

  2-8. Deep Learning (a sequence of image-only slides; images from http://imgur.com/)

  9-14. Deep Learning in Vision Object detection performance on PASCAL VOC 2010 (mAP), built up one bar at a time: DPM (2010) 33.4; segDPM (2014) 40.4; RCNN (2014) 53.7; RCNN* (Oct 2014) 62.9; segRCNN (Jan 2015) 67.2; Fast RCNN (Jun 2015) 70.8

  15. A Neuron Image from http://cs231n.github.io/neural-networks-1/

  16. A Neuron in a Neural Network Image from http://cs231n.github.io/neural-networks-1/

  17. Activation Functions ● Sigmoid: f(x) = 1 / (1 + e^(-x)) ● ReLU: f(x) = max(0, x) ● Leaky ReLU: f(x) = max(ax, x) ● Maxout: f(x) = max(w_0 x + b_0, w_1 x + b_1) ● and many others…
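A minimal NumPy sketch of these activations (the maxout weights w0, b0, w1, b1 are placeholders, not values from the slides):

```python
import numpy as np

def sigmoid(x):
    # squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # small slope `a` for negative inputs instead of a hard zero
    return np.maximum(a * x, x)

def maxout(x, w0, b0, w1, b1):
    # elementwise max over two learned linear functions of x
    return np.maximum(w0 * x + b0, w1 * x + b1)
```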

  18. Neural Network (MLP) The network represents a function y = f(x; w) Image modified from http://cs231n.github.io/neural-networks-1/

  19-27. Forward Computation (sigmoid neuron): f(x0, x1) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))). With inputs x0 = -1.00, x1 = -2.00 and weights w0 = 2.00, w1 = -3.00, w2 = -3.00, the circuit is evaluated gate by gate: w0*x0 = -2.00, w1*x1 = 6.00, their sum = 4.00, plus w2 = 1.00, negation = -1.00, exp = 0.37, adding 1 = 1.37, and the final reciprocal gives the output 0.73. Image and code modified from http://cs231n.github.io/optimization-2/
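A short Python sketch of this forward pass, in the spirit of the cs231n code the slide credits (variable names are ours):

```python
import math

# inputs and weights from the slides
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# forward pass, one gate at a time
dot = w0 * x0 + w1 * x1 + w2       # -2.00 + 6.00 - 3.00 = 1.00
f = 1.0 / (1.0 + math.exp(-dot))   # sigmoid -> 0.73
```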

  28. Loss Function The loss function measures how well the prediction matches the true value. Commonly used loss functions: ● Squared loss: (y - y')^2 ● Cross-entropy loss: -sum_i y'_i * log(y_i) ● and many others
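A small NumPy sketch of these two losses (the eps term is our addition for numerical stability; y_pred is assumed to be a probability distribution in the cross-entropy case):

```python
import numpy as np

def squared_loss(y_pred, y_true):
    # (y - y')^2, summed over outputs
    return np.sum((y_pred - y_true) ** 2)

def cross_entropy_loss(y_pred, y_true, eps=1e-12):
    # -sum_i y'_i * log(y_i), with y_pred a probability distribution
    return -np.sum(y_true * np.log(y_pred + eps))
```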

  29. Loss Function During training, we would like to minimize the total loss on a set of training data ● We want to find w* = argmin_w sum_i loss(f(x_i; w), y_i)

  30. Loss Function During training, we would like to minimize the total loss on a set of training data ● We want to find w* = argmin_w sum_i loss(f(x_i; w), y_i) ● Usually we use a gradient-based approach: w_{t+1} = w_t - a ∇_w loss
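A minimal sketch of this update rule; loss_grad is a hypothetical callable returning the gradient of the total loss with respect to w, and the toy quadratic in the usage line is ours:

```python
def train(w, loss_grad, lr=0.1, steps=100):
    # repeatedly apply w_{t+1} = w_t - a * grad_w(total loss)
    for _ in range(steps):
        w = w - lr * loss_grad(w)
    return w

# toy usage: minimize (w - 3)^2, whose gradient is 2 * (w - 3)
w_star = train(0.0, lambda w: 2.0 * (w - 3.0))   # converges towards 3.0
```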

  31-38. Backward Computation: the chain rule is applied backwards through the same circuit, starting from a gradient of 1.00 at the output. Local gradients per gate: f = 1/x gives df/dx = -1/x^2; f = x + 1 gives df/dx = 1; f = e^x gives df/dx = e^x; f = -x gives df/dx = -1; f = x + a gives df/dx = 1; f = ax gives df/dx = a. The gradient flowing back through the gates is 1.00 → -0.53 → -0.53 → -0.20 → 0.20 → 0.20, and the resulting input gradients are df/dw0 = -0.20, df/dx0 = 0.40, df/dw1 = -0.40, df/dx1 = -0.60, df/dw2 = 0.20. Image and code modified from http://cs231n.github.io/optimization-2/
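And the matching backward pass in Python, again in the spirit of the cs231n example (using the fact that the sigmoid's derivative is f * (1 - f)):

```python
import math

w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# forward pass, kept so the backward pass can reuse intermediate values
dot = w0 * x0 + w1 * x1 + w2
f = 1.0 / (1.0 + math.exp(-dot))   # 0.73

# backward pass: gradient of the sigmoid is f * (1 - f)
ddot = (1.0 - f) * f               # ~0.20
dw0, dx0 = x0 * ddot, w0 * ddot    # -0.20, 0.40
dw1, dx1 = x1 * ddot, w1 * ddot    # -0.40, -0.60
dw2 = 1.0 * ddot                   # 0.20
```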

  39. Why NNs?

  40. Universal Approximation Theorem A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function. https://en.wikipedia.org/wiki/Universal_approximation_theorem
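In symbols, a hedged restatement of the standard single-hidden-layer form (the notation F, v_i, w_i, b_i, K is ours, not from the slide):

```latex
% For every continuous f : K -> R on a compact K \subset R^n and every eps > 0,
% there exist N, real v_i, b_i and vectors w_i in R^n such that
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right),
\qquad |F(x) - f(x)| < \varepsilon \quad \text{for all } x \in K.
```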

  41. Stone’s Theorem ● Suppose X is a compact Hausdorff space and B is a subalgebra of C(X, R) such that: ○ B separates points ○ B contains the constant function 1 ○ if f ∈ B then af ∈ B for all constants a ∈ R ○ if f, g ∈ B, then f + g, max{f, g} ∈ B ● Then every function in C(X, R) can be approximated as closely as desired by functions in B

  42. Why CNNs?

  43. Problems of MLP in Vision For a 10 * 10 input image: ● a 3-layer MLP with 200 hidden units contains ~100k parameters. For a 100 * 100 input image: ● even a 1-layer MLP with 20k hidden units contains ~200M parameters
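A quick back-of-the-envelope check of these counts (assuming, as the slide appears to, 200 units in each of the three hidden layers, counting weights only and ignoring biases):

```python
# 10 x 10 input, 3 hidden layers of 200 units each (weights only)
params_small = 10 * 10 * 200 + 200 * 200 + 200 * 200   # = 100,000

# 100 x 100 input, a single hidden layer of 20k units
params_large = 100 * 100 * 20_000                      # = 200,000,000

print(params_small, params_large)
```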

  44-47. Can We Do Better? (image-only build-up slides leading to the observation used on the next slide)

  48. Can We Do Better? Based on this observation, the MLP can be improved in two ways: ● locally connected instead of fully connected ● sharing weights between neurons. We achieve both by using convolutional neurons

  49. Convolutional Layers Image from http://cs231n.github.io/convolutional-networks/

  50. Convolutional Layers Activations are arranged in a 3-D volume (width, height, depth). Image from http://cs231n.github.io/convolutional-networks/. See this page for an excellent example of convolution.
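A naive NumPy sketch of what a single convolutional layer computes on a (depth, height, width) volume; no padding, stride 1, purely illustrative and not how real libraries implement it:

```python
import numpy as np

def conv2d(x, filters, biases):
    """x: (C, H, W) input volume; filters: (K, C, FH, FW); biases: (K,).
    Returns a (K, H - FH + 1, W - FW + 1) output volume."""
    C, H, W = x.shape
    K, _, FH, FW = filters.shape
    out = np.zeros((K, H - FH + 1, W - FW + 1))
    for k in range(K):                      # one output channel per filter
        for i in range(H - FH + 1):
            for j in range(W - FW + 1):
                patch = x[:, i:i + FH, j:j + FW]
                out[k, i, j] = np.sum(patch * filters[k]) + biases[k]
    return out

# example: 3-channel 8x8 input, four 3x3 filters -> (4, 6, 6) output
y = conv2d(np.random.randn(3, 8, 8), np.random.randn(4, 3, 3, 3), np.zeros(4))
```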

  51. Pooling Layers Image from http://cs231n.github.io/convolutional-networks/

  52. Pooling Layers Example: Max Pooling Image from http://cs231n.github.io/convolutional-networks/

  53. Pooling Layers Commonly used pooling layers: ● Max pooling ● Average pooling Why pooling layers? ● Reduce activation dimensionality ● Robust against tiny shifts
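A matching sketch of 2 x 2 max pooling with stride 2 (assumes even spatial dimensions):

```python
import numpy as np

def max_pool2x2(x):
    """x: (C, H, W) with even H and W. Returns (C, H // 2, W // 2)."""
    C, H, W = x.shape
    # group each 2x2 spatial block and take its maximum
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

# example: halves each spatial dimension
print(max_pool2x2(np.random.randn(4, 6, 6)).shape)   # (4, 3, 3)
```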

  54. CNN Architecture: An Example Image from http://cs231n.github.io/convolutional-networks/
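As a rough illustration only, a PyTorch sketch of a small network in the same [CONV -> RELU -> POOL] x N -> FC spirit; the channel counts and the 3 x 32 x 32 input size are our assumptions, not taken from the slide:

```python
import torch
import torch.nn as nn

# [CONV -> RELU -> POOL] x 2 -> FC, for 3 x 32 x 32 inputs and 10 classes
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32 -> 16x32x32
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 16x16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 32x8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # class scores
)

scores = model(torch.randn(1, 3, 32, 32))         # shape: (1, 10)
```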

  55. Layer Activations for CNNs Conv:1 ReLU:1 Conv:2 ReLU:2 MaxPool:1 Conv:3 Image modified from http://cs231n.github.io/convolutional-networks/

  56. Layer Activations for CNNs MaxPool:2 Conv:5 ReLU:5 Conv:6 ReLU:6 MaxPool:3 Image modified from http://cs231n.github.io/convolutional-networks/
