CSC 411: Lecture 11: Neural Networks II


  1. CSC 411: Lecture 11: Neural Networks II. Class based on Raquel Urtasun & Rich Zemel's lectures. Sanja Fidler, University of Toronto. March 2, 2016.

  2. Today: Deep learning for Object Recognition.

  3. Neural Nets for Object Recognition
     - People are very good at recognizing shapes.
     - The task is intrinsically difficult, and computers are bad at it.
     - Why is it difficult?

  4. Why is it a Problem? Difficult scene conditions. [From: Grauman & Leibe]

  5. Why is it a Problem? Huge within-class variations. Recognition is mainly about modeling variation. [Pic from: S. Lazebnik]

  6. Why is it a Problem? Tons of classes. [Biederman]

  7. Neural Nets for Object Recognition. Some reasons why it is difficult:
     - Segmentation: real scenes are cluttered.
     - Invariances: we are very good at ignoring all sorts of variations that do not affect shape.
     - Deformations: natural shape classes allow variations (faces, letters, chairs).
     - A huge amount of computation is required.

  8. How to Deal with Large Input Spaces
     - How can we apply neural nets to images? Images can have millions of pixels, i.e., x is very high dimensional.
     - How many parameters do I have? Prohibitive to have fully-connected layers (see the count sketched below).
     - What can we do? We can use a locally connected layer.
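     To make "prohibitive" concrete, here is a back-of-envelope Python count for a fully-connected first layer; the 200x200 image and 40K hidden units mirror the example on the next slide and are illustrative, not from a specific architecture.

         # Rough weight count for a fully-connected first layer (biases ignored).
         pixels = 200 * 200                  # 40,000 inputs
         hidden_units = 40_000               # each unit is connected to every pixel
         fc_weights = pixels * hidden_units
         print(f"fully connected: {fc_weights:,} weights")  # 1,600,000,000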

  9. Locally Connected Layer
     Example: 200x200 image, 40K hidden units, filter size 10x10, 4M parameters.
     Note: this parameterization is good when the input image is registered (e.g., face recognition). [Slide: M. Ranzato]
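     The 4M figure follows from the same arithmetic, assuming each of the 40K hidden units has its own unshared 10x10 filter:

         # Weight count for the locally connected layer above (biases ignored).
         hidden_units = 40_000
         patch_weights = 10 * 10             # one private 10x10 filter per unit
         lc_weights = hidden_units * patch_weights
         print(f"locally connected: {lc_weights:,} weights")  # 4,000,000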

  10. When Will this Work? This is good when the input is (roughly) registered.

  11. General Images: The object can be anywhere. [Slide: Y. Zhu]

  12. Locally Connected Layer
      Stationarity? Statistics are similar at different locations.
      Example (as before): 200x200 image, 40K hidden units, filter size 10x10, 4M parameters. This parameterization is good when the input image is registered (e.g., face recognition). [Slide: M. Ranzato]

  13. The replicated feature approach
      Adopt the approach apparently used in monkey visual systems:
      - Use many different copies of the same feature detector (in the figure, the red connections all have the same weight).
        - Copies have slightly different positions.
        - Could also replicate across scale and orientation (tricky and expensive).
        - Replication reduces the number of free parameters to be learned.
      - Use several different feature types, each with its own replicated pool of detectors.
        - This allows each patch of image to be represented in several ways.

  14. Convolutional Neural Net
      Idea: statistics are similar at different locations (LeCun 1998). Connect each hidden unit to a small input patch and share the weights across space. This is called a convolution layer, and the network is a convolutional network.

  15. Convolutional Layer
      Each output feature map is a rectified sum of convolutions over the input maps:

          h_j^n = \max\left(0, \sum_{k=1}^{K} h_k^{n-1} * w_{jk}^n\right)

      where h_k^{n-1} is the k-th feature map of the previous layer, w_{jk}^n is the filter connecting input map k to output map j, and * denotes convolution. [Slide: M. Ranzato]
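      A minimal NumPy sketch of this layer, assuming 'valid'-mode cross-correlation (the operation deep-learning frameworks call convolution) and no biases; the shapes and names are illustrative, not from the lecture.

          import numpy as np
          from scipy.signal import correlate2d

          def conv_layer(h_prev, w):
              """h_prev: (K, H, W) feature maps from layer n-1.
              w: (J, K, fh, fw); w[j, k] connects input map k to output map j.
              Returns h^n of shape (J, H-fh+1, W-fw+1)."""
              J, K, fh, fw = w.shape
              H, W = h_prev.shape[1:]
              out = np.zeros((J, H - fh + 1, W - fw + 1))
              for j in range(J):
                  for k in range(K):          # the sum over input maps
                      out[j] += correlate2d(h_prev[k], w[j, k], mode="valid")
              return np.maximum(0.0, out)     # ReLU: the max(0, .) in the formula

          x = np.random.randn(3, 200, 200)    # e.g., K=3 input maps
          w = np.random.randn(100, 3, 10, 10) # J=100 filters of size 10x10
          print(conv_layer(x, w).shape)       # (100, 191, 191)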

  16. Convolutional Layer
      Learn multiple filters. E.g.: 200x200 image, 100 filters, filter size 10x10, 10K parameters. [Slide: M. Ranzato]
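      With weight sharing the count no longer depends on the image size. A quick check of the slide's figure, assuming a single input channel (which the 10K number implies) and ignoring biases:

          num_filters = 100                   # number of output feature maps
          filter_weights = 10 * 10            # shared across all spatial positions
          conv_weights = num_filters * filter_weights
          print(f"convolutional: {conv_weights:,} weights")  # 10,000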

  17. Convolutional Layer
      Figure: Left: a CNN; right: each neuron computes a linear function followed by an activation function.
      Hyperparameters of a convolutional layer:
      - The number of filters (controls the depth of the output volume); a quick shape check follows below.
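      A minimal PyTorch check that the number of filters sets the output depth; the sizes reuse the running 200x200 / 10x10 example and are illustrative.

          import torch
          import torch.nn as nn

          # 100 filters -> an output volume of depth 100, whatever the input depth.
          conv = nn.Conv2d(in_channels=3, out_channels=100, kernel_size=10)
          x = torch.randn(1, 3, 200, 200)     # (batch, channels, height, width)
          print(conv(x).shape)                # torch.Size([1, 100, 191, 191])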
