  1. Neural Networks Part 3 Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison

  2. Convolutional neural networks • Strong empirical application performance • Convolutional networks: neural networks that use convolution in place of general matrix multiplication in at least one of their layers: $h = f(W^\top x + b)$ for a specific kind of weight matrix $W$

  3. Convolution

  4. Convolution: discrete version • Given arrays $u_t$ and $w_t$, their convolution is a function $s_t$: $s_t = \sum_{a=-\infty}^{+\infty} u_a \, w_{t-a}$ • Written as $s = u \ast w$ or $s_t = (u \ast w)_t$ • Where $u_t$ or $w_t$ is not defined, it is assumed to be 0
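
A minimal numeric check of this definition (a sketch, assuming NumPy; the array values are made up for illustration):

```python
import numpy as np

def conv1d(u, w):
    """Discrete convolution s_t = sum_a u_a * w_{t-a}, with zeros outside the arrays."""
    n, k = len(u), len(w)
    s = np.zeros(n + k - 1)
    for t in range(n + k - 1):
        for a in range(n):
            if 0 <= t - a < k:
                s[t] += u[a] * w[t - a]
    return s

u = np.array([1., 2., 3., 4., 5., 6.])
w = np.array([2., 1., 0.5])

print(conv1d(u, w))                    # direct use of the definition
print(np.convolve(u, w, mode="full"))  # NumPy's built-in agrees
```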

  5. Illustration 1 • Kernel $w = [z, y, x]$, input $u = [a, b, c, d, e, f]$ • $s_3 = xb + yc + zd$: kernel entries $(w_3, w_2, w_1) = (x, y, z)$ sit over inputs $(u_2, u_3, u_4) = (b, c, d)$

  6. Illustration 1 • $s_4 = xc + yd + ze$: the kernel slides one step, now over $(u_3, u_4, u_5) = (c, d, e)$

  7. Illustration 1 • $s_5 = xd + ye + zf$: kernel over $(u_4, u_5, u_6) = (d, e, f)$

  8. Illustration 1: boundary case • $s_6 = xe + yf$: only $(w_3, w_2) = (x, y)$ overlap the input, over $(u_5, u_6) = (e, f)$; entries outside the array count as 0

  9. Illustration 1 as matrix multiplication:
$$\begin{bmatrix} y & z & 0 & 0 & 0 & 0 \\ x & y & z & 0 & 0 & 0 \\ 0 & x & y & z & 0 & 0 \\ 0 & 0 & x & y & z & 0 \\ 0 & 0 & 0 & x & y & z \\ 0 & 0 & 0 & 0 & x & y \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}$$
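
This equivalence is easy to verify: build the banded matrix above and compare against np.convolve (a sketch, assuming NumPy; 'same' mode matches the slide's 6-output alignment):

```python
import numpy as np

u = np.array([1., 2., 3., 4., 5., 6.])  # stand-ins for a..f
w = np.array([3., 2., 1.])              # stand-ins for z, y, x

# banded matrix from the slide: each row holds the kernel at a shifted position
n, k = len(u), len(w)
M = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if 0 <= (i + 1) - j < k:        # offset matches the 'same' alignment
            M[i, j] = w[(i + 1) - j]

print(M @ u)
print(np.convolve(u, w, mode="same"))   # identical result
```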

  10. Illustration 2: two-dimensional case • Input: $\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \end{bmatrix}$ • Kernel: $\begin{bmatrix} w & x \\ y & z \end{bmatrix}$ • Top-left output entry: $wa + bx + ey + fz$

  11. Illustration 2 • Sliding the kernel one step right gives the next entry: the first two outputs are $wa + bx + ey + fz$ and $bw + cx + fy + gz$

  12. Illustration 2 • Input: the 3×4 array above • Kernel (or filter): the 2×2 array $\begin{bmatrix} w & x \\ y & z \end{bmatrix}$ • Feature map: the output array, whose first two entries are $wa + bx + ey + fz$ and $bw + cx + fy + gz$
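
A small sketch of this 2D sliding-window computation, assuming NumPy (numbers stand in for a–l and w–z; the kernel is applied unflipped, exactly as the figure draws it):

```python
import numpy as np

inp = np.arange(1., 13.).reshape(3, 4)  # stands in for a..l
ker = np.array([[1., 2.],               # stands in for w, x
                [3., 4.]])              #               y, z

# valid 2D sliding window: one feature-map entry per kernel placement
H = inp.shape[0] - ker.shape[0] + 1
W = inp.shape[1] - ker.shape[1] + 1
out = np.zeros((H, W))
for i in range(H):
    for j in range(W):
        out[i, j] = np.sum(inp[i:i+2, j:j+2] * ker)

print(out)  # out[0, 0] == w*a + x*b + y*e + z*f
```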

  13. Advantage: sparse interaction • Fully connected layer: $m \times n$ edges ($m$ output nodes, $n$ input nodes) • Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  14. Advantage: sparse interaction • Convolutional layer: $\leq m \times k$ edges ($m$ output nodes, $n$ input nodes, $k$ kernel size) • Figure from Deep Learning, by Goodfellow, Bengio, and Courville
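
For concreteness: with $n = m = 10{,}000$ nodes, a fully connected layer has $m \times n = 10^8$ edges, while a convolutional layer with kernel size $k = 5$ has at most $m \times k = 5 \times 10^4$.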

  15. Advantage: sparse interaction Multiple convolutional layers: larger receptive field Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  16. Advantage: parameter sharing The same kernel is used repeatedly. E.g., the black edge denotes the same weight in the kernel. Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  17. Advantage: equivariant representations • Equivariant: transforming the input = transforming the output • Example: input is an image, transformation is shifting • Convolution(shift(input)) = shift(Convolution(input)) • Useful when we care only about the existence of a pattern, rather than its location
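
A quick numeric check of this property (a sketch, assuming NumPy; circular convolution and a circular shift are used so the identity is exact, since zero-padded convolution only matches away from the boundary):

```python
import numpy as np

def circ_conv(u, w):
    """Circular convolution via the FFT; w is zero-padded to len(u)."""
    wp = np.zeros(len(u))
    wp[:len(w)] = w
    return np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(wp)))

u = np.random.randn(16)
w = np.array([1., -2., 1.])

lhs = circ_conv(np.roll(u, 3), w)  # Convolution(shift(input))
rhs = np.roll(circ_conv(u, w), 3)  # shift(Convolution(input))
print(np.allclose(lhs, rhs))       # True: convolution is shift-equivariant
```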

  18. Pooling • Summarize the input (e.g., output the max of the input) • Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  19. Advantage • Pooling induces invariance • Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  20. Motivation from neuroscience • David Hubel and Torsten Wiesel studied the brain's early visual system (V1, the primary visual cortex) and won a Nobel Prize for this work • V1 properties • 2D spatial arrangement • Simple cells: inspire convolutional layers • Complex cells: inspire pooling layers

  21. Variants of convolution and pooling

  22. Variants of convolutional layers • Multi-dimensional convolution • Input and kernel can be 3D • E.g., images have (width, height, RGB channels) • Multiple kernels lead to multiple feature maps (also called channels) • A mini-batch of images is 4D: (image_id, width, height, RGB channels)
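
A quick shape check, assuming PyTorch as the framework (note PyTorch orders a mini-batch as (image_id, channels, height, width) rather than the channels-last layout written above):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)  # mini-batch: 8 RGB images of size 32x32 (NCHW)
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
y = conv(x)
print(y.shape)                 # torch.Size([8, 6, 28, 28]): 6 feature maps each
```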

  23. Variants of convolutional layers • Padding: valid — the kernel is applied only where it fully overlaps the input; e.g. kernel $(x, y, z)$ over $(d, e, f)$ of input $[a, b, c, d, e, f]$ gives $xd + ye + zf$

  24. Variants of convolutional layers • Padding: same — zero-pad the input so the output has the same length as the input; e.g. at the boundary, kernel $(x, y)$ over $(e, f)$ gives $xe + yf$ (the position past $f$ is padded with 0)
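
NumPy's np.convolve exposes both choices directly (plus 'full'); a quick comparison on a length-6 input and a length-3 kernel:

```python
import numpy as np

u = np.arange(1., 7.)       # length 6
w = np.array([1., 2., 3.])  # length 3

print(np.convolve(u, w, mode="valid"))  # length 4: kernel fully inside the input
print(np.convolve(u, w, mode="same"))   # length 6: zero-padded to keep the size
print(np.convolve(u, w, mode="full"))   # length 8: every partial overlap
```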

  25. Variants of convolutional layers • Stride Figure from Deep Learning, by Goodfellow, Bengio, and Courville
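
Stride $s$ moves the kernel $s$ positions at a time, so a strided convolution equals the stride-1 output subsampled; a sketch, assuming NumPy:

```python
import numpy as np

u = np.arange(1., 11.)
w = np.array([1., 0., -1.])
stride = 2

dense = np.convolve(u, w, mode="valid")  # stride-1 outputs
print(dense[::stride])                   # stride-2 convolution: keep every 2nd
```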

  26. Variants of pooling • Stride and padding Figure from Deep Learning, by Goodfellow, Bengio, and Courville

  27. Variants of pooling • Max pooling: $y = \max\{x_1, x_2, \ldots, x_k\}$ • Average pooling: $y = \mathrm{mean}\{x_1, x_2, \ldots, x_k\}$ • Others, like max-out
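
Both variants in a few lines (a sketch, assuming NumPy; the window size and stride are made-up defaults):

```python
import numpy as np

def pool1d(x, k=2, stride=2, op=np.max):
    """Apply op (max or mean) to each window of size k, moving by stride."""
    return np.array([op(x[i:i+k]) for i in range(0, len(x) - k + 1, stride)])

x = np.array([1., 3., 2., 5., 4., 4.])
print(pool1d(x, op=np.max))   # max pooling:     [3. 5. 4.]
print(pool1d(x, op=np.mean))  # average pooling: [2.  3.5 4.]
```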

  28. Case study: LeNet-5

  29. LeNet-5 • Proposed in “Gradient-based learning applied to document recognition”, by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998 • Applies convolution to 2D images (MNIST) and trains with backpropagation • Structure: 2 convolutional layers (with pooling) + 3 fully connected layers • Input size: 32×32×1 • Convolution kernel size: 5×5 • Pooling: 2×2
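
A sketch of this structure in PyTorch (the framework choice is mine; the original paper used scaled-tanh squashing and trainable subsampling layers, so plain tanh + average pooling here only mirrors the layer sizes):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(2, stride=2),        # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),  # -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(2, stride=2),        # -> 5x5x16 = 400 values
        )
        self.classifier = nn.Sequential(
            nn.Linear(400, 120), nn.Tanh(),   # weight matrix 400x120
            nn.Linear(120, 84), nn.Tanh(),    # weight matrix 120x84
            nn.Linear(84, 10),                # weight matrix 84x10
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```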

  30. LeNet-5 • Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  31. LeNet-5 • Filter: 5×5, stride: 1×1, #filters: 6 • Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  32. LeNet-5 • Pooling: 2×2, stride: 2 • Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  33. LeNet-5 • Filter: 5×5×6, stride: 1×1, #filters: 16 • Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  34. LeNet-5 • Pooling: 2×2, stride: 2 • Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  35. LeNet-5 • Weight matrix: 400×120 (the 400 inputs come from flattening the 16 pooled 5×5 feature maps: 16 × 5 × 5 = 400) • Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

  36. LeNet-5 • Weight matrix: 120×84, then 84×10 for the output layer • Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner
