

  1. Convolutional Neural Nets II EECS 442 – Prof. David Fouhey Winter 2019, University of Michigan http://web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/

  2. Previously – Backpropagation. g(x) = (-x + 3)^2, built from the blocks -n, n+3, n^2: x -> -x -> -x+3 -> (-x+3)^2. Forward pass: compute the function. Backward pass: compute the derivative of all parts of the function; chaining the local derivatives gives g'(x) = 2(-x+3) * (-1) = 2x - 6.
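To make the forward/backward split concrete, here is a minimal Python sketch of slide 2's example g(x) = (-x + 3)^2; the function names and structure are my own, not from the slides:

```python
# Forward and backward pass for g(x) = (-x + 3)**2,
# built from the three primitives on the slide: negate, add 3, square.

def forward(x):
    """Forward pass: compute and record each intermediate value."""
    a = -x          # -n block
    b = a + 3       # n+3 block
    g = b ** 2      # n^2 block
    return g, (a, b)

def backward(x):
    """Backward pass: chain the local derivatives from output to input."""
    _, (a, b) = forward(x)
    dg_db = 2 * b   # d(b^2)/db
    db_da = 1       # d(a+3)/da
    da_dx = -1      # d(-x)/dx
    return dg_db * db_da * da_dx  # equals 2x - 6

print(backward(5))  # analytic derivative: 2*5 - 6 = 4
```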

  3. Setting Up A Neural Net. (Diagram: Input x1, x2; Hidden h1–h4; Output y1–y3.)

  4. Setting Up A Neural Net. (Diagram: Input x1, x2; Hidden 1 a1–a4; Hidden 2 h1–h4; Output y1–y3.)

  5. Fully Connected Network. (Same diagram.) Each neuron connects to each neuron in the previous layer.

  6. Fully Connected Network. Define New Block: "Linear Layer" (OK, technically it's affine): L(n) = Wn + b. Can get the gradient with respect to all the inputs (do on your own; useful trick: you have to be able to do a matrix multiply).
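A hedged numpy sketch of the linear-layer block L(n) = Wn + b and its gradients; the shapes (W is out-by-in, b and the upstream gradient are out-dimensional) and function names are my assumptions, and the "useful trick" shows up as every gradient being a matrix multiply or outer product:

```python
import numpy as np

def linear_forward(W, b, n):
    """Linear (affine) layer: L(n) = W n + b."""
    return W @ n + b

def linear_backward(W, n, dL_dout):
    """Gradients of the loss w.r.t. all inputs of the layer."""
    dL_dW = np.outer(dL_dout, n)   # (out, in)
    dL_db = dL_dout                # (out,)
    dL_dn = W.T @ dL_dout          # (in,) -- passed to the previous layer
    return dL_dW, dL_db, dL_dn
```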

  7. Fully Connected Network. (Same diagram.) As blocks: x -> L (W1, b1) -> f(n) -> L (W2, b2) -> f(n) -> L (W3, b3) -> f(n).

  8. Convolutional Layer. New Block: 2D Convolution: C(n) = n * W + b (convolve the input n with filters W, add a bias b).

  9. Convolution Layer. For a filter F of size Fh x Fw x c sliding over a 32x32x3 input I, the output at (x, y) is: b + sum_{i=1..Fh} sum_{j=1..Fw} sum_{k=1..c} F_{i,j,k} * I_{y+i, x+j, k}. Slide credit: Karpathy and Fei-Fei
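The triple sum above can be written directly as loops; this is an illustrative naive sketch (names and the tiny all-ones example are mine), not an efficient implementation:

```python
import numpy as np

def conv_at(I, F, b, y, x):
    """One output activation: b + sum over the Fh x Fw x C window of F * I."""
    Fh, Fw, C = F.shape
    out = b
    for i in range(Fh):
        for j in range(Fw):
            for k in range(C):
                out += F[i, j, k] * I[y + i, x + j, k]
    return out

# Tiny check: an all-ones 3x3x2 filter over an all-ones image sums 18 entries.
I = np.ones((5, 5, 2))
F = np.ones((3, 3, 2))
print(conv_at(I, F, 0.0, 0, 0))  # 18.0
```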

  10. Convolutional Neural Network (CNN). As blocks: x -> C (W1, b1) -> f(n) -> C (W2, b2) -> f(n) -> C (W3, b3) -> f(n).

  11. Today
Convert an HxW image into an F-dimensional vector:
• What's the probability this image is a cat? (F=1)
• Which of 1000 categories is this image? (F=1000)
• At what GPS coordinate was this image taken? (F=2)
• Identify the X,Y coordinates of 28 body joints of an image of a human (F=56)

  12. Today's Running Example: Classification. HxWxC image -> CNN -> 1x1xF. Running example: image classification; outputs are P(image is class #1), P(image is class #2), ..., P(image is class #F).

  13. Today's Running Example: Classification. The CNN maps the image to scores, e.g. (0.5, 0.2, 0.1, 0.2). True label y_i: class #0 ("Hippo"). Loss function (softmax log-likelihood of the true class): (Wx)_{y_i} - log sum_k exp((Wx)_k).

  14. Today's Running Example: Classification. Same scores (0.5, 0.2, 0.1, 0.2), but true label y_i: class #3 ("Baboon"). Loss function: (Wx)_{y_i} - log sum_k exp((Wx)_k).
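A minimal numpy sketch of the loss on these two slides, assuming scores s = Wx and a true class index y_i; the loss minimized in training is the negative of the log-likelihood written above (the max-subtraction is a standard stability trick, not from the slides):

```python
import numpy as np

def softmax_loss(s, y_i):
    """Negative softmax log-likelihood: -(s[y_i] - log sum_k exp(s[k]))."""
    s = s - s.max()  # subtract the max for numerical stability; result unchanged
    return -(s[y_i] - np.log(np.exp(s).sum()))
```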

  15. Model For Your Head. HxWxC image -> CNN -> 1x1xF.
• Provide:
  • Examples of images and desired outputs
  • A sequence of layers producing a 1x1xF output
  • A loss function that measures success
• Train the network -> the network figures out the parameters that make this work

  16. Layer Collection. You can construct functions out of layers. The only requirement is that the layers "fit" together. Optimization figures out what the parameters of the layers are. Image credit: lego.com

  17. Review – Pooling. Idea: just want the spatial resolution of activations / images smaller; applied per-channel. Max-pool, 2x2 filter, stride 2: input (1 1 2 4 / 5 6 7 8 / 3 2 1 0 / 1 1 3 4) -> output (6 8 / 3 4). Slide credit: Karpathy and Fei-Fei

  18. Review – Pooling. Max-pool, 2x2 filter, stride 2: each 2x2 block of the input (1 1 2 4 / 5 6 7 8 / 3 2 1 0 / 1 1 3 4) maps to its maximum, giving (6 8 / 3 4).
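The slide's 2x2, stride-2 example can be checked with a short numpy sketch; the reshape-based pooling is my implementation choice, not from the slides:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pool with stride 2, applied per channel (one channel shown)."""
    H, W = x.shape
    # Group into 2x2 blocks, then take the max within each block.
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 1, 3, 4]])
print(max_pool_2x2(x))  # rows (6 8) and (3 4), matching the slide
```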

  19. Other Layers – Fully Connected. 1x1xC -> 1x1xF. Map a C-dimensional feature to an F-dimensional feature using a linear transformation: W (FxC matrix) + b (Fx1 vector). How can we write this as a convolution?

  20. Everything's a Convolution. 1x1xC -> 1x1xF. Set Fh = 1, Fw = 1: a 1x1 convolution with F filters. The general sum b + sum_{i=1..Fh} sum_{j=1..Fw} sum_{k=1..C} F_{i,j,k} * I_{y+i, x+j, k} collapses to b + sum_{k=1..C} F_k * I_k.
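A quick numpy check of this equivalence, with made-up sizes C=4, F=3: a 1x1 convolution with F filters over a 1x1xC input computes exactly the fully connected map Wx + b.

```python
import numpy as np

C, F = 4, 3
rng = np.random.default_rng(0)
x = rng.standard_normal(C)          # the 1x1xC input
W = rng.standard_normal((F, C))     # F filters, each 1x1xC
b = rng.standard_normal(F)

fc_out = W @ x + b                  # fully connected layer
conv_out = np.array([b[f] + sum(W[f, k] * x[k] for k in range(C))
                     for f in range(F)])  # 1x1 conv: sum over channels only
assert np.allclose(fc_out, conv_out)
```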

  21. Converting to a Vector HxWxC 1x1xF How can we do this?

  22. Converting to a Vector* – Pool. HxWxC -> 1x1xF via average pooling with an HxW filter, stride 1: the 4x4 example (1 1 2 4 / 5 6 7 8 / 3 2 1 0 / 1 1 3 4) averages to 3.1. *(If F == C)
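A one-line check of the slide's average (global average pooling collapses each HxW channel to one number; 49/16 = 3.0625, shown rounded as 3.1):

```python
import numpy as np

# The slide's 4x4 single-channel example.
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 1, 3, 4]], dtype=float)
pooled = x.mean()  # HxW average pool -> a single value per channel
print(pooled)      # 3.0625
```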

  23. Converting to a Vector – Convolve. HxWxC -> 1x1xF via an HxW convolution with F filters: each filter covers the whole input and produces a single value.

  24. Looking At Networks
We'll look at 3 landmark networks, each trained to solve a 1000-way classification problem (ImageNet):
• AlexNet (2012)
• VGG-16 (2014)
• ResNet (2015)

  25. AlexNet. Input 227x227x3 -> Conv1 55x55x96 -> Conv2 27x27x256 -> Conv3 13x13x384 -> Conv4 13x13x384 -> Conv5 13x13x256 -> FC6 1x1x4096 -> FC7 1x1x4096 -> Output 1x1x1000. Each block is an HxWxC volume. You transform one volume to another with convolution.

  26. CNN Terminology. (Same AlexNet diagram.) Each entry in a volume is called an "activation" / "neuron" / "feature".

  27. AlexNet. (Architecture diagram again: Input 227x227x3 -> Conv1 55x55x96 -> Conv2 27x27x256 -> Conv3 13x13x384 -> Conv4 13x13x384 -> Conv5 13x13x256 -> FC6 1x1x4096 -> FC7 1x1x4096 -> Output 1x1x1000.)

  28. AlexNet. Input 227x227x3 -> Conv1 (11x11 filter, stride of 4) -> 55x55x96, followed by ReLU. Output size: (227 - 11)/4 + 1 = 55.
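The size arithmetic on this slide generalizes to a tiny helper (assuming no padding, which matches the slide's numbers):

```python
def conv_out_size(n, k, s):
    """Output width of a k-wide filter at stride s over an n-wide input, no padding."""
    return (n - k) // s + 1

print(conv_out_size(227, 11, 4))  # (227 - 11)/4 + 1 = 55, as on the slide
```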

  29. AlexNet. (Same diagram.) All layers are followed by ReLU. Red layers are followed by max-pool. Early layers have "normalization".

  30. AlexNet – Details. (Same diagram.) C: size of conv, P: size of pool. Conv sizes: C=11, 5, 3, 3, 3 for Conv1–Conv5; pool size P=3 at each of the three pooled layers.

  31. AlexNet. (Same diagram.) 13x13 input to the fully connected layers, 1x1 output. How?

  32. Alexnet – How Many Parameters? (Same architecture diagram.)

  33. Alexnet – How Many Parameters? Conv1: 96 11x11 filters on a 3-channel input: 11x11 x 3 x 96 + 96 = 34,944.

  34. Alexnet – How Many Parameters? Note: max-pool down to 6x6 before FC6. FC6: 4096 6x6 filters on a 256-channel input: 6x6 x 256 x 4096 + 4096 ≈ 38 million.

  35. Alexnet – How Many Parameters? FC7: 4096 1x1 filters on a 4096-channel input: 1x1 x 4096 x 4096 + 4096 ≈ 17 million.

  36. Alexnet – How Many Parameters
How long would it take you to list the parameters of Alexnet at 4 s / parameter? 1 year? 4 years? 8 years? 16 years?
• 62.4 million parameters
• Vast majority in the fully connected layers
• But... the paper notes that removing the convolutions is disastrous for performance.
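The counts on the preceding slides follow one rule: a layer with c_out filters of size fh x fw over c_in input channels has fh*fw*c_in*c_out weights plus c_out biases. A hedged sketch verifying the slides' numbers:

```python
def conv_params(fh, fw, c_in, c_out):
    """Parameter count for c_out filters of size fh x fw x c_in, plus biases."""
    return fh * fw * c_in * c_out + c_out

conv1 = conv_params(11, 11, 3, 96)    # 11x11 x 3 x 96 + 96 = 34,944
fc6 = conv_params(6, 6, 256, 4096)    # 37,752,832 (~38 million)
fc7 = conv_params(1, 1, 4096, 4096)   # 16,781,312 (~17 million)
print(conv1, fc6, fc7)
```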

  37. Dataset – ILSVRC
• ImageNet Large Scale Visual Recognition Challenge
• 1000 categories
• 1.4M images

  38. Dataset – ILSVRC Figure Credit: O. Russakovsky

  39. Visualizing Filters. Input 227x227x3 -> Conv1 55x55x96. Conv 1 filters:
• Q: How many input dimensions? A: 3
• Q: What does the input mean? A: R, G, B, duh.

  40. What’s Learned First layer filters of a network trained to distinguish 1000 categories of objects Remember these filters go over color. Figure Credit: Karpathy and Fei-Fei

  41. Visualizing Later Filters. Input 227x227x3 -> Conv1 55x55x96 -> Conv2 27x27x256. Conv 2 filters:
• Q: How many input dimensions? A: 96... hmmm
• Q: What does the input mean? A: Uh, the, uh, previous slide

  42. Visualizing Later Filters. Understanding the meaning of the later filters from their values is typically impossible: too many input dimensions, and it's not even clear what the input means.

  43. Understanding Later Filters. (AlexNet diagram.) Split the network in two: a CNN that extracts a 13x13x256 output, followed by a 2-hidden-layer neural network.

  44. Understanding Later Filters. (AlexNet diagram.) Alternatively: a CNN that extracts a 1x1x4096 feature, followed by a 1-hidden-layer NN.

  45. Understanding Later Filters. Input 227x227x3 -> Conv1 55x55x96 -> Conv2 27x27x256 -> Conv3 13x13x384 -> Conv4 13x13x384 -> Conv5 13x13x256: a CNN that extracts a 13x13x256 output.

  46. Understanding Later Filters. Feed an image in, see what score the filter gives it (a more pleasant version of a real neuroscience procedure). Given two images' 13x13x256 activations: which one's bigger? What image makes the output biggest?

  47. Figure Credit: Girshick et al. CVPR 2014.

  48. What's Up With the White Boxes? (Diagram: boxes drawn on a 227x227x3 input image and a 13x13x384 activation volume.)
