  1. Learning Based Vision II Computer Vision Fall 2018 Columbia University

  2. Project • Project Proposals due October 31 • Pick one of our suggested projects, or pitch your own • Must use something in this course • Groups of 2 strongly recommended • If you want help finding a team, see post on Piazza • We’ll give you Google Cloud credits once you turn in your project proposal • Details here: http://w4731.cs.columbia.edu/project

  3. Neural Networks

  4. Convolutional Network (AlexNet): input 224x224x3 → conv1 55x55x96 → conv2 27x27x256 → conv3 13x13x384 → conv4 13x13x384 → conv5 13x13x256 → “fc6” 1x1x4096 → “fc7” 1x1x4096 → output 1x1x1000. Red layers are followed by max pooling. The visualization hides the dimensions of the filters. Slide credit: Deva Ramanan

  5. Convolutional Layer: x_{i+1} = w * x_i, where the input x_i ∈ ℝ^(W×H×D), each of the K filters w_k ∈ ℝ^(w×h×D), and the output x_{i+1} ∈ ℝ^(W×H×K).
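
A minimal sketch of the shape bookkeeping above, in PyTorch (the framework is an assumption; the slides name none). K filters of size w×h×D map a W×H×D input to a W×H×K output when padding preserves the spatial size.

```python
import torch
import torch.nn as nn

W, H, D, K = 32, 32, 3, 16          # input width/height/depth, number of filters
conv = nn.Conv2d(in_channels=D, out_channels=K, kernel_size=3, padding=1)

x = torch.randn(1, D, H, W)          # PyTorch layout is (batch, channels, H, W)
y = conv(x)
print(y.shape)                       # torch.Size([1, 16, 32, 32]) -> W x H x K
```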

  6. Learning: input (image) x_i, target (labels) y_i, parameters θ, prediction f(x_i; θ), loss function ℒ. Training objective: min_θ ∑_i ℒ(f(x_i; θ), y_i) + λ‖θ‖₂², with cross-entropy loss ℒ(z, y) = −∑_j y_j log z_j.
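
A sketch of this objective as a single training step: cross-entropy loss plus an L2 penalty on the parameters (weight decay), minimized over θ by gradient descent. The model, data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)  # the lambda*||theta||^2 term
loss_fn = nn.CrossEntropyLoss()       # = -sum_j y_j log z_j for one-hot targets

x = torch.randn(8, 3, 32, 32)         # stand-in mini-batch of images
y = torch.randint(0, 10, (8,))        # stand-in integer labels

opt.zero_grad()
loss = loss_fn(model(x), y)           # L(f(x_i; theta), y_i)
loss.backward()                       # gradients via backprop
opt.step()                            # gradient step on theta
```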

  7. Slide from Rob Fergus, NYU

  8. Let’s break them

  9. “school bus”

  10. “school bus” “ostrich”

  11. “school bus” + perturbation (scaled for visualization) = “ostrich”

  12. Images on the left are correctly classified; images on the right are incorrectly classified as “ostrich”

  13. How can we find these? Solve an optimization problem to find the minimal change Δ that maximizes the loss: max_Δ ℒ(f(x + Δ), y) − λ‖Δ‖₂²
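
A sketch of that optimization: gradient ascent on a perturbation Δ that increases the loss while an L2 penalty keeps Δ small. The model, label, and step sizes are placeholders; this illustrates the general idea rather than any particular paper's method.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 3, 32, 32)                 # the original (correctly classified) image
y = torch.tensor([3])                         # its true label
delta = torch.zeros_like(x, requires_grad=True)
lam, step = 0.1, 0.5

for _ in range(50):
    obj = loss_fn(model(x + delta), y) - lam * delta.pow(2).sum()
    grad, = torch.autograd.grad(obj, delta)
    delta = (delta + step * grad).detach().requires_grad_(True)  # ascent step on the objective

x_adv = x + delta                             # visually similar, classified differently
```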

  14. 99% confidence! Nguyen, Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

  15. 99% confidence! Also 99% confidence! Nguyen, Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

  16. Nguyen, Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images

  17. Universal attacks Moosavi-Dezfooli et al. arXiv 1610.08401

  18. Universal attacks The attack is agnostic to the image content Moosavi-Dezfooli et al. arXiv 1610.08401

  19. Change just one pixel Su et al, “One pixel attack for fooling deep neural networks”

  20. In the physical world

  21. In the 3D physical world

  22. Neural network camouflage https://cvdazzle.com/

  23. Which Pixels in the Input Affect the Neuron the Most? • Rephrased: which pixels would make the neuron not turn on if they had been different? • In other words, for which inputs is ∂neuron/∂x_j large?

  24. Typical Gradient of a Neuron • Visualize the gradient of a particular neuron with respect to the input x • Do a forward pass: • Compute the gradient of a particular neuron using backprop:
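
A sketch of this procedure in PyTorch (an assumed framework): do a forward pass, pick one neuron, and backpropagate its value to the input pixels. The model and neuron index are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 32 * 32, 10))

x = torch.randn(1, 3, 32, 32, requires_grad=True)  # stand-in for a real image
neuron = model(x)[0, 5]                            # forward pass, pick one output neuron
neuron.backward()                                  # backprop that neuron to the input
saliency = x.grad.abs().max(dim=1).values          # per-pixel influence map
```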

  25. “Guided Backpropagation” • Idea: neurons act like detectors of particular image features • We are only interested in what image features the neuron detects, not in what kind of stuff it doesn’t detect • So when propagating the gradient, we set all the negative gradients to 0 • We don’t care if a pixel “suppresses” a neuron somewhere along the path to our neuron

  26. Guided Backpropagation: at each layer, compute the gradient, zero out the negatives, and backpropagate.

  27. Guided Backpropagation: standard Backprop vs. Guided Backprop visualizations.

  28. Guided Backpropagation Springenberg et al., Striving for Simplicity: The All Convolutional Net (ICLR 2015 workshop)
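
A minimal sketch of guided backpropagation in PyTorch (framework and model are assumptions). Standard ReLU backprop already zeroes gradients where the forward input was negative; guided backprop additionally zeroes negative gradients on the way back.

```python
import torch
import torchvision

model = torchvision.models.alexnet()   # random weights here; load pretrained ones in practice
model.eval()

def guide(module, grad_input, grad_output):
    # Keep only positive gradients when passing backward through a ReLU.
    return (torch.clamp(grad_input[0], min=0.0),)

for m in model.modules():
    if isinstance(m, torch.nn.ReLU):
        m.inplace = False                       # backward hooks and in-place ops don't mix
        m.register_full_backward_hook(guide)

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # stand-in for a real image
model(x)[0, 243].backward()                            # backprop from one class logit
guided_saliency = x.grad                               # the guided-backprop visualization
```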

  29. What About Doing Gradient Descent? • Want to maximize the i-th output of the softmax • Can compute the gradient of the i-th output of the softmax with respect to the input x (the W’s and b’s are fixed to make classification as good as possible) • Perform gradient descent on the input
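
A sketch of that idea: freeze the weights and take gradient steps on the input to maximize the score of class i. The model and class index are placeholders; real visualizations also add image regularizers (see Yosinski et al. on the next slide).

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1000)).eval()
for p in model.parameters():
    p.requires_grad_(False)              # the W's and b's stay fixed

i = 7                                    # class whose softmax output we maximize
x = torch.zeros(1, 3, 64, 64, requires_grad=True)
opt = torch.optim.SGD([x], lr=1.0)

for _ in range(100):
    opt.zero_grad()
    score = torch.log_softmax(model(x), dim=1)[0, i]
    (-score).backward()                  # descent on -score = ascent on the class score
    opt.step()
```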

  30. Yosinski et al, Understanding Neural Networks Through Deep Visualization (ICML 2015)

  31. ConvNet P(category) Image

  32. ConvNet P(category) Image What if we learn to generate adversarial examples?

  33. Noise ConvNet ConvNet P(category) What if we learn to generate adversarial examples?

  34. Generative Adversarial Networks (Goodfellow et al.): Noise → G → D → P(real)
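
A minimal GAN training sketch: the generator G maps noise to images, the discriminator D outputs P(real), and the two are trained against each other. Architectures, data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())          # noise -> fake image
D = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())         # image -> P(real)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(64, 784)                                  # stand-in for real images
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Discriminator step: real images should score 1, generated images 0.
opt_d.zero_grad()
fake = G(torch.randn(64, 100))
d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
d_loss.backward()
opt_d.step()

# Generator step: fool the discriminator into scoring fakes as real.
opt_g.zero_grad()
g_loss = bce(D(G(torch.randn(64, 100))), ones)
g_loss.backward()
opt_g.step()
```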

  35. Generated images Trained with CIFAR-10

  36. Introduced a form of ConvNet more stable under adversarial training than previous attempts.

  37. Generator Random uniform vector (100 numbers)

  38. Synthesized images

  39. Transposed-convolution

  40. Transposed-convolution: comparison of a standard convolution and a transposed convolution
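
A sketch of the contrast: with stride 2, a convolution halves the spatial size while a transposed convolution doubles it. The shapes below are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 8)

down = nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1)
up = nn.ConvTranspose2d(16, 32, kernel_size=4, stride=2, padding=1)

print(down(x).shape)   # torch.Size([1, 32, 4, 4])   spatial size halved
print(up(x).shape)     # torch.Size([1, 32, 16, 16]) spatial size doubled
```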

  41. Generated Images Brock et al. Large scale GAN training for high fidelity natural image synthesis

  42. Image Interpolation

  43. Image Interpolation

  44. Nearest Neighbors

  45. Nearest Neighbors

  46. Generating Dynamics

  47. Two components: a generator network, and the network to visualize (conv1, conv2, conv3, conv4, conv5, fc6, fc7, classification layer); the unit to visualize here is “car”

  48. Two components: a generator network, and the network to visualize (conv1, conv2, conv3, conv4, conv5, fc6, fc7, classification layer); the unit to visualize here is “table lamp”

  49. Two components: the generator feeds the network to visualize (conv1, conv2, conv3, conv4, conv5, fc6, fc7, classification layer); the unit to visualize is “table lamp”

  50. Synthesizing Images Preferred by CNN ImageNet-AlexNet final units (class units) Nguyen A, Dosovitskiy A, Yosinski J, Brox T, Clune J. (2016). “Synthesizing the preferred inputs for neurons in neural networks via deep generator networks.” arXiv:1605.09304

  51. Where to start training?

  52. Gradient Descent: θ ← θ − α ∂ℒ/∂θ. How do we pick where to start?

  53. Idea 0: Train many models

  54. Drop-out regularization (a) Standard Neural Net (b) After applying dropout. Intuition: we should really train a family of models with different architectures and average their predictions (c.f. model averaging from machine learning). Practical implementation: learn a single “superset” architecture that randomly removes nodes (by randomly zeroing out activations) during gradient updates. Slide credit: Deva Ramanan
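
A sketch of that practical implementation: during training, activations are randomly zeroed (and the survivors rescaled); at test time the dropout layer is a no-op, which approximates averaging the ensemble of thinned networks.

```python
import torch
import torch.nn as nn

layer = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Dropout(p=0.5))
x = torch.ones(1, 10)

layer.train()
print(layer(x))   # roughly half the activations are zeroed, survivors scaled by 1/(1-p)

layer.eval()
print(layer(x))   # deterministic: dropout does nothing at test time
```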

  55. Idea 1: Carefully pick starting point

  56. Backprop: the network is a composition x_0 → f_1 → x_1 → f_2 → … → f_L → x_L with parameters w_1, …, w_L and scalar loss z = ℓ_y(f_L(·; w_L) ∘ … ∘ f_2(·; w_2) ∘ f_1(x_0; w_1)) ∈ ℝ. By the chain rule, dz/dw_l^T = dz/d(vec x_L)^T · d vec x_L/d(vec x_{L−1})^T ⋯ d vec x_{l+1}/d(vec x_l)^T · d vec x_l/dw_l^T. Slide credit: Deva Ramanan

  57. Idea 1: Carefully pick starting point He et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

  58. Exploding and vanishing gradient • How does the determinant of the gradients affect the final gradient? • What if the determinant is less than one? • What if the determinant is greater than one? dz/dw_l^T = dz/d(vec x_L)^T · d vec x_L/d(vec x_{L−1})^T ⋯ d vec x_{l+1}/d(vec x_l)^T · d vec x_l/dw_l^T

  59. Exploding and vanishing gradient Source: Roger Grosse

  60. Initialization • Key idea: initialize weights so that the variance of activations is one at each layer • You can derive what this should be for different layers and nonlinearities • For ReLU: w_i ∼ 𝒩(0, 2/k), b_i = 0 He et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
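
A sketch of that rule for a ReLU network: draw weights from a zero-mean Gaussian with variance 2/k (k = fan-in) and zero the biases, so the activation variance stays roughly constant across layers. The network here is a placeholder.

```python
import math
import torch
import torch.nn as nn

def he_init(module):
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        fan_in = module.weight[0].numel()     # k: number of inputs feeding each output unit
        nn.init.normal_(module.weight, mean=0.0, std=math.sqrt(2.0 / fan_in))
        if module.bias is not None:
            nn.init.zeros_(module.bias)

net = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
net.apply(he_init)
# PyTorch's built-in equivalent: nn.init.kaiming_normal_(weight, nonlinearity="relu")
```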

  61. Idea 2: How to maintain this throughout training?

  62. Batch Normalization: x̂ = (x − μ)/σ, y = γ x̂ + β • μ: mean of x in mini-batch • σ: std of x in mini-batch • γ: scale • β: shift • μ, σ: functions of x, analogous to responses • γ, β: parameters to be learned, analogous to weights Ioffe & Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML 2015

  63. Batch Normalization: x̂ = (x − μ)/σ, y = γ x̂ + β. Two modes of BN: • Train mode: μ, σ are functions of a batch of x • Test mode: μ, σ are pre-computed on the training set. Caution: make sure your BN usage is correct! (This has caused many bugs in my own research experience!) Ioffe & Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML 2015
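
A sketch of the two modes: in train mode the current mini-batch statistics are used (and running averages are updated); in eval mode the stored running statistics are used. Forgetting to switch modes at test time is the classic bug the slide warns about.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.ReLU())
x = torch.randn(32, 8)

net.train()
y_train = net(x)        # normalizes with this mini-batch's mean/std

net.eval()
y_test = net(x)         # normalizes with running mean/std collected during training
```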

  64. Batch Normalization: training curves (accuracy vs. iteration) with BN vs. without BN. Figure credit: Ioffe & Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML 2015

  65. Back to breaking things…

  66. Architecture of Krizhevsky et al. Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Layer 7: Full → Softmax Output • 8 layers total • Trained on the ImageNet dataset [Deng et al. CVPR’09] • 18.2% top-5 error • Our reimplementation: 18.1% top-5 error

  67. Architecture of Krizhevsky et al. Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Softmax Output • Remove the top fully connected layer (Layer 7) • Drops 16 million parameters • Only a 1.1% drop in performance!
