
Lecture 13: Introduction to Deep Learning / Deep Convolutional Neural Networks
Aykut Erdem, November 2016, Hacettepe University

Administrative: Assignment 3 is out! It is due November 30, 2016. You will implement a 2-layer neural network.


  1. It’s an old paradigm
• The first learning machine: the Perceptron, built at Cornell in 1960.
• The Perceptron was a linear classifier on top of a simple feature extractor:
  y = sign( Σᵢ₌₁..N Wᵢ Fᵢ(X) + b )
• The vast majority of practical applications of ML today use glorified linear classifiers or glorified template matching.
• Designing a feature extractor requires considerable effort by experts.
slide by Marc’Aurelio Ranzato, Yann LeCun
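The decision rule above can be sketched in a few lines of NumPy. This is not from the slides: the feature extractor and weights below are made-up placeholders, standing in for the hand-designed F and learned W of the classical setup.

```python
import numpy as np

# Sketch of the Perceptron decision rule y = sign(sum_i W_i F_i(X) + b).
# The feature extractor below is a hypothetical hand-crafted example.
def features(x):
    return np.array([x[0], x[0] ** 2, abs(x[1])])

W = np.array([1.0, -0.5, 2.0])   # weights (the part the Perceptron learns)
b = 0.1                          # bias

def perceptron(x):
    return np.sign(W @ features(x) + b)
```

The point of the slide survives in the code: all the "intelligence" sits in `features`, which someone had to design by hand.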

  2. Hierarchical Compositionality
VISION: pixels → edge → texton → motif → part → object
SPEECH: sample → spectral band → formant → motif → phone → word
NLP: character → word → NP/VP/.. → clause → sentence → story
slide by Marc’Aurelio Ranzato, Yann LeCun

  3. Building A Complicated Function
Given a library of simple functions, compose them into a complicated function.
slide by Marc’Aurelio Ranzato, Yann LeCun

  4. Building A Complicated Function
Given a library of simple functions, compose them into a complicated function.
Idea 1: Linear Combinations
• Boosting
• Kernels
• …
slide by Marc’Aurelio Ranzato, Yann LeCun

  5. Building A Complicated Function
Given a library of simple functions, compose them into a complicated function.
Idea 2: Compositions
• Deep Learning
• Grammar models
• Scattering transforms…
slide by Marc’Aurelio Ranzato, Yann LeCun


  7. Deep Learning = Hierarchical Compositionality
“car”
slide by Marc’Aurelio Ranzato, Yann LeCun

  8. Deep Learning = Hierarchical Compositionality
Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
slide by Marc’Aurelio Ranzato, Yann LeCun

  9. Sparse DBNs [Lee et al. ICML ’09]
Figure courtesy: Quoc Le
slide by Dhruv Batra

  10. Three key ideas
• (Hierarchical) Compositionality
  - Cascade of non-linear transformations
  - Multiple layers of representations
• End-to-End Learning
  - Learning (goal-driven) representations
  - Learning to extract features
• Distributed Representations
  - No single neuron “encodes” everything
  - Groups of neurons work together
slide by Dhruv Batra

  11. Traditional Machine Learning
VISION: hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → “car”
SPEECH: hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈdēp\
NLP: hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → “+” (e.g. “This burrito place is yummy and fun!”)
slide by Marc’Aurelio Ranzato, Yann LeCun

  12. Traditional Machine Learning (more accurately)
VISION: SIFT/HOG (fixed) → K-Means/pooling (“learned”, unsupervised) → classifier (supervised) → “car”
SPEECH: MFCC (fixed) → Mixture of Gaussians (“learned”, unsupervised) → classifier (supervised) → \ˈdēp\
NLP: n-grams (fixed) → Parse Tree Syntactic (“learned”, unsupervised) → classifier (supervised) → “+” (“This burrito place is yummy and fun!”)
slide by Marc’Aurelio Ranzato, Yann LeCun

  13. Deep Learning = End-to-End Learning
VISION: SIFT/HOG (fixed) → K-Means/pooling (“learned”, unsupervised) → classifier (supervised) → “car”
SPEECH: MFCC (fixed) → Mixture of Gaussians (“learned”, unsupervised) → classifier (supervised) → \ˈdēp\
NLP: n-grams (fixed) → Parse Tree Syntactic (“learned”, unsupervised) → classifier (supervised) → “+” (“This burrito place is yummy and fun!”)
slide by Marc’Aurelio Ranzato, Yann LeCun

  14. Deep Learning = End-to-End Learning
• A hierarchy of trainable feature transforms
  - Each module transforms its input representation into a higher-level one.
  - High-level features are more global and more invariant.
  - Low-level features are shared among categories.
Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier
Learned Internal Representations
slide by Marc’Aurelio Ranzato, Yann LeCun

  15. “Shallow” vs Deep Learning
• “Shallow” models: hand-crafted Feature Extractor (fixed) → “Simple” Trainable Classifier (learned)
• Deep models: Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier (Learned Internal Representations)
slide by Marc’Aurelio Ranzato, Yann LeCun

  16. Three key ideas
• (Hierarchical) Compositionality
  - Cascade of non-linear transformations
  - Multiple layers of representations
• End-to-End Learning
  - Learning (goal-driven) representations
  - Learning to extract features
• Distributed Representations
  - No single neuron “encodes” everything
  - Groups of neurons work together
slide by Dhruv Batra

  17. Localist representations
• The simplest way to represent things with neural networks is to dedicate one neuron to each thing.
  - Easy to understand.
  - Easy to code by hand. (Often used to represent inputs to a net.)
  - Easy to learn. (This is what mixture models do: each cluster corresponds to one neuron.)
  - Easy to associate with other representations or responses.
• But localist models are very inefficient whenever the data has componential structure.
slide by Geoff Hinton
Image credit: Moontae Lee

  18. Distributed Representations
• Each neuron must represent something, so this must be a local representation.
• Distributed representation means a many-to-many relationship between two types of representation (such as concepts and neurons).
  - Each concept is represented by many neurons.
  - Each neuron participates in the representation of many concepts.
slide by Geoff Hinton
Image credit: Moontae Lee
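A toy numerical contrast (my own illustration, not from the slides): with n neurons, a localist code dedicates one neuron per concept and so covers only n concepts, while a distributed binary code over the same neurons can in principle separate 2^n concepts.

```python
import numpy as np

# Toy contrast between localist and distributed codes over n = 4 neurons.
n = 4
localist = np.eye(n, dtype=int)           # one-hot: one neuron per concept -> 4 codes
distributed = np.array([[(c >> i) & 1 for i in range(n)]
                        for c in range(2 ** n)])   # all 4-bit patterns -> 16 codes
```

Each row is one concept's activation pattern; in the distributed case every neuron participates in many concepts and every concept uses many neurons.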

  19. Power of distributed representations!
Scene Classification: bedroom, mountain
• Possible internal representations:
  - Objects
  - Object parts
  - Scene attributes
  - Textures
slide by Bolei Zhou
B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Object Detectors Emerge in Deep Scene CNNs”, ICLR 2015

  20. Deep Convolutional Neural Networks

  21. Convolutions
slide by Yisong Yue

  22. Convolution Filters
slide by Yisong Yue

  23. Gabor Filters
slide by Yisong Yue

  24. Gaussian Blur Filters
slide by Yisong Yue

  25. Convolutional Neural Networks
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  26. Convolution Layer
32x32x3 image: width 32, height 32, depth 3
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  27. Convolution Layer
32x32x3 image, 5x5x3 filter. Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  28. Convolution Layer
Filters always extend the full depth of the input volume. 32x32x3 image, 5x5x3 filter. Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  29. Convolution Layer
32x32x3 image, 5x5x3 filter. 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias).
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
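That single number can be computed directly. A minimal NumPy sketch (random data; the patch location is chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))   # the 32x32x3 input
filt = rng.standard_normal((5, 5, 3))      # one 5x5x3 filter
bias = 0.5

# One 5x5x3 chunk of the image, dotted with the filter (+ bias):
patch = image[0:5, 0:5, :]                 # 5*5*3 = 75 numbers
out = np.sum(patch * filt) + bias          # a single scalar
```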

  30. Convolution Layer
32x32x3 image, 5x5x3 filter. Convolve (slide) over all spatial locations to produce a 28x28x1 activation map.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  31. Convolution Layer
Consider a second, green filter: convolving it over all spatial locations gives a second 28x28x1 activation map.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  32. For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps. We stack these up to get a “new image” of size 28x28x6!
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
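The whole layer from the slide (6 filters of size 5x5x3 over a 32x32x3 input, stride 1, no padding) can be written as a naive triple loop. A sketch with random data, not an efficient implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))     # input volume
filters = rng.standard_normal((6, 5, 5, 3))  # 6 filters, each 5x5x3
biases = rng.standard_normal(6)

H, W, F, K = 32, 32, 5, 6
out = np.zeros((H - F + 1, W - F + 1, K))    # 28x28x6 "new image"
for k in range(K):
    for i in range(H - F + 1):
        for j in range(W - F + 1):
            patch = image[i:i + F, j:j + F, :]
            out[i, j, k] = np.sum(patch * filters[k]) + biases[k]
```

Each of the 6 filters produces one 28x28 activation map; stacking them along the depth axis gives the 28x28x6 output.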

  33. Preview: a ConvNet is a sequence of Convolutional Layers, interspersed with activation functions.
32 → 28 via CONV, ReLU (e.g. 6 5x5x3 filters)
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  34. Preview: a ConvNet is a sequence of Convolutional Layers, interspersed with activation functions.
32 → 28 → 24 → … via CONV, ReLU (e.g. 6 5x5x3 filters), then CONV, ReLU (e.g. 10 5x5x6 filters), then CONV, ReLU, …
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  35. Preview [from recent Yann LeCun slides]
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  36. Preview [from recent Yann LeCun slides]
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  37. One filter => one activation map. Example 5x5 filters (32 total).
We call the layer convolutional because it is related to the convolution of two signals: elementwise multiplication and sum of a filter and the signal (image).
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  38. Preview
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  39. A closer look at spatial dimensions:
32x32x3 image, 5x5x3 filter. Convolve (slide) over all spatial locations → 28x28x1 activation map.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  40. A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson




  44. A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter => 5x5 output.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  45. A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter applied with stride 2.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson


  47. A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter applied with stride 2 => 3x3 output!
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  48. A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter applied with stride 3?
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  49. A closer look at spatial dimensions:
7x7 input (spatially), 3x3 filter applied with stride 3? Doesn’t fit! We cannot apply a 3x3 filter to a 7x7 input with stride 3.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  50. Output size: (N - F) / stride + 1
e.g. N = 7, F = 3:
  stride 1 => (7 - 3)/1 + 1 = 5
  stride 2 => (7 - 3)/2 + 1 = 3
  stride 3 => (7 - 3)/3 + 1 = 2.33 :\
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
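The formula can be checked directly with a small helper (the function name is my own, not from the slides):

```python
def conv_output_size(n, f, stride):
    """Spatial output size of a conv layer: (N - F) / stride + 1."""
    if (n - f) % stride != 0:
        # e.g. N=7, F=3, stride=3 gives (7-3)/3 + 1 = 2.33 -- doesn't fit
        raise ValueError("filter doesn't fit: (N - F) is not divisible by stride")
    return (n - f) // stride + 1

assert conv_output_size(7, 3, 1) == 5   # stride 1 -> 5x5 output
assert conv_output_size(7, 3, 2) == 3   # stride 2 -> 3x3 output
```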

  51. In practice: common to zero pad the border.
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output?
(recall: (N - F) / stride + 1)
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  52. In practice: common to zero pad the border.
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => 7x7 output!
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  53. In practice: common to zero pad the border.
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => 7x7 output!
In general, it is common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2. (This will preserve size spatially.)
e.g. F = 3 => zero pad with 1; F = 5 => zero pad with 2; F = 7 => zero pad with 3
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  54. Remember back to…
e.g. a 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially (32 -> 28 -> 24 ...). Shrinking too fast is not good; it doesn’t work well.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  55. Recap: Convolution Layer (no padding, no strides)
Convolving a 3x3 kernel over a 4x4 input using unit strides (i.e., i = 4, k = 3, s = 1 and p = 0).
Image credit: Vincent Dumoulin and Francesco Visin

  56. Computing the output values of a 2D discrete convolution
i1 = i2 = 5, k1 = k2 = 3, s1 = s2 = 2, and p1 = p2 = 1
Image credit: Vincent Dumoulin and Francesco Visin
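That computation can be sketched as a strided, zero-padded sliding window over random data (note that conv layers actually compute cross-correlation, i.e. the kernel is not flipped):

```python
import numpy as np

# Strided, zero-padded sliding window for i=5, k=3, s=2, p=1.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))           # 5x5 input
k = rng.standard_normal((3, 3))           # 3x3 kernel

xp = np.pad(x, 1)                         # zero-pad with p = 1 -> 7x7
s = 2
n_out = (5 + 2 * 1 - 3) // s + 1          # (i + 2p - k)/s + 1 = 3
out = np.zeros((n_out, n_out))
for i in range(n_out):
    for j in range(n_out):
        window = xp[i * s:i * s + 3, j * s:j * s + 3]
        out[i, j] = np.sum(window * k)
```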

  57. Examples time:
Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2. Output volume size: ?
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  58. Examples time:
Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2.
Output volume size: (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  59. Examples time:
Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2. Number of parameters in this layer?
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  60. Examples time:
Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2.
Number of parameters in this layer? Each filter has 5*5*3 + 1 = 76 params (+1 for bias) => 76*10 = 760.
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
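The arithmetic from both examples, spelled out:

```python
# Parameter count for the example layer: 10 filters of size 5x5x3,
# each with one bias term.
F, depth, K = 5, 3, 10
params_per_filter = F * F * depth + 1        # 5*5*3 + 1 = 76
total_params = params_per_filter * K         # 76 * 10 = 760

# Spatial output size with stride 1 and pad 2: (32 + 2*2 - 5)/1 + 1
out_size = (32 + 2 * 2 - F) // 1 + 1         # = 32, so the output is 32x32x10
```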

  61. (figure)
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  62. Common settings:
K = (powers of 2, e.g. 32, 64, 128, 512)
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  63. (btw, 1x1 convolution layers make perfect sense)
1x1 CONV with 32 filters on a 56x56x64 input gives a 56x56x32 output (each filter has size 1x1x64, and performs a 64-dimensional dot product).
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
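Since a 1x1 filter spans only the depth axis, the 56x56x64 → 56x56x32 example reduces to a matrix multiply at every pixel. A NumPy sketch with random data:

```python
import numpy as np

# 1x1 convolution as a per-pixel dot product across depth:
# 56x56x64 input, 32 filters of size 1x1x64 -> 56x56x32 output.
rng = np.random.default_rng(0)
x = rng.standard_normal((56, 56, 64))
w = rng.standard_normal((64, 32))         # 32 filters, each 1x1x64

out = x @ w                               # 64-dim dot product at every pixel
```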

  64. Example: CONV layer in Torch
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  65. Example: CONV layer in Caffe
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  66. Example: CONV layer in Lasagne
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  67. The brain/neuron view of CONV Layer
32x32x3 image, 5x5x3 filter. 1 number: the result of taking a dot product between the filter and this part of the image (i.e. a 5*5*3 = 75-dimensional dot product).
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson

  68. The brain/neuron view of CONV Layer
32x32x3 image, 5x5x3 filter. It’s just a neuron with local connectivity… 1 number: the result of taking a dot product between the filter and this part of the image (i.e. a 5*5*3 = 75-dimensional dot product).
slide by Fei-Fei Li, Andrej Karpathy & Justin Johnson
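The two views agree numerically: one conv output value is exactly one "neuron" computing w·x + b over its local receptive field. A small NumPy check (random data; the patch location is chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
w = rng.standard_normal((5, 5, 3))
b = 0.2

patch = image[10:15, 10:15, :]                      # the neuron's receptive field
as_conv = np.sum(patch * w) + b                     # conv-layer view
as_neuron = patch.reshape(-1) @ w.reshape(-1) + b   # 75-dim dot product view
```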
