Deep Neural Networks: Convolutional Networks II
Bhiksha Raj

Story so far: Pattern classification tasks such as "does this picture contain a cat" or "does this recording include HELLO" are best performed by scanning for the target pattern.


  1. Supervising the neocognitron Output class label(s) • Add an extra decision layer after the final C layer – Produces a class-label output • We now have a fully feed forward MLP with shared parameters – All the S-cells within an S-plane have the same weights • Simple backpropagation can now train the S-cell weights in every plane of every layer – C-cells are not updated

  2. Scanning vs. multiple filters • Note : The original Neocognitron actually uses many identical copies of a neuron in each S and C plane

  3. Supervising the neocognitron Output class label(s) • The math – Assuming square receptive fields, rather than elliptical ones – The receptive field of the S cells in the lth layer is L_l × L_l – The receptive field of the C cells in the lth layer is M_l × M_l

  4. Supervising the neocognitron Output class label(s) • The math:

U_S,l,n(j, k) = θ( Σ_p Σ_{m=1..L_l} Σ_{m′=1..L_l} w_S,l,n(p, m, m′) U_C,l−1,p(j+m, k+m′) )

U_C,l,n(j, k) = max_{m∈(j, j+M_l), m′∈(k, k+M_l)} U_S,l,n(m, m′)

• This is, however, identical to "scanning" (convolving) with a single neuron/filter (which is what LeNet actually did)

  5. Convolutional Neural Networks

  6. The general architecture of a convolutional neural network Output Multi-layer Perceptron • A convolutional neural network comprises "convolutional" and "downsampling" layers – The two may occur in any sequence, but typically they alternate • Followed by an MLP with one or more layers


  8. The general architecture of a convolutional neural network Output Multi-layer Perceptron • Convolutional layers and the MLP are learnable – Their parameters must be learned from training data for the target classification task • Down-sampling layers are fixed and generally not learnable

  9. A convolutional layer • A convolutional layer comprises a series of "maps" – Corresponding to the "S-planes" in the Neocognitron – Variously called feature maps or activation maps

  10. A convolutional layer • Each activation map has two components – A linear map, obtained by convolution over maps in the previous layer • Each linear map has, associated with it, a learnable filter – An activation that operates on the output of the convolution

  11. A convolutional layer • All the maps in the previous layer contribute to each convolution

  12. A convolutional layer • All the maps in the previous layer contribute to each convolution – Consider the contribution of a single map

  13. What is a convolution • Example: 5x5 image with binary pixels, 3x3 filter, and a bias:

    Image:        Filter:
    1 1 1 0 0     1 0 1
    0 1 1 1 0     0 1 0
    0 0 1 1 1     1 0 1
    0 0 1 1 0
    0 1 1 0 0

z(j, k) = Σ_{l=1..3} Σ_{m=1..3} f(l, m) I(j+l, k+m) + b

• Scanning an image with a "filter" – Note: a filter is really just a perceptron, with weights and a bias

  14. What is a convolution • (Figure: the 3x3 filter, with bias, scanned over the input map) • Scanning an image with a "filter" – At each location, the filter and the underlying map values are multiplied component-wise, and the products are added, along with the bias

  15. The "Stride" between adjacent scanned locations need not be 1 • (Figure: the 3x3 filter overlaid on the top-left of the 5x5 image; first output value 4) • Scanning an image with a "filter" – The filter may proceed by more than 1 pixel at a time – E.g. with a "stride" of two pixels per shift

  16. The "Stride" between adjacent scanned locations need not be 1 • (Figure: the filter shifted right by two pixels; outputs so far: 4, 4) • Scanning an image with a "filter" – The filter may proceed by more than 1 pixel at a time – E.g. with a "stride" of two pixels per shift

  17. The "Stride" between adjacent scanned locations need not be 1 • (Figure: the filter moved down by two pixels; outputs so far: 4, 4, 2) • Scanning an image with a "filter" – The filter may proceed by more than 1 pixel at a time – E.g. with a "stride" of two pixels per shift

  18. The "Stride" between adjacent scanned locations need not be 1 • (Figure: the filter at the final stride-2 position; output map: 4 4 / 2 4) • Scanning an image with a "filter" – The filter may proceed by more than 1 pixel at a time – E.g. with a "stride" of two pixels per shift
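The scanning in slides 13-18 can be sketched in a few lines of NumPy. The `scan` helper below is illustrative, not the lecture's code; it uses the slide's binary image and cross-shaped filter, assuming a bias of 0. With stride 1 the output is 3x3; with a stride of two pixels per shift it shrinks to 2x2:

```python
import numpy as np

def scan(image, filt, bias=0, stride=1):
    """Slide a filter over an image: at each location, multiply the filter
    and the underlying patch component-wise, sum, and add the bias."""
    N = image.shape[0]          # image is N x N
    M = filt.shape[0]           # filter is M x M
    out_side = (N - M) // stride + 1
    out = np.zeros((out_side, out_side))
    for j in range(out_side):
        for k in range(out_side):
            patch = image[j*stride:j*stride+M, k*stride:k*stride+M]
            out[j, k] = np.sum(patch * filt) + bias
    return out

# The slide's 5x5 binary image and 3x3 filter
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
filt = np.array([[1, 0, 1],
                 [0, 1, 0],
                 [1, 0, 1]])
```

`scan(image, filt)` returns the full 3x3 map, and `scan(image, filt, stride=2)` visits only every other location, so the output side follows the (N − M)/stride + 1 pattern discussed later in the deck.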

  19. Extending to multiple input maps • We actually compute any individual convolutional map from all the maps in the previous layer

  20. Extending to multiple input maps • We actually compute any individual convolutional map from all the maps in the previous layer • The actual processing is better understood if we modify our visualization of all the maps in a layer from a vertical arrangement to..

  21. Extending to multiple input maps • ..A stacked arrangement of planes (a filter applied to the kth layer of maps: convolutive component plus bias) • We can view the joint processing of the various maps as processing the stack using a three-dimensional filter

  22. Extending to multiple input maps • The computation of the convolutive map at any location sums the convolutive outputs at all planes:

z(j, k) = Σ_p Σ_{l=1..M} Σ_{m=1..M} w(p, l, m) Y_p(j+l, k+m) + b

  23. Extending to multiple input maps (one map) • The computation of the convolutive map at any location sums the convolutive outputs at all planes:

z(j, k) = Σ_p Σ_{l=1..M} Σ_{m=1..M} w(p, l, m) Y_p(j+l, k+m) + b

  24. Extending to multiple input maps (all maps) • The computation of the convolutive map at any location sums the convolutive outputs at all planes:

z(j, k) = Σ_p Σ_{l=1..M} Σ_{m=1..M} w(p, l, m) Y_p(j+l, k+m) + b

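The multi-map sum above can be illustrated with a small, hypothetical helper: a 3-D filter holds one M x M slice per input plane, each slice is multiplied component-wise against the patch at the same location of its plane, and everything is summed together with the bias:

```python
import numpy as np

def conv_at(maps, filt, bias, j, k):
    """One output value of a multi-map convolution.

    maps: (Q, H, W) stack of input planes; filt: (Q, M, M) 3-D filter.
    Sums the component-wise products over every plane, then adds the bias."""
    Q, M, _ = filt.shape
    total = 0.0
    for q in range(Q):                       # every input map contributes
        total += np.sum(filt[q] * maps[q, j:j+M, k:k+M])
    return total + bias
```

For example, with two 4x4 input planes of all ones and a 3x3 all-ones filter per plane, each plane contributes 9, so the output at any valid location is 2*9 plus the bias.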

  30. The size of the convolution • (Figure: 3x3 filter with bias scanned over the 5x5 input map) • Image size: 5x5 • Filter: 3x3 • "Stride": 1 • Output size = ?


  32. The size of the convolution • (Figure: stride-2 scan; output map: 4 4 / 2 4) • Image size: 5x5 • Filter: 3x3 • Stride: 2 • Output size = ?


  34. The size of the convolution • (Figure: an M × M filter scanned over an N × N image; output size ?) • Image size: N × N • Filter: M × M • Stride: 1 • Output size = ?

  35. The size of the convolution • (Figure: an M × M filter scanned over an N × N image; output size ?) • Image size: N × N • Filter: M × M • Stride: S • Output size = ?

  36. The size of the convolution • Image size: N × N • Filter: M × M • Stride: S • Output size (each side) = (N − M)/S + 1 – Assuming you're not allowed to go beyond the edge of the input

  37. Convolution Size • Simple convolution size pattern: – Image size: N × N – Filter: M × M – Stride: S – Output size (each side) = (N − M)/S + 1 • Assuming you're not allowed to go beyond the edge of the input • Results in a reduction in the output size – Even if S = 1 – Not considered acceptable • If there's no active downsampling, through max pooling and/or S > 1, then the output map should ideally be the same size as the input
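The size pattern above can be captured in a one-line helper (the name is illustrative):

```python
def conv_output_side(N, M, S):
    """Each side of the output map when an N x N input is scanned by an
    M x M filter with stride S, without crossing the input's edge."""
    return (N - M) // S + 1
```

For the slide's running example: a 5x5 image with a 3x3 filter gives a 3x3 output at stride 1 and a 2x2 output at stride 2, showing the shrinkage that zero-padding is introduced to avoid.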

  38. Solution • Zero-pad the input – Pad the input image/map all around • Add P_L columns of zeros on the left and P_R columns of zeros on the right • Similarly, add P_L rows of zeros on the top and P_R rows of zeros at the bottom – P_L and P_R chosen such that: • P_L = P_R OR |P_L − P_R| = 1 • P_L + P_R = M − 1 (for an M × M filter) – For stride 1, the result of the convolution is the same size as the original image

  39. Solution • Zero-pad the input – Pad the input image/map all around – Pad as symmetrically as possible, such that.. – For stride 1, the result of the convolution is the same size as the original image
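A minimal sketch of the padding rule using NumPy's `pad` (the helper name is made up): for an M x M filter it chooses P_L and P_R with P_L + P_R = M − 1, as symmetrically as possible, so that a stride-1 scan preserves the input size:

```python
import numpy as np

def pad_for_same(image, M):
    """Zero-pad so that a stride-1 scan with an M x M filter returns a map
    the same size as the original: P_L + P_R = M - 1, |P_L - P_R| <= 1."""
    p_l = (M - 1) // 2
    p_r = (M - 1) - p_l
    # Same padding applied horizontally (columns) and vertically (rows)
    return np.pad(image, ((p_l, p_r), (p_l, p_r)))
```

A 5x5 map padded for a 3x3 filter becomes 7x7, and (7 − 3)/1 + 1 = 5, the original side.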

  40. Why convolution? • Convolutional neural networks are, in fact, equivalent to scanning with an MLP – Just run the entire MLP on each block separately, and combine results • As opposed to scanning (convolving) the picture with individual neurons/filters – Even computationally, the number of operations in both computations is identical • The neocognitron in fact views it equivalently to a scan • So why convolutions?

  41. Cost of Correlation • Correlation: y(j, k) = Σ_m Σ_n x(j+m, k+n) w(m, n) • Cost of scanning an M × M image with an N × N filter: O(M²N²) – N² multiplications at each of M² positions • Not counting boundary effects – Expensive, for large filters

  42. Correlation in Transform Domain • Correlation using DFTs: y = IDFT2( DFT2(x) ∘ conj(DFT2(w)) ) • Cost of doing this using the Fast Fourier Transform to compute the DFTs: O(M² log M) – Significant savings for large filters – Or if there are many filters
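The equivalence between direct correlation and the DFT route can be checked numerically. This sketch (function names are illustrative) zero-pads the filter to the image size, multiplies spectra with a conjugate (correlation rather than convolution), and keeps only the part of the inverse transform free of circular wrap-around:

```python
import numpy as np

def correlate_direct(x, w):
    """Valid cross-correlation of an M x M image x with an N x N filter w."""
    M, N = x.shape[0], w.shape[0]
    side = M - N + 1
    out = np.zeros((side, side))
    for j in range(side):
        for k in range(side):
            out[j, k] = np.sum(x[j:j+N, k:k+N] * w)
    return out

def correlate_fft(x, w):
    """Same result via DFTs: y = IDFT2(DFT2(x) . conj(DFT2(w))).
    The filter is zero-padded to the image size; entries beyond index
    M - N are circularly wrapped, so only the leading block is kept."""
    M, N = x.shape[0], w.shape[0]
    X = np.fft.fft2(x)
    W = np.fft.fft2(w, s=x.shape)            # zero-pad filter to M x M
    full = np.real(np.fft.ifft2(X * np.conj(W)))
    return full[:M - N + 1, :M - N + 1]
```

For fixed small inputs both routes agree to floating-point precision; the payoff of the FFT route grows with the filter size.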

  43. A convolutional layer • The convolution operation results in a convolution map • An activation is finally applied to every entry in the map

  44. The other component: Downsampling/Pooling • Convolution (and activation) layers are followed intermittently by "downsampling" (or "pooling") layers – Often, they alternate with convolution, though this is not necessary

  45. Recall: Max pooling • Max pooling selects the largest from a pool of elements • Pooling is performed by "scanning" the input


  51. "Strides" • The "max" operations may "stride" by more than one pixel


  56. Max Pooling • Single depth slice, max pooled with 2x2 filters and stride 2:

    1 1 2 4
    5 6 7 8   →   6 8
    3 2 1 0       3 4
    1 2 3 4

• An N × N picture compressed by a P × P maxpooling filter with stride D results in an output map of side (N − P)/D + 1
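A minimal max-pooling sketch over the slide's 4x4 depth slice (2x2 windows, stride 2); the helper name is illustrative:

```python
import numpy as np

def max_pool(x, P=2, D=2):
    """Scan x with a P x P window at stride D; each window contributes
    the largest of its P*P entries."""
    side = (x.shape[0] - P) // D + 1
    out = np.zeros((side, side))
    for i in range(side):
        for j in range(side):
            out[i, j] = np.max(x[i*D:i*D+P, j*D:j*D+P])
    return out

# The slide's single depth slice
slice_ = np.array([[1, 1, 2, 4],
                   [5, 6, 7, 8],
                   [3, 2, 1, 0],
                   [1, 2, 3, 4]])
```

`max_pool(slice_)` reproduces the 2x2 map on the slide, and the output side follows the (N − P)/D + 1 pattern.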

  57. Alternative to Max pooling: Mean Pooling • Single depth slice, mean pooled with 2x2 filters and stride 2:

    1 1 2 4
    5 6 7 8   →   3.25 5.25
    3 2 1 0       2    2
    1 2 3 4

• An N × N picture compressed by a P × P mean pooling filter with stride D results in an output map of side (N − P)/D + 1
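Mean pooling only changes the reduction applied to each block, from a max to an average; a sketch on the same 4x4 slice:

```python
import numpy as np

def mean_pool(x, P=2, D=2):
    """Scan x with a P x P window at stride D; each window contributes
    the mean of its P*P entries."""
    side = (x.shape[0] - P) // D + 1
    out = np.zeros((side, side))
    for i in range(side):
        for j in range(side):
            out[i, j] = np.mean(x[i*D:i*D+P, j*D:j*D+P])
    return out

slice_ = np.array([[1, 1, 2, 4],
                   [5, 6, 7, 8],
                   [3, 2, 1, 0],
                   [1, 2, 3, 4]])
```

Unlike max pooling, there is no single "winning" position to remember during training: the gradient spreads evenly over the P*P inputs of each block.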


  59. Other options ("network in network") • The pooling may even be a learned filter • The same network is applied on each 2x2 block, striding by 2 in this example • (Again, a shared-parameter network)

  60. Putting everything together • Typical image classification task

  61. Convolutional Neural Networks • Input: 1 or 3 images – Black-and-white or color – We will assume color, to be generic

  62. Convolutional Neural Networks • Input: 3 pictures


  64. Preprocessing • Typically works with square images – Filters are also typically square • Large images are a problem – Too much detail – Will need big networks • Typically scaled to small sizes, e.g. 32x32 or 128x128

  65. Convolutional Neural Networks • Input: 3 pictures (an I × I image with 3 color planes)

  66. Convolutional Neural Networks • K_1 total filters, each of size L × L × 3, applied to the I × I image • Input is convolved with a set of K_1 filters – Typically K_1 is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Filters are typically 5x5, 3x3, or even 1x1


  68. Convolutional Neural Networks • K_1 total filters, each of size L × L × 3 – Small enough to capture fine features (particularly important for scaled-down images) – What on earth is a 1x1 filter? • Input is convolved with a set of K_1 filters – Typically K_1 is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Filters are typically 5x5, 3x3, or even 1x1

  69. The 1x1 filter • A 1x1 filter is simply a perceptron that operates over the depth of the map, but has no spatial extent – Takes one pixel from each of the maps (at a given location) as input
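Since a 1x1 filter has no spatial extent, it reduces to a perceptron over the depth axis: one weight per input map plus a bias, applied independently at every pixel. A sketch (the function name is illustrative):

```python
import numpy as np

def one_by_one(maps, weights, bias):
    """1x1 convolution: maps is a (Q, H, W) stack of input maps,
    weights is (Q,). At every spatial location, take one pixel from
    each map, form the weighted sum over depth, and add the bias."""
    # tensordot contracts the depth axis, leaving one H x W output map
    return np.tensordot(weights, maps, axes=(0, 0)) + bias
```

The spatial size is unchanged; only the depth is mixed, which is why 1x1 filters are often used to change the number of maps cheaply.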

  70. Convolutional Neural Networks • K_1 total filters, each of size L × L × 3, applied to the I × I image • Input is convolved with a set of K_1 filters – Typically K_1 is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Better notation: filters are typically 5x5(x3), 3x3(x3), or even 1x1(x3)

  71. Convolutional Neural Networks • K_1 total filters, each of size L × L × 3 • Parameters to choose: K_1, L and S – 1. Number of filters K_1 – 2. Size of filters: L × L × 3, plus a bias – 3. Stride of convolution S • Total number of parameters: K_1(3L² + 1) • Input is convolved with a set of K_1 filters – Typically K_1 is a power of 2, e.g. 2, 4, 8, 16, 32, .. – Better notation: filters are typically 5x5(x3), 3x3(x3), or even 1x1(x3) – Typical stride: 1 or 2

  72. Convolutional Neural Networks • K_1 filters of size L × L × 3, I × I image • The input may be zero-padded according to the size of the chosen filters

  73. Convolutional Neural Networks • K_1 filters of size L × L × 3 applied to the I × I image produce maps Y_1^(1) .. Y_{K_1}^(1) • The layer includes a convolution operation followed by an activation (typically RELU):

z_n^(1)(i, j) = Σ_{c∈{R,G,B}} Σ_{k=1..L} Σ_{l=1..L} w_n^(1)(c, k, l) I_c(i+k, j+l) + b_n^(1)

Y_n^(1)(i, j) = f( z_n^(1)(i, j) )

• First convolutional layer: several convolutional filters – Filters are "3-D" (the third dimension is color) – Convolution followed typically by a RELU activation • Each filter creates a single 2-D output map
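The first-layer computation, a biased 3-D convolution over the color planes followed by a RELU, can be written out directly. This is a naive, loop-based sketch with illustrative names, not an efficient implementation:

```python
import numpy as np

def conv_layer(image_rgb, filters, biases, stride=1):
    """First convolutional layer.

    image_rgb: (3, I, I) color image; filters: (K1, 3, L, L) 3-D filters;
    biases: (K1,). Each filter is scanned over the image, a bias is added,
    and a RELU is applied; each filter yields one 2-D output map."""
    C, I, _ = image_rgb.shape
    K1, _, L, _ = filters.shape
    side = (I - L) // stride + 1
    out = np.zeros((K1, side, side))
    for n in range(K1):
        for i in range(side):
            for j in range(side):
                patch = image_rgb[:, i*stride:i*stride+L, j*stride:j*stride+L]
                z = np.sum(filters[n] * patch) + biases[n]   # 3-D dot + bias
                out[n, i, j] = max(z, 0.0)                   # RELU activation
    return out
```

With an all-ones 5x5x3 input and all-ones 3x3x3 filters, each output entry before the bias is 27, and a strongly negative bias shows the RELU clipping a whole map to zero.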

  74. Learnable parameters in the first convolutional layer • The first convolutional layer comprises K_1 filters, each of size L × L × 3 – Spatial span: L × L – Depth: 3 (3 colors) • This represents a total of K_1(3L² + 1) parameters – "+1" because each filter also has a bias • All of these parameters must be learned
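The parameter count K_1(3L² + 1) as a one-line helper (the name is illustrative):

```python
def conv1_params(K1, L):
    """Learnable parameters in the first layer: K1 filters, each with
    L*L weights per color plane (3 planes) plus one bias."""
    return K1 * (3 * L * L + 1)
```

For instance, 16 filters of spatial span 5x5 over a color input carry 16 * (75 + 1) = 1216 learnable parameters.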

  75. Convolutional Neural Networks • The layer pools P × P blocks of Y into a single value, with a stride D between adjacent blocks, producing maps U_1^(1) .. U_{K_1}^(1) of size (I/D) × (I/D):

U_n^(1)(i, j) = max_{k∈{(i−1)D+1, .., iD}, l∈{(j−1)D+1, .., jD}} Y_n^(1)(k, l)

• First downsampling layer: from each P × P block of each map, pool down to a single value – For max pooling, during training keep track of which position had the highest value

  76. Convolutional Neural Networks • Parameters to choose: size of pooling block P, pooling stride D • Choices: max pooling or mean pooling? Or learned pooling?

U_n^(1)(i, j) = max_{k∈{(i−1)D+1, .., iD}, l∈{(j−1)D+1, .., jD}} Y_n^(1)(k, l)

• First downsampling layer: from each P × P block of each map, pool down to a single value – For max pooling, during training keep track of which position had the highest value
