Neural Networks, Hugo Larochelle (@hugo_larochelle), Google Brain - PowerPoint PPT Presentation

  1. Neural Networks, Hugo Larochelle (@hugo_larochelle), Google Brain

  2. NEURAL NETWORKS
     • What we'll cover:
       ‣ computer vision architectures
         - convolutional networks
         - data augmentation
         - residual networks
       ‣ natural language processing architectures
         - word embeddings
         - recurrent neural networks
         - long short-term memory networks (LSTMs)
     [Figure: a neural network mapping an input x = (x_1, ..., x_j, ..., x_d) to an output f(x)]

  3. Neural Networks: Computer vision

  4. COMPUTER VISION
     Topics: computer vision, object recognition
     • Computer vision is the design of computers that can process visual data and accomplish some given task
       ‣ we will focus on object recognition: given some input image, identify which object it contains
     [Figure: an example image from the Caltech 101 dataset, labelled "sun flower", with sides marked 112 pixels and 150 pixels]

  5. COMPUTER VISION
     Topics: computer vision
     • We can design neural networks that are specifically adapted for such problems
       ‣ must deal with very high-dimensional inputs
         - 150 × 150 pixels = 22,500 inputs, or 3 × 22,500 = 67,500 inputs if the pixels are RGB
       ‣ can exploit the 2D topology of pixels (or 3D for video data)
       ‣ can build in invariance to certain variations we can expect
         - translations, illumination, etc.
     • Convolutional networks leverage these ideas:
       ‣ local connectivity
       ‣ parameter sharing
       ‣ pooling / subsampling hidden units

  6. COMPUTER VISION
     Topics: local connectivity
     • First idea: use local connectivity of hidden units
       ‣ each hidden unit is connected only to a subregion (patch) of the input image
       ‣ it is connected to all channels
         - 1 if greyscale image
         - 3 (R, G, B) for color image
     • Solves the following problems:
       ‣ a fully connected hidden layer would have an unmanageable number of parameters
       ‣ computing the linear activations of the hidden units would be very expensive
     [Figure: a hidden unit connected to an r × r patch of the input; r = receptive field]
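To make the parameter argument concrete, here is a rough weight count for a 150 × 150 RGB input. The number of hidden units (1,000) and the receptive field size (r = 10) are hypothetical values chosen only for illustration, and biases are ignored.

```python
def fully_connected_weights(height: int, width: int, channels: int, hidden_units: int) -> int:
    """Every hidden unit is connected to every input value."""
    return height * width * channels * hidden_units


def locally_connected_weights(r: int, channels: int, hidden_units: int) -> int:
    """Every hidden unit sees only an r x r patch, but across all channels."""
    return r * r * channels * hidden_units


# Hypothetical sizes: a 150 x 150 RGB image, 1,000 hidden units, r = 10.
dense = fully_connected_weights(150, 150, 3, 1000)
local = locally_connected_weights(10, 3, 1000)
print(dense)  # 67500000
print(local)  # 300000
```

Per hidden unit, the locally connected count depends only on r and the number of channels, not on the image size, which is also why each linear activation becomes much cheaper to compute.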

  7. COMPUTER VISION
     Topics: local connectivity
     • Units are connected to all channels:
       ‣ 1 channel if grayscale image, 3 channels (R, G, B) if color image
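A minimal sketch of one locally connected hidden unit, assuming a channels-first layout (channels, height, width) and one r × r weight slice per channel; the function name and the random image and weights are hypothetical. The unit's linear activation sums over its patch in every channel at once.

```python
import numpy as np


def local_unit_preactivation(image: np.ndarray, weights: np.ndarray, top: int, left: int) -> float:
    """Linear activation of one locally connected hidden unit.

    image:   array of shape (channels, height, width)
    weights: array of shape (channels, r, r), one r x r slice per channel
    (top, left): upper-left corner of the unit's receptive field
    """
    channels, r, _ = weights.shape
    if image.shape[0] != channels:
        raise ValueError("the unit must be connected to all input channels")
    patch = image[:, top:top + r, left:left + r]  # shape (channels, r, r)
    return float(np.sum(patch * weights))         # sum over patch and channels


rng = np.random.default_rng(0)
rgb_image = rng.random((3, 150, 150))   # hypothetical colour image
unit_weights = rng.random((3, 10, 10))  # hypothetical 10 x 10 receptive field
value = local_unit_preactivation(rgb_image, unit_weights, top=20, left=35)
```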

  8. COMPUTER VISION
     Topics: parameter sharing
     • Second idea: share a matrix of parameters across certain units
       ‣ units organized into the same "feature map" share parameters
       ‣ hidden units within a feature map cover different positions in the image
     [Figure: feature maps 1, 2 and 3; same color = same matrix of connections]

  12. COMPUTER VISION
      Topics: parameter sharing
      • Second idea: share a matrix of parameters across certain units
        ‣ units organized into the same "feature map" share parameters
        ‣ hidden units within a feature map cover different positions in the image
      [Figure: feature maps 1, 2 and 3; same color = same matrix of connections; W_ij is the matrix connecting the i-th input channel with the j-th feature map]

  14. COMPUTER VISION
      Topics: parameter sharing
      • Solves the following problems:
        ‣ reduces the number of parameters even further
        ‣ extracts the same features at every position (features are "equivariant")
      [Figure: feature maps 1, 2 and 3; same color = same matrix of connections; W_ij is the matrix connecting the i-th input channel with the j-th feature map]
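A small sketch of both claims, with a hypothetical 8 × 8 input and 3 × 3 weight matrix. One shared matrix `w` (9 parameters) is reused at all 36 output positions, where a locally connected layer would need 36 separate matrices. Shifting the input shifts the feature map by the same amount, which is the equivariance property. The function name and the shift of (1, 2) are illustrative choices.

```python
import numpy as np


def feature_map(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Slide one shared weight matrix w over every position of x (no padding)."""
    r = w.shape[0]
    out_h, out_w = x.shape[0] - r + 1, x.shape[1] - r + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + r, j:j + r] * w)  # same w at every (i, j)
    return out


rng = np.random.default_rng(0)
x = rng.random((8, 8))
w = rng.random((3, 3))  # the only parameters of this feature map

# Shift the image down by 1 row and right by 2 columns (zeros fill the gap).
x_shifted = np.zeros_like(x)
x_shifted[1:, 2:] = x[:-1, :-2]

fm = feature_map(x, w)
fm_shifted = feature_map(x_shifted, w)

# Equivariance: the feature map of the shifted image is the shifted feature map.
print(np.allclose(fm_shifted[1:, 2:], fm[:-1, :-2]))  # True
```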

  15. COMPUTER VISION
      Topics: parameter sharing (Jarrett et al. 2009)
      • Each feature map forms a 2D grid of features
        ‣ can be computed with a discrete convolution ($*$) of a kernel matrix $k_{ij}$, which is the hidden weights matrix $W_{ij}$ with its rows and columns flipped:
          $y_j = g_j \tanh\left(\sum_i k_{ij} * x_i\right)$
        ‣ $x_i$ is the i-th channel of the input
        ‣ $k_{ij}$ is the convolution kernel
        ‣ $g_j$ is a learned scaling factor
        ‣ $y_j$ is the hidden layer (a bias could also have been added)
      [Figure: input channels x_i connected to the feature maps]
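The layer equation $y_j = g_j \tanh(\sum_i k_{ij} * x_i)$ can be sketched directly with NumPy loops (written for clarity, not speed). The array layouts, in particular storing kernels as `k[i, j]` with shape (n_in, n_out, r, r), and the random sizes below are assumptions for illustration; no bias is used, matching the slide.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Discrete 2D convolution without padding: correlate x with the flipped kernel."""
    k_flipped = k[::-1, ::-1]
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k_flipped)
    return out


def conv_layer(x: np.ndarray, k: np.ndarray, g: np.ndarray) -> np.ndarray:
    """y_j = g_j * tanh(sum_i k_ij * x_i), with no bias.

    x: input channels, shape (n_in, H, W)
    k: kernels, shape (n_in, n_out, r, r); k[i, j] connects channel i to map j
    g: per-feature-map scaling factors, shape (n_out,)
    """
    n_in, n_out = k.shape[:2]
    maps = []
    for j in range(n_out):
        pre = sum(conv2d_valid(x[i], k[i, j]) for i in range(n_in))  # sum over channels
        maps.append(g[j] * np.tanh(pre))
    return np.stack(maps)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 32, 32))   # hypothetical 3-channel input
k = rng.standard_normal((3, 4, 5, 5))  # hypothetical: 4 feature maps, 5 x 5 kernels
g = np.ones(4)                         # hypothetical scaling factors
y = conv_layer(x, k, g)
print(y.shape)  # (4, 28, 28)
```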

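  16. COMPUTER VISION
      Topics: discrete convolution
      • The convolution of an image x with a kernel k is computed as follows:
        $(x * k)_{ij} = \sum_{pq} x_{i+p,\,j+q}\, k_{r-p,\,r-q}$
      • Example: $x * k$ with
        $x = \begin{bmatrix} 0 & 80 & 40 \\ 20 & 40 & 0 \\ 0 & 0 & 40 \end{bmatrix}, \quad k = \begin{bmatrix} 0 & 0.25 \\ 0.5 & 1 \end{bmatrix}$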

  17. COMPUTER VISION
      Topics: discrete convolution
      • Example (continued): let $\tilde{k}$ be k with its rows and columns flipped:
        $\tilde{k} = \begin{bmatrix} 1 & 0.5 \\ 0.25 & 0 \end{bmatrix}$

  18. COMPUTER VISION
      Topics: discrete convolution
      • Example, entry (1, 1): $1 \times 0 + 0.5 \times 80 + 0.25 \times 20 + 0 \times 40 = 45$

  19. COMPUTER VISION
      Topics: discrete convolution
      • Example, entry (1, 2): $1 \times 80 + 0.5 \times 40 + 0.25 \times 40 + 0 \times 0 = 110$

  20. COMPUTER VISION
      Topics: discrete convolution
      • Example, entry (2, 1): $1 \times 20 + 0.5 \times 40 + 0.25 \times 0 + 0 \times 0 = 40$

  21. COMPUTER VISION
      Topics: discrete convolution
      • Example, entry (2, 2): $1 \times 40 + 0.5 \times 0 + 0.25 \times 0 + 0 \times 40 = 40$, so
        $x * k = \begin{bmatrix} 45 & 110 \\ 40 & 40 \end{bmatrix}$
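The worked example can be checked by coding the slide formula $(x * k)_{ij} = \sum_{pq} x_{i+p,j+q}\, k_{r-p,r-q}$ term by term. The sketch assumes 0-based indices and a square kernel of size (r + 1) × (r + 1), so r = 1 here; under that reading the formula reproduces all four entries above.

```python
import numpy as np


def conv_formula(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """(x * k)_ij = sum_pq x[i+p, j+q] * k[r-p, r-q], using 0-based indices.

    Assumes a square kernel of size (r + 1) x (r + 1), so r = k.shape[0] - 1.
    """
    r = k.shape[0] - 1
    out_h, out_w = x.shape[0] - r, x.shape[1] - r
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            for p in range(r + 1):
                for q in range(r + 1):
                    out[i, j] += x[i + p, j + q] * k[r - p, r - q]
    return out


x = np.array([[0, 80, 40],
              [20, 40, 0],
              [0, 0, 40]], dtype=float)
k = np.array([[0, 0.25],
              [0.5, 1]])

k_tilde = k[::-1, ::-1]      # values [[1, 0.5], [0.25, 0]]
result = conv_formula(x, k)  # values [[45, 110], [40, 40]]
```

Indexing `k[r - p, r - q]` is the same as reading `k_tilde[p, q]`, which is why each entry is a plain weighted sum of a 2 × 2 window of x against $\tilde{k}$.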

  22. COMPUTER VISION
      Topics: discrete convolution
      • Pre-activations from channel $x_i$ into feature map $y_j$ can be computed by:
        ‣ getting the convolution kernel $k_{ij} = \tilde{W}_{ij}$ from the connection matrix $W_{ij}$
        ‣ applying the convolution $x_i * k_{ij}$
      • This is equivalent to computing the discrete correlation of $x_i$ with $W_{ij}$
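A quick numerical check of this equivalence, with a hypothetical 6 × 6 channel and 3 × 3 connection matrix. Convolution is written with the explicit flipped indexing (0-based, kernel indices running backwards), correlation as a plain sliding window; the two agree once the kernel is $\tilde{W}_{ij}$.

```python
import numpy as np


def convolve2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Convolution without padding: kernel indices run backwards."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for p in range(kh):
                for q in range(kw):
                    out[i, j] += x[i + p, j + q] * k[kh - 1 - p, kw - 1 - q]
    return out


def correlate2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Correlation without padding: slide w over x without flipping it."""
    kh, kw = w.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


rng = np.random.default_rng(1)
x_i = rng.random((6, 6))   # hypothetical input channel
W_ij = rng.random((3, 3))  # hypothetical connection matrix
k_ij = W_ij[::-1, ::-1]    # kernel = W_ij with rows and columns flipped

print(np.allclose(convolve2d_valid(x_i, k_ij), correlate2d_valid(x_i, W_ij)))  # True
```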

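  23. COMPUTER VISION
      Topics: discrete convolution
      • Simple illustration: $x_i * k_{ij}$ where $W_{ij} = \tilde{W}_{ij} = \begin{bmatrix} 0 & 0.5 \\ 0.5 & 0 \end{bmatrix}$
        $x_i = \begin{bmatrix} 0 & 0 & 255 & 0 & 0 \\ 0 & 0 & 255 & 0 & 0 \\ 0 & 0 & 255 & 0 & 0 \\ 0 & 255 & 0 & 0 & 0 \\ 255 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad x_i * k_{ij} = \begin{bmatrix} 0 & 128 & 128 & 0 \\ 0 & 128 & 128 & 0 \\ 0 & 255 & 0 & 0 \\ 255 & 0 & 0 & 0 \end{bmatrix}$
        (128 is $0.5 \times 255 = 127.5$, rounded)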
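The illustration can be reproduced with a plain correlation against $W_{ij}$, since flipping this particular $W_{ij}$ leaves it unchanged. The pixel values of $x_i$ are reconstructed from the extracted figure, so treat them as an assumption; they do yield the 0 / 128 / 255 output pattern shown on the slide.

```python
import numpy as np


def correlate2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Slide w over x without flipping (equals convolution with the flipped w)."""
    kh, kw = w.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


# Input channel: a stroke of 255-valued pixels (reconstructed, see above).
x_i = np.array([[0, 0, 255, 0, 0],
                [0, 0, 255, 0, 0],
                [0, 0, 255, 0, 0],
                [0, 255, 0, 0, 0],
                [255, 0, 0, 0, 0]], dtype=float)
W_ij = np.array([[0, 0.5],
                 [0.5, 0]])
k_ij = W_ij[::-1, ::-1]  # flipping leaves this W_ij unchanged

response = correlate2d_valid(x_i, W_ij)  # equivalent to x_i * k_ij
```

Each output averages the pixel to the right and the pixel below, so it reaches 255 only where both lie on the stroke, i.e. along its diagonal part.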

  24. COMPUTER VISION
      Topics: discrete convolution
      • With a non-linearity, we get a detector of a feature at any position in the image:
        $\mathrm{sigm}(0.02\, x_i * k_{ij} - 4) = \begin{bmatrix} 0.02 & 0.19 & 0.19 & 0.02 \\ 0.02 & 0.19 & 0.19 & 0.02 \\ 0.02 & 0.75 & 0.02 & 0.02 \\ 0.75 & 0.02 & 0.02 & 0.02 \end{bmatrix}$
      [Figure: the input x_i next to the detector output, with a plot of the logistic function labelled Logistic((n − 200) / 50)]
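Applying the slide's non-linearity to the convolution output gives the detector values directly. Note that $0.02a - 4 = (a - 200)/50$, which matches the plot label. The `response` values are the convolution output from the simple illustration (127.5 appears as 128 on the slide).

```python
import numpy as np


def sigm(a: np.ndarray) -> np.ndarray:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))


response = np.array([[0, 127.5, 127.5, 0],
                     [0, 127.5, 127.5, 0],
                     [0, 255, 0, 0],
                     [255, 0, 0, 0]])

# sigm(0.02 * a - 4) is the same as Logistic((a - 200) / 50).
detector = sigm(0.02 * response - 4)
# About 0.75 where both kernel taps hit the stroke, 0.19 where only one does, 0.02 elsewhere.
print(np.round(detector, 2))
```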

  25. COMPUTER VISION
      Topics: discrete convolution
      • Can use "zero padding" to allow going over the borders (∗)
      [Figure: the input x_i and the output x_i * k_ij computed without padding]

  26. COMPUTER VISION
      Topics: discrete convolution
      • Can use "zero padding" to allow going over the borders (∗)
      [Figure: the input x_i surrounded by a border of zeros, and the resulting larger output x_i * k_ij]
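A sketch of zero padding, assuming the input is padded with (kernel size − 1) zeros on every side so that every position where the kernel overlaps at least one real pixel gets an output (often called a "full" convolution). The slides do not state the padding amount; other choices, such as padding just enough to keep the output the same size as the input, are also common. The example reuses the 5 × 5 input and 2 × 2 kernel reconstructed for the simple illustration.

```python
import numpy as np


def convolve2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Convolution without padding: the output shrinks by (kernel size - 1)."""
    k_flipped = k[::-1, ::-1]
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k_flipped)
    return out


def convolve2d_zero_padded(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Pad x with (kernel size - 1) zeros on every side, then convolve.

    Assumes a square kernel; np.pad fills with zeros by default.
    """
    pad = k.shape[0] - 1
    return convolve2d_valid(np.pad(x, pad), k)


x_i = np.array([[0, 0, 255, 0, 0],
                [0, 0, 255, 0, 0],
                [0, 0, 255, 0, 0],
                [0, 255, 0, 0, 0],
                [255, 0, 0, 0, 0]], dtype=float)
k_ij = np.array([[0, 0.5],
                 [0.5, 0]])

valid = convolve2d_valid(x_i, k_ij)          # 4 x 4: kernel stays inside the image
padded = convolve2d_zero_padded(x_i, k_ij)   # 6 x 6: kernel may hang over the border
```

The interior of the padded output equals the unpadded result; the extra border entries respond to the stroke where it touches the image edge.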
