Common recognition tasks Adapted from Slide from L. Lazebnik. Fei-Fei Li
Image classification and tagging • outdoor • mountains • city • Asia • Lhasa • … Adapted from Slide from L. Lazebnik. Fei-Fei Li
Object detection • find pedestrians Adapted from Slide from L. Lazebnik. Fei-Fei Li
Activity recognition • walking • shopping • rolling a cart • sitting • talking • … Adapted from Slide from L. Lazebnik. Fei-Fei Li
Semantic segmentation Adapted from Slide from L. Lazebnik. Fei-Fei Li
Semantic segmentation sky mountain building tree building lamp lamp umbrella umbrella person market stall person person person ground person Adapted from Slide from L. Lazebnik. Fei-Fei Li
Image description This is a busy street in an Asian city. Mountains and a large palace or fortress loom in the background. In the foreground, we see colorful souvenir stalls and people walking around and shopping. One person in the lower left is pushing an empty cart, and a couple of people in the middle are sitting, possibly posing for a photograph. Adapted from Slide from L. Lazebnik. Fei-Fei Li
The statistical learning framework • Apply a prediction function to a feature representation of the image to get the desired output: f( ) = “apple” f( ) = “tomato” f( ) = “cow” Slide from L. Lazebnik.
The statistical learning framework y = f( x ) output prediction feature function representation • Training: given a training set of labeled examples {( x 1 ,y 1 ), …, ( x N ,y N )}, estimate the prediction function f by minimizing the prediction error on the training set • Testing: apply f to a never before seen test example x and output the predicted value y = f( x ) Slide from L. Lazebnik.
Steps Training Training Labels Training Images Image Learned Training Features model Learned model Testing Image Prediction Features Test Image Slide credit: D. Hoiem
“Classic” recognition pipeline Image Class Feature Trainable Pixels label representation classifier • Hand-crafted feature representation • Off-the-shelf trainable classifier Slide from L. Lazebnik.
Neural networks for images image Fully connected layer Slide from L. Lazebnik.
Neural networks for images image Slide from L. Lazebnik.
Neural networks for images feature map learned weights image Slide from L. Lazebnik.
Neural networks for images another feature map another set of learned weights image Slide from L. Lazebnik.
Convolution as feature extraction K feature maps bank of K filters . . . feature map image Slide from L. Lazebnik.
Convolutional layer K feature maps Spatial resolution: K filters (roughly) the same if stride of 1 is used, reduced by 1/S if stride of S is used image convolutional layer Slide from L. Lazebnik.
Convolutional layer L feature maps in the next layer K feature maps F x F x K filter L filters image convolutional layer + ReLU Slide from L. Lazebnik.
CNN pipeline Feature maps Spatial pooling Non-linearity Convolution . (Learned) . . Input Image Input Feature Map Source: R. Fergus, Y. LeCun
CNN pipeline Feature maps Spatial pooling Non-linearity Convolution (Learned) Input Image Source: R. Fergus, Y. LeCun Source: Stanford 231n
CNN pipeline Feature maps Max (or Avg) Spatial pooling Non-linearity Convolution (Learned) Input Image Source: R. Fergus, Y. LeCun
Max pooling layer K feature maps, resolution 1/S K feature maps max value F x F pooling filter, stride S Usually: F=2 or 3, S=2 Slide from L. Lazebnik.
Summary: CNN pipeline Softmax layer: exp( w c ⋅ x ) P ( c | x ) = C ∑ exp( w k ⋅ x ) k = 1 Slide from L. Lazebnik.
Training of multi-layer networks Find network weights to minimize the prediction loss between • true and estimated labels of training examples: 𝐹 𝐱 = ∑ ! 𝑚(𝐲 ! , 𝑧 ! ; 𝐱) • Possible losses (for binary problems): • 𝐱 (𝐲 ! ) − 𝑧 ! # Quadratic loss: 𝑚 𝐲 ! , 𝑧 ! ; 𝐱 = 𝑔 • Log likelihood loss: 𝑚 𝐲 ! , 𝑧 ! ; 𝐱 = −log 𝑄 𝐱 𝑧 ! | 𝐲 ! • Hinge loss: 𝑚 𝐲 ! , 𝑧 ! ; 𝐱 = max(0,1 − 𝑧 ! 𝑔 𝐱 𝐲 ! ) • Slide from L. Lazebnik.
Convolutional networks linear conv subsample conv subsample filters filters weights Slide from B. Hariharan
Empirical Risk Minimization N 1 X min L ( h ( x i ; θ ) , y i ) N θ i =1 Convolutional network N θ ( t +1) = θ ( t ) � λ 1 X r L ( h ( x i ; θ ) , y i ) N i =1 Gradient descent update Slide from B. Hariharan
Computing the gradient of the loss r L ( h ( x ; θ ) , y ) z = h ( x ; θ ) r θ L ( z, y ) = ∂ L ( z, y ) ∂ z ∂ z ∂ θ Slide from B. Hariharan
Convolutional networks linear conv subsample conv subsample filters filters weights Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 = ∂ f 5 ( z 4 , w 5 ) ∂ z ∂ w 5 ∂ w 5 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 = ∂ f 5 ( z 4 , w 5 ) ∂ f 4 ( z 3 , w 4 ) ∂ z = ∂ z ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 = ∂ f 5 ( z 4 , w 5 ) ∂ f 4 ( z 3 , w 4 ) ∂ z = ∂ z ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 = ∂ f 5 ( z 4 , w 5 ) ∂ f 4 ( z 3 , w 4 ) ∂ z = ∂ z ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 = ∂ f 5 ( z 4 , w 5 ) ∂ f 4 ( z 3 , w 4 ) ∂ z = ∂ z ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 = ∂ f 5 ( z 4 , w 5 ) ∂ f 4 ( z 3 , w 4 ) ∂ z = ∂ z ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 = ∂ f 5 ( z 4 , w 5 ) ∂ f 4 ( z 3 , w 4 ) ∂ z = ∂ z ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 ∂ z 4 ∂ w 4 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 ∂ z = ∂ z ∂ z 3 ∂ w 3 ∂ z 3 ∂ w 3 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 ∂ z = ∂ z ∂ z 3 ∂ w 3 ∂ z 3 ∂ w 3 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 ∂ z = ∂ z ∂ z 3 ∂ w 3 ∂ z 3 ∂ w 3 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 ∂ z = ∂ z ∂ z 4 ∂ z 3 ∂ z 4 ∂ z 3 ∂ z = ∂ z ∂ z 3 ∂ w 3 ∂ z 3 ∂ w 3 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 ∂ z = ∂ z ∂ z 4 ∂ z 3 ∂ z 4 ∂ z 3 ∂ z = ∂ z ∂ z 3 ∂ w 3 ∂ z 3 ∂ w 3 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 Recurrence ∂ z = ∂ z ∂ z 3 going ∂ z 2 ∂ z 3 ∂ z 2 backward!! ∂ z = ∂ z ∂ z 2 ∂ w 2 ∂ z 2 ∂ w 2 Slide from B. Hariharan
The gradient of convnets z 5 = z z 1 z 2 z 3 z 4 x f 1 f 2 f 3 f 4 f 5 w 1 w 2 w 3 w 4 w 5 Back-Propagation Slide from B. Hariharan
Backpropagation for a sequence of functions • Each “function” has a “forward” and “backward” module • Forward module for f i • takes z i-1 and weight w i as input • produces z i as output • Backward module for f i • takes g(z i ) as input • produces g(z i-1 ) and g(w i ) as output g ( w i ) = g ( z i ) ∂ z i g ( z i − 1 ) = g ( z i ) ∂ z i ∂ z i − 1 ∂ w i Slide from B. Hariharan
Backpropagation for a sequence of functions z i-1 f i z i w i Slide from B. Hariharan
Backpropagation for a sequence of functions g(z i-1 ) f i g(z i ) g(w i ) Slide from B. Hariharan
Computation graph - Functions • Each node implements two functions • A “forward” • Computes output given input • A “backward” • Computes derivative of z w.r.t input, given derivative of z w.r.t output Slide from B. Hariharan
Computation graphs a b f i d c Slide from B. Hariharan
Computation graphs ∂ z ∂ a ∂ z ∂ z f i ∂ b ∂ d ∂ z ∂ c Slide from B. Hariharan
Computation graphs a b f i d c Slide from B. Hariharan
Computation graphs ∂ z ∂ a ∂ z ∂ z f i ∂ b ∂ d ∂ z ∂ c Slide from B. Hariharan
Neural network frameworks Slide from B. Hariharan
Image classification and tagging • outdoor • mountains • city • Asia • Lhasa • … Adapted from Fei-Fei Li
Recommend
More recommend