

  1. Pay Attention to the Pixel, Understand the Scene Better. Shu Kong, CS, ICS, UCI

  2. Background: Scene Parsing. Pixel-level labeling / semantic segmentation -- assigning a class label to each pixel (e.g., wall, painting, pillow, sofa, cabinet, towel).

  3. Background: Scene Parsing. In the old days (before 2013): extracting hand-crafted features to represent pixels, unary pixel classification, and pairwise terms over pixel pairs for a CRF -- features, transforms, grouping, etc.

  4. Background: Scene Parsing. Nowadays: deep learning.

  5. Background: Convolutional Neural Net. A Convolutional Neural Network (CNN) aggregates local information, avoiding direct computation over the whole image. Y. LeCun et al., Handwritten digit recognition: Applications of neural net chips and automatic learning, Neural Computation, 1989.

  6. Background: Receptive Field in CNN. Receptive field at high-level layers (photo credit: Honglak Lee).

  7. Background: CNN for Scene Parsing. Image classification: cat vs. dog.

  8. Background: CNN for Scene Parsing. Dense prediction: wall, painting, pillow, sofa, cabinet, towel.

  9. Background: Perspective and Scale. size(car) > size(train)? size(chair) > size(whiteboard)?

  10. Outline. 1. Perspective-aware Pooling; 2. Pixel-wise Attentional Gating (PAG): (a) attentional pooling for the “right” receptive field, (b) pixel-level dynamic routing; 3. Discussion.

  11. Outline. 1. Perspective-aware Pooling; 2. Pixel-wise Attentional Gating (PAG): (a) attentional pooling for the “right” receptive field, (b) pixel-level dynamic routing; 3. Discussion.

  12. Perspective-aware Pooling. Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information.

  13. Perspective-aware Pooling. Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information. The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information.

  14. Perspective-aware Pooling. Goal: deciding, for each pixel, the size of the receptive field (RF) used to aggregate information. The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information. Depth conveys this scale information.

  15. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth.

  16. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth, via dilated convolution (atrous convolution).

  17. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth, via dilated convolution (atrous convolution).

  18. Perspective-aware Pooling. 2D atrous convolution with different dilation rates, allowing a larger RF to aggregate more contextual information. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
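
A minimal sketch of the dilation mechanism (assuming PyTorch; not from the slides): a 3x3 filter with dilation rate r covers a (2r+1) x (2r+1) window, so the rates used later, {1, 2, 4, 8, 16}, enlarge the receptive field without adding parameters or changing the output resolution.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)   # dummy feature map (N, C, H, W)
for rate in [1, 2, 4, 8, 16]:
    # 3x3 kernel, effective window size 2*rate + 1; padding=rate keeps H x W fixed
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=rate, padding=rate)
    print(rate, conv(x).shape)     # every rate yields torch.Size([1, 64, 128, 128])
```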

  19. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth.

  20. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth: quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

  21. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth: quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}; combine the scale branches by multiplicative gating.

  22. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth: quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.

  23. Perspective-aware Pooling. Idea: making the pooling size adaptive w.r.t. depth: quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}.
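
A hedged sketch of the idea above (PyTorch; the depth bin edges, channel width, and module name are illustrative, not the paper's exact settings): quantize the per-pixel depth into five scale bins, run five parallel atrous convolutions with dilation rates {1, 2, 4, 8, 16}, and combine them by multiplicative gating with the one-hot bin masks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerspectiveAwarePooling(nn.Module):
    """Five parallel atrous branches, gated per pixel by quantized depth."""
    def __init__(self, channels=64, rates=(1, 2, 4, 8, 16),
                 depth_edges=(1.0, 2.0, 4.0, 8.0)):      # illustrative bin edges (meters)
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, dilation=r, padding=r) for r in rates)
        self.register_buffer("edges", torch.tensor(depth_edges))

    def forward(self, feats, depth):
        # feats: (N, C, H, W) features; depth: (N, 1, H, W) at the same resolution
        bins = torch.bucketize(depth, self.edges)                 # values in {0, ..., 4}
        gates = F.one_hot(bins.squeeze(1), num_classes=len(self.branches))
        gates = gates.permute(0, 3, 1, 2).float()                 # (N, 5, H, W) one-hot
        out = feats.new_zeros(feats.shape)
        for k, branch in enumerate(self.branches):
            out = out + gates[:, k:k + 1] * branch(feats)         # multiplicative gating
        return out

pool = PerspectiveAwarePooling()
y = pool(torch.randn(2, 64, 96, 96), torch.rand(2, 1, 96, 96) * 10.0)
```

Because the gates are one-hot, each pixel effectively receives exactly one dilation rate, chosen by its depth.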

  24. Perspective-aware Pooling. When depth is not available at inference -- S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  25. Perspective-aware Pooling. When depth is not available at inference -- Idea: train a depth estimator. S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  26. Perspective-aware Pooling. When depth is not available at inference -- Idea: train a depth estimator. S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  27. Perspective-aware Pooling. When depth is not available at inference -- Idea: train a depth estimator. Why better? Capacity and representation power. S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
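
A minimal sketch of how a trained depth estimator can drive the gating at test time (PyTorch; the 1x1-conv depth head below is a hypothetical stand-in for the paper's depth-prediction branch): the head predicts, per pixel, a softmax over the five quantized depth bins, and those probabilities act as soft gates over the atrous branches. During training the gates can be supervised with quantized ground-truth depth while the whole network is trained end-to-end for segmentation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictedDepthGating(nn.Module):
    def __init__(self, channels=64, rates=(1, 2, 4, 8, 16)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, dilation=r, padding=r) for r in rates)
        # hypothetical depth head: per-pixel logits over the 5 quantized depth bins
        self.depth_head = nn.Conv2d(channels, len(rates), kernel_size=1)

    def forward(self, feats):
        gates = F.softmax(self.depth_head(feats), dim=1)      # (N, 5, H, W) soft gates
        pooled = sum(gates[:, k:k + 1] * branch(feats)
                     for k, branch in enumerate(self.branches))
        return pooled, gates   # gates can be trained against quantized ground-truth depth

pooled, gates = PredictedDepthGating()(torch.randn(2, 64, 96, 96))
```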

  28. Perspective-aware Pooling Consistent improvement S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  29. Perspective-aware Pooling Recurrent refinement by adapting the predicted depth S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  30. Perspective-aware Pooling Recurrent refinement by adapting the predicted depth S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  31. Perspective-aware Pooling Recurrently refining by adapting the predicted depth S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  32. Perspective-aware Pooling Recurrent refinement by adapting the predicted depth S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
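
A schematic sketch of the recurrent refinement named on these slides (PyTorch; the GatedBlock here is an illustration, not the paper's actual module): the same gated pooling block is unrolled for a few steps with shared weights, so the scale gates are re-estimated from progressively refined features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedBlock(nn.Module):
    """Placeholder gated pooling block in the spirit of the earlier sketches."""
    def __init__(self, c=64, rates=(1, 2, 4, 8, 16)):
        super().__init__()
        self.branches = nn.ModuleList(nn.Conv2d(c, c, 3, dilation=r, padding=r) for r in rates)
        self.gate_head = nn.Conv2d(c, len(rates), 1)

    def forward(self, x):
        g = F.softmax(self.gate_head(x), dim=1)
        return sum(g[:, k:k + 1] * b(x) for k, b in enumerate(self.branches))

block, classifier = GatedBlock(), nn.Conv2d(64, 21, 1)   # e.g., 21 semantic classes
feats = torch.randn(1, 64, 96, 96)                       # backbone features (placeholder)
for _ in range(3):                  # unroll a few refinement steps with shared weights
    feats = block(feats)            # the scale gates are re-estimated at every step
logits = classifier(feats)          # per-pixel class scores, refined over the iterations
```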

  33. Perspective-aware Pooling. No depth at all? Idea: train an unsupervised attention map. S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  34. Perspective-aware Pooling. No depth at all? Idea: train an unsupervised attention map. S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  35. Perspective-aware Pooling. No depth at all? Idea: train an unsupervised attention map (figure columns: gt-depth, pred-depth, attention). S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

  36. Pixel-wise Attentional Gating. 1. Perspective-aware Pooling; 2. Pixel-wise Attentional Gating (PAG): (a) attentional pooling for the “right” receptive field, (b) pixel-level dynamic routing; 3. Discussion.

  37. Pixel-wise Attentional Gating. Improvement 1: attention vs. depth -- attention is more general and learned, rather than a fixed depth rule.

  38. Pixel-wise Attentional Gating. Improvements: 1. attention vs. depth -- attention is more general and learned, rather than a fixed depth rule; 2. binary gating vs. weighted gating. S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018

  39. Pixel-wise Attentional Gating. Improvements: 1. attention vs. depth -- attention is more general and learned, rather than a fixed depth rule; 2. binary gating vs. weighted gating -- computational saving. S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018

  40. Pixel-wise Attentional Gating PAG produces binary masks, e.g., using argmax on softmax.

  41. Pixel-wise Attentional Gating. PAG produces binary masks, e.g., using argmax on softmax. How to output binary maps while still allowing end-to-end training?

  42. Pixel-wise Attentional Gating. PAG produces binary masks, e.g., using argmax on softmax. How to output binary maps while still allowing end-to-end training? Idea: the Gumbel-softmax trick [1,2]. [1] Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017. [2] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR, 2017

  43. Pixel-wise Attentional Gating. The usual way to turn per-pixel scores alpha_k into a hard choice z is to apply the argmax operator to the softmax probabilities pi_k = exp(alpha_k) / sum_j exp(alpha_j). Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

  44. Pixel-wise Attentional Gating. The usual way to turn per-pixel scores alpha_k into a hard choice z is to apply the argmax operator to the softmax probabilities pi_k = exp(alpha_k) / sum_j exp(alpha_j). Then, sampling z instead means drawing from the categorical distribution P(z = k) = pi_k. Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

  45. Pixel-wise Attentional Gating. The usual way to turn per-pixel scores alpha_k into a hard choice z is to apply the argmax operator to the softmax probabilities pi_k = exp(alpha_k) / sum_j exp(alpha_j). Then, sampling z instead means drawing from the categorical distribution P(z = k) = pi_k. Here is an alternative (the Gumbel-max trick): z = argmax_k (alpha_k + m_k), where the noise m_k is drawn per class. Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)

  46. Pixel-wise Attentional Gating. The usual way to turn per-pixel scores alpha_k into a hard choice z is to apply the argmax operator to the softmax probabilities pi_k = exp(alpha_k) / sum_j exp(alpha_j). Then, sampling z instead means drawing from the categorical distribution P(z = k) = pi_k. Here is an alternative (the Gumbel-max trick): z = argmax_k (alpha_k + m_k), where the random variable m_k follows the Gumbel(0, 1) distribution, i.e., m_k = -log(-log(u_k)) with u_k ~ Uniform(0, 1). Gumbel, E.J.: Statistics of Extremes. Courier Corporation (2012)
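
A small numerical check of the Gumbel-max identity above (plain PyTorch; the scores are made up for illustration): taking the argmax of the scores plus Gumbel noise reproduces the softmax distribution over classes.

```python
import torch

torch.manual_seed(0)
alpha = torch.tensor([1.0, 0.5, -1.0, 2.0])        # unnormalized per-class scores
probs = torch.softmax(alpha, dim=0)                # target categorical distribution

# Gumbel-max trick: z = argmax_k (alpha_k + m_k), m_k = -log(-log(u_k)), u_k ~ U(0, 1)
u = torch.rand(100_000, 4)
m = -torch.log(-torch.log(u))
z = torch.argmax(alpha + m, dim=1)

empirical = torch.bincount(z, minlength=4).float() / z.numel()
print(probs)       # softmax probabilities
print(empirical)   # empirical frequencies agree with them up to sampling noise
```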

  47. Pixel-wise Attentional Gating. But the argmax is not continuous and not differentiable. Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR, 2017

  48. Pixel-wise Attentional Gating. But the argmax is not continuous and not differentiable. Expressing the discrete random variable z as a one-hot vector: Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR, 2017

  49. Pixel-wise Attentional Gating. But the argmax is not continuous and not differentiable. Expressing the discrete random variable z as a one-hot vector, the Gumbel-softmax relaxation is y_k = exp((alpha_k + m_k) / tau) / sum_j exp((alpha_j + m_j) / tau). Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR, 2017

  50. Pixel-wise Attentional Gating. The relaxed samples move from a one-hot-encoded categorical distribution (the argmax) towards a uniform distribution as the temperature tau goes from 0 to infinity. Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017
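
A hedged sketch of the relaxation (PyTorch; written out by hand to mirror the formula above, though torch.nn.functional.gumbel_softmax provides the same functionality): replacing the argmax with a temperature-controlled softmax makes the gate differentiable, and a straight-through variant outputs hard one-hot masks in the forward pass, matching PAG's need for binary masks, while gradients flow through the soft relaxation.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax(logits, tau=1.0, hard=False):
    """Gumbel-softmax over the last dimension of `logits`."""
    u = torch.rand_like(logits).clamp_min(1e-20)
    m = -torch.log(-torch.log(u))                    # Gumbel(0, 1) noise
    y = F.softmax((logits + m) / tau, dim=-1)        # tau -> 0: one-hot; tau -> inf: uniform
    if hard:
        # straight-through: hard one-hot forward, soft gradients backward
        index = y.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y).scatter_(-1, index, 1.0)
        y = y_hard + y - y.detach()
    return y

logits = torch.randn(2, 96, 96, 5, requires_grad=True)  # 5 candidate choices per pixel
gates = gumbel_softmax(logits, tau=0.5, hard=True)       # binary per-pixel masks
loss = gates[..., 1].mean()                              # toy loss on selecting branch 1
loss.backward()                                          # gradients still reach `logits`
```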

  51. Pixel-wise Attentional Gating. PAG produces binary masks. Two applications: 1. a parallel pooling branch for deciding the “right” receptive field; 2. pixel-level dynamic routing (sketched below).
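
A hedged sketch of the second application (PyTorch; module names and sizes are illustrative, not the paper's architecture): a per-pixel binary PAG mask decides, for each location, whether to run a residual transform or keep the feature unchanged. This is also where binary gating pays off over weighted gating: a sparse implementation only needs to evaluate the transform at the selected pixels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicallyRoutedBlock(nn.Module):
    """Per pixel: either apply the residual transform or skip it (identity)."""
    def __init__(self, channels=64, tau=0.5):
        super().__init__()
        self.transform = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate_head = nn.Conv2d(channels, 2, kernel_size=1)   # logits for {skip, transform}
        self.tau = tau

    def forward(self, x):
        logits = self.gate_head(x).permute(0, 2, 3, 1)            # (N, H, W, 2)
        g = F.gumbel_softmax(logits, tau=self.tau, hard=True)     # hard binary choice per pixel
        use = g[..., 1].unsqueeze(1)                               # (N, 1, H, W) routing mask
        # Dense sketch for clarity; a sparse implementation would evaluate
        # `transform` only where use == 1, yielding the computational saving.
        return x + use * self.transform(x)

y = DynamicallyRoutedBlock()(torch.randn(2, 64, 96, 96))
```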
