Pay Attention to the Pixel, Understand the Scene Better Shu Kong CS, ICS, UCI
Background: Scene Parsing Pixel-level labeling (semantic segmentation): assigning a class label to each pixel. Example labels: wall, painting, pillow, sofa, cabinet, towel.
Background: Scene Parsing In the old days (before 2013): extracting features to represent pixels, unary pixel classification and pairwise pixel terms for a CRF; features, transforms, grouping, etc.
Background: Scene Parsing nowadays, deep learning
Background: Convolutional Neural Net A convolutional neural network (CNN) aggregates local information and avoids computing over the whole image directly. Y. Handwritten digit recognition: Applications of neural net chips and automatic learning, Neural Computation, 1989.
Background: Receptive Field in CNN Receptive Field @ high-level layers photo credit to Honglak Lee
Background: CNN for Scene Parsing Image Classification cat vs. dog
Background: CNN for Scene Parsing Dense Prediction: one label per pixel, e.g. wall, painting, pillow, sofa, cabinet, towel.
Background: Perspective and Scale size(car) > size(train)? size(chair) > size(whiteboard)?
Outline 1. Perspective-aware Pooling 2. Pixel-wise Attentional Gating (PAG) 2.1 Attentional pooling for the “right” receptive field 2.2 Pixel-level dynamic routing 3. Discussion
Perspective-aware Pooling Goal: decide, for each pixel, the size of the receptive field (RF) over which to aggregate information. The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information. Depth conveys the scale information.
Perspective-aware Pooling Idea: make the pooling size adaptive with respect to depth, using dilated (atrous) convolution.
Perspective-aware Pooling 2D atrous convolution with different dilation rates, allowing a larger RF to aggregate more contextual information. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
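How the dilation rate enlarges the receptive field can be sketched in one dimension. This is a minimal NumPy illustration, not the DeepLab implementation; the helper names `effective_kernel_size` and `dilated_conv1d` are hypothetical, and the 3-tap all-ones filter is chosen only to make the output easy to read.

```python
import numpy as np

def effective_kernel_size(k, rate):
    """Number of input samples spanned by a k-tap filter at the given dilation rate."""
    return k + (k - 1) * (rate - 1)

def dilated_conv1d(x, w, rate):
    """'Valid' 1D dilated correlation: filter taps are spaced `rate` samples apart."""
    k = len(w)
    out_len = len(x) - effective_kernel_size(k, rate) + 1
    return np.array([
        sum(w[j] * x[i + j * rate] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)   # toy input signal 0, 1, ..., 9
w = np.array([1.0, 1.0, 1.0])    # 3-tap filter, same weights at every rate

for rate in (1, 2, 4):
    span = effective_kernel_size(len(w), rate)
    print(f"rate={rate} span={span} output={dilated_conv1d(x, w, rate)}")
```

The number of weights stays at three for every rate; only the spacing between taps changes, so the span grows from 3 to 5 to 9 samples without extra parameters.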
Perspective-aware Pooling Idea: make the pooling size adaptive with respect to depth. Quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}. Multiplicative gating.
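The depth-gated pooling described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the depth is quantized into equal-width bins in log-depth (the slide does not say how the five scales are chosen), a fixed 3x3 dilated mean filter stands in for each learned atrous branch, and the function names are hypothetical. The gates are one-hot, so each pixel keeps exactly one branch, with the nearest pixels assigned the largest dilation rate.

```python
import numpy as np

RATES = (1, 2, 4, 8, 16)  # dilation rates listed on the slide

def dilated_box_filter(feat, rate):
    """3x3 mean filter with taps `rate` pixels apart (zero padding).
    Stand-in for a learned atrous convolution branch."""
    h, w = feat.shape
    padded = np.pad(feat, rate)
    out = np.zeros_like(feat, dtype=float)
    for dy in (-rate, 0, rate):
        for dx in (-rate, 0, rate):
            out += padded[rate + dy: rate + dy + h, rate + dx: rate + dx + w]
    return out / 9.0

def depth_gates(depth, n_bins):
    """One-hot gate per pixel. Assumption: equal-width bins in log-depth."""
    log_d = np.log(depth)
    edges = np.linspace(log_d.min(), log_d.max(), n_bins + 1)
    bin_idx = np.digitize(log_d, edges[1:-1])  # 0 = nearest, n_bins - 1 = farthest
    branch = (n_bins - 1) - bin_idx            # nearest pixels get the largest rate
    return np.eye(n_bins)[branch]              # shape (H, W, n_bins)

def perspective_aware_pool(feat, depth, rates=RATES):
    """Multiply each branch by its gate and sum over branches."""
    gates = depth_gates(depth, len(rates))
    branches = np.stack([dilated_box_filter(feat, r) for r in rates], axis=-1)
    return (gates * branches).sum(axis=-1), gates

rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32))                                 # toy feature map
depth = np.linspace(1.0, 10.0, 32)[:, None] * np.ones((1, 32))   # near at top, far at bottom
pooled, gates = perspective_aware_pool(feat, depth)
print(pooled.shape, gates.sum(axis=(0, 1)))  # pixels assigned to each branch
```

Because the gates multiply the branch outputs, replacing the one-hot gates with soft weights would turn the same code into weighted gating.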
Perspective-aware Pooling When depth is not available at inference: Idea: train a depth estimator. Why better? Capacity, representation power. S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
Perspective-aware Pooling Consistent improvement
Perspective-aware Pooling Recurrent refinement by adapting the predicted depth
Perspective-aware Pooling No depth at all? Idea: train an unsupervised attention map. Figure panels: ground-truth depth, predicted depth, attention.
Pixel-wise Attentional Gating 1. Perspective-aware Pooling 2. Pixel-wise Attentional Gating (PAG) 2.1 Attentional pooling for the “right” receptive field 2.2 Pixel-level dynamic routing 3. Discussion
Pixel-wise Attentional Gating Improvements: 1. attention vs. depth: more general, a learned rule 2. binary gating vs. weighted gating: computational saving. S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018
Pixel-wise Attentional Gating PAG produces binary masks, e.g., using argmax on softmax. How can it output binary maps while still allowing end-to-end training? Idea: the Gumbel-softmax trick [1,2]. [1] Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017 [2] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR, 2017
Pixel-wise Attentional Gating The usual way to make a discrete choice is to apply the argmax operator to the softmax output. An alternative is to add a random variable m to the log-probabilities before taking the argmax, where m follows the Gumbel distribution. Gumbel, E.J.: Statistics of extremes. Courier Corporation (2012)
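For reference, the standard Gumbel-max trick from the cited Gumbel-softmax literature can be written as below. The notation is an assumption made here, since the slide's own formulas did not survive: \pi_i denotes the class probabilities given by the softmax, k the number of classes, and u_i a uniform random draw; m_i is the Gumbel-distributed variable named on the slide.

```latex
z = \operatorname{one\_hot}\!\Big( \arg\max_{i \in \{1, \dots, k\}} \big( \log \pi_i + m_i \big) \Big),
\qquad
m_i = -\log\!\big( -\log u_i \big),
\quad
u_i \sim \mathrm{Uniform}(0, 1).
```

Drawing z this way yields class i with probability \pi_i, so the randomness is moved into m while the dependence on \pi stays explicit.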
Pixel-wise Attentional Gating The argmax is neither continuous nor differentiable. Expressing the discrete random variable as a one-hot vector, the Gumbel-softmax relaxation [1,2] replaces the argmax with a temperature-controlled softmax.
Pixel-wise Attentional Gating The relaxed sample moves from a one-hot (argmax) vector to a uniform categorical distribution, controlled by the temperature τ as it goes from 0 to ∞ [1].
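The effect of the temperature can be checked numerically. The sketch below is a plain NumPy illustration of Gumbel-max sampling and the Gumbel-softmax relaxation as described in the cited ICLR 2017 papers, not the PAG code; the function names, the example probabilities (0.7, 0.2, 0.1), and the clipping constant 1e-12 are assumptions made for the demonstration.

```python
import numpy as np

def sample_gumbel(shape, rng):
    """Gumbel(0, 1) noise: m = -log(-log(u)), u ~ Uniform(0, 1)."""
    u = rng.uniform(low=1e-12, high=1.0, size=shape)  # lower bound avoids log(0)
    return -np.log(-np.log(u))

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + m) / tau) along the last axis."""
    y = (logits + sample_gumbel(logits.shape, rng)) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
probs = np.array([0.7, 0.2, 0.1])             # example class probabilities
logits = np.tile(np.log(probs), (20000, 1))   # 20000 independent draws

# Gumbel-max: the argmax of the perturbed log-probabilities follows `probs`.
hard = np.argmax(logits + sample_gumbel(logits.shape, rng), axis=-1)
freq = np.bincount(hard, minlength=3) / len(hard)
print("Gumbel-max frequencies:", freq)

# Gumbel-softmax: near one-hot for small tau, near uniform for large tau.
for tau in (0.01, 1.0, 1000.0):
    y = gumbel_softmax(logits, tau, rng)
    print(f"tau={tau}: first sample {y[0].round(3)}, mean max {y.max(axis=-1).mean():.3f}")
```

In an autodiff framework the same expression is differentiable with respect to the logits, which is what makes end-to-end training possible; NumPy is used here only to show the sampling behavior.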
Pixel-wise Attentional Gating PAG produces binary masks. Two applications 1. parallel pooling branch for deciding the “right” receptive field 2. pixel-level dynamic routing