Hypercolumns for Object Segmentation and Fine-grained Localization
Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik
Presented by Göksu Erdoğan
Image Classification: horse, person, building
Slide credit: Bharath Hariharan
Object Detection
Simultaneous Detection and Segmentation
Detect and segment every instance of the category in the image.
B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014
SDS vs. Semantic Segmentation
Simultaneous Detection and Part Labeling
Detect and segment every instance of the category in the image and label its parts.
Simultaneous Detection and Keypoint Prediction
Detect every instance of the category in the image and mark its keypoints.
Motivation
§ Task: Assign category labels to images or bounding boxes
§ General approach: Output of the last layer of a CNN
§ This is most sensitive to category-level semantic information
§ The information is generalized over in the top layer
§ Is the output of the last layer of a CNN appropriate for finer-grained problems?
Motivation
§ Not the optimal representation!
§ The last layer of a CNN is mostly invariant to 'nuisance' variables such as pose, illumination, articulation, precise location…
§ Pose and nuisance variables are precisely what we are interested in.
§ How can we get such information?
Motivation
§ It is present in intermediate layers
§ Less sensitive to semantics
Motivation
§ Top layers lose localization information
§ Bottom layers are not semantic enough
§ Combine both
Detection and Segmentation
Simultaneous detection and segmentation
B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014
Combining features across multiple levels: Pedestrian Detection
§ Combine subsampled intermediate layers with the top layer
§ Difference: Upsampling
Pedestrian Detection with Unsupervised Multi-Stage Feature Learning, Sermanet et al.
Framework
§ Start from a detection (R-CNN)
§ Heatmaps
§ Use category-specific, instance-specific information to…
§ Classify each pixel in the detection window
One Framework, Many Tasks
Task            | Classification Target
SDS             | Does the pixel belong to the object?
Part labeling   | Which part does the pixel belong to?
Pose estimation | Does it lie on/near a particular keypoint?
Heatmaps for each task
§ Segmentation:
  § Probability that a particular location is inside the object
§ Part labeling:
  § Separate heatmap for each part
  § Each heatmap is the probability that a location belongs to that part
§ Keypoint prediction:
  § Separate heatmap for each keypoint
  § Each heatmap is the probability that the keypoint is at a particular location
Hypercolumns
Hypercolumns
§ Term derived from Hubel and Wiesel
§ Re-imagines old ideas:
  § Jets (Koenderink and van Doorn)
  § Pyramids (Burt and Adelson)
  § Filter banks (Malik and Perona)
Computing the Hypercolumn Representation
§ Upsample each feature map F to f via bilinear interpolation: f_i = Σ_k α_ik F_k
§ f_i: feature vector at location i
§ α_ik: interpolation weight, depending on the positions of i and k in the box
§ Concatenate the features from every layer at a location into one long vector
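The upsampling-and-concatenation step above can be sketched in numpy; this is a minimal illustration (not the authors' code), where `bilinear_upsample` computes f_i = Σ_k α_ik F_k with standard bilinear weights and `hypercolumns` stacks the upsampled maps channel-wise:

```python
import numpy as np

def bilinear_upsample(F, out_h, out_w):
    """Upsample feature map F of shape (h, w, c) to (out_h, out_w, c).
    Each output location i gets f_i = sum_k alpha_ik * F_k, where the
    alpha_ik are the usual bilinear interpolation coefficients."""
    h, w, c = F.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]   # vertical weights, broadcast over x and c
    wx = (xs - x0)[None, :, None]   # horizontal weights
    top = F[y0][:, x0] * (1 - wx) + F[y0][:, x1] * wx
    bot = F[y1][:, x0] * (1 - wx) + F[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def hypercolumns(feature_maps, out_h, out_w):
    """Hypercolumn at each pixel: concatenation, along channels, of all
    feature maps upsampled to a common resolution."""
    ups = [bilinear_upsample(F, out_h, out_w) for F in feature_maps]
    return np.concatenate(ups, axis=-1)
```

For example, combining a 4x4x2 map and an 8x8x3 map at 16x16 resolution yields a 5-dimensional hypercolumn per pixel.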
Interpolating into a grid of classifiers
§ Fully connected layers contribute a global instance-specific bias
§ A different classifier for each location contributes a separate instance-specific bias
§ Simplest way to get location-specific classifiers: train a separate classifier at each of the 50x50 locations
§ What would be the problems of this approach?
Interpolating into a grid of classifiers
Problems of training a separate classifier at every location:
1. Computationally expensive
2. Classifiers vary with location
3. Less data for each classifier during training → risk of overfitting
How can we escape from these problems?
Interpolate into a coarse grid of classifiers
§ Train a coarse KxK grid of classifiers and interpolate between them
§ Interpolate the grid of functions instead of values
§ Each classifier in the grid is a function g_k(·)
§ g_k(feature vector) = probability
§ Score of the i'th pixel: h_i = Σ_k α_ik g_k(f_i)
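A minimal sketch of interpolating classifier *outputs*, assuming a KxK grid of logistic classifiers stored as weight and bias arrays (`grid_W`, `grid_b`, and `interpolated_score` are illustrative names, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def interpolated_score(feat, y, x, H, W, grid_W, grid_b):
    """Score pixel (y, x) of an H x W window with a K x K grid of logistic
    classifiers (grid_W: K x K x d, grid_b: K x K), bilinearly interpolating
    the classifier outputs: h_i = sum_k alpha_ik * g_k(f_i)."""
    K = grid_W.shape[0]
    # continuous coordinates of the pixel in the classifier grid
    gy = y / (H - 1) * (K - 1)
    gx = x / (W - 1) * (K - 1)
    y0, x0 = int(np.floor(gy)), int(np.floor(gx))
    y1, x1 = min(y0 + 1, K - 1), min(x0 + 1, K - 1)
    wy, wx = gy - y0, gx - x0
    score = 0.0
    for yy, ay in ((y0, 1 - wy), (y1, wy)):
        for xx, ax in ((x0, 1 - wx), (x1, wx)):
            g = sigmoid(grid_W[yy, xx] @ feat + grid_b[yy, xx])
            score += ay * ax * g   # alpha_ik * g_k(f_i)
    return score
```

Because the alpha weights sum to one, a pixel exactly under a grid classifier gets that classifier's output, and scores vary smoothly in between.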
Training classifiers
§ Interpolation is not used at training time
§ Divide each box into a KxK grid
§ Training data for the k'th classifier consists only of pixels from the k'th grid cell across all training instances
§ Train with logistic regression
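The cell assignment used at training time can be sketched as a simple index computation (`grid_cell` is a hypothetical helper, not from the paper); the k'th logistic classifier then sees only the pixels mapped to cell k:

```python
def grid_cell(y, x, H, W, K):
    """Index (0..K*K-1) of the KxK grid cell containing pixel (y, x)
    of an H x W box; each classifier trains only on its own cell."""
    row = min(y * K // H, K - 1)
    col = min(x * K // W, K - 1)
    return row * K + col
```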
Hypercolumns
Efficient pixel classification
§ Upsampling large feature maps is expensive!
§ If classification and upsampling are linear:
  classification ∘ upsampling = upsampling ∘ classification
§ Linear classification = 1x1 convolution
§ Extension: use nxn convolution
§ Classification = convolve, upsample, sum, sigmoid
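The commutation can be checked numerically. A sketch using nearest-neighbour upsampling (any linear upsampling, including the bilinear one used here, commutes with a linear classifier in the same way), showing that classifying the small map first gives the same scores much more cheaply:

```python
import numpy as np

def upsample_nn(F, s):
    """Nearest-neighbour upsampling by integer factor s (linear in F)."""
    return np.repeat(np.repeat(F, s, axis=0), s, axis=1)

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4, 8))   # small feature map, 8 channels
w = rng.standard_normal(8)           # linear classifier = 1x1 convolution

# cheap: classify the small map, then upsample the score map
scores_a = upsample_nn((F @ w)[..., None], 4)[..., 0]
# expensive: upsample the feature map, then classify every pixel
scores_b = upsample_nn(F, 4) @ w

assert np.allclose(scores_a, scores_b)   # identical results
```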
Representation as a neural network
Training classifiers
§ Use MCG candidates that overlap the ground truth by 70% or more
§ For each candidate, find the most-overlapping ground-truth instance
§ Crop the ground truth to the expanded bounding box of the candidate
§ Label locations positive or negative according to the problem
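The candidate-to-ground-truth matching can be sketched as follows, assuming an overlap (IoU) matrix has already been computed (`match_candidates` is an illustrative helper, not the authors' code):

```python
import numpy as np

def match_candidates(overlaps, thresh=0.7):
    """overlaps: (n_candidates, n_gt) IoU matrix.
    For each candidate, return the index of the most-overlapping
    ground-truth instance, or -1 if no overlap reaches `thresh`."""
    best = overlaps.argmax(axis=1)
    best_val = overlaps.max(axis=1)
    best[best_val < thresh] = -1   # candidate discarded from training
    return best
```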
Experiments
Evaluation Metric
§ Similar to the bounding box detection metric
§ Box overlap = |B ∩ B_gt| / |B ∪ B_gt|
§ If box overlap > threshold, correct
Evaluation Metric
§ Similar to the bounding box detection metric
§ But with segments instead of bounding boxes
§ Each detection/GT comes with a segment
§ Segment overlap = |S ∩ S_gt| / |S ∪ S_gt|
§ If segment overlap > threshold, correct
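The segment-overlap criterion is plain intersection-over-union on binary masks; a minimal sketch:

```python
import numpy as np

def segment_overlap(pred, gt):
    """IoU between two binary masks: |pred ∩ gt| / |pred ∪ gt|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

def is_correct(pred, gt, thresh=0.5):
    """A detection counts as correct if its segment IoU with the
    matched ground truth exceeds the threshold."""
    return segment_overlap(pred, gt) > thresh
```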
Task 1: SDS
§ System 1:
  § Refinement step with the hypercolumn representation
  § Features:
    § Top-level fc7 features
    § conv4 features
    § pool2 features
    § 1/0 according to whether the location was inside the original region candidate or not
    § Coarse 10x10 discretization of the original candidate into a 100-dimensional vector
  § 10x10 grid of classifiers
  § Project predictions onto superpixels and average
Task 1: SDS (System 1)
Task 1: SDS
§ System 2:
  § MCG instead of Selective Search
  § Expand the set of boxes by adding nearby high-scoring boxes after NMS
Task 1: SDS
Hypercolumns vs Top Layer
Task 2: Part Labeling
Task 3: Keypoint Prediction
Conclusion
§ A general framework for fine-grained localization that:
  § Leverages information from multiple CNN layers
  § Achieves state-of-the-art performance on SDS and part labeling, and accurate results on keypoint prediction
Future Work
§ Applying the hypercolumn representation to other fine-grained tasks:
  § Attribute classification
  § Action classification
  § …
Questions?
THANK YOU ☺