Hypercolumns for Object Segmentation and Fine-grained Localization - PowerPoint PPT Presentation


  1. Hypercolumns for Object Segmentation and Fine-grained Localization. Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik. Presenter: Göksu Erdoğan

  2. Image Classification horse, person, building Slide credit: Bharath Hariharan

  3. Object Detection Slide credit: Bharath Hariharan

  4. Simultaneous Detection and Segmentation Detect and segment every instance of the category in the image. B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014. Slide credit: Bharath Hariharan

  5. SDS vs. Semantic Segmentation Slide credit: Bharath Hariharan

  6. Simultaneous Detection and Part Labeling Detect and segment every instance of the category in the image and label its parts. Slide credit: Bharath Hariharan

  7. Simultaneous Detection and Keypoint Prediction Detect every instance of the category in the image and mark its keypoints. Slide credit: Bharath Hariharan

  8. Motivation § Task: Assign category labels to images or bounding boxes § General approach: use the output of the last layer of a CNN § This output is most sensitive to category-level semantic information § Other information is generalized over in the top layer § Is the output of the last layer of a CNN appropriate for finer-grained problems?

  9. Motivation § Not an optimal representation! § The last layer of a CNN is mostly invariant to 'nuisance' variables such as pose, illumination, articulation, precise location… § Pose and nuisance variables are precisely what we are interested in. § How can we get such information?

  10. Motivation § This information is present in intermediate layers § These are less sensitive to semantics

  11. Motivation § Top layers lose localization information § Bottom layers are not semantic enough § Combine both

  12. Detection and Segmentation: simultaneous detection and segmentation. B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014

  13. Combining features across multiple levels: Pedestrian Detection Combine subsampled intermediate layers with the top layer. Difference: upsampling. Pedestrian Detection with Unsupervised Multi-Stage Feature Learning, Sermanet et al.

  14. Framework § Start from a detection (R-CNN) § Heatmaps § Use category-specific, instance-specific information to… § Classify each pixel in the detection window. Slide credit: Bharath Hariharan

  15. One Framework, Many Tasks (task: classification target) § SDS: Does the pixel belong to the object? § Part labeling: Which part does the pixel belong to? § Pose estimation: Does it lie on/near a particular keypoint? Slide credit: Bharath Hariharan

  16. Heatmaps for each task § Segmentation: § Probability that a particular location lies inside the object § Part Labeling: § Separate heatmap for each part § Each heatmap is the probability that a location belongs to that part § Keypoint Prediction: § Separate heatmap for each keypoint § Each heatmap is the probability of the keypoint being at a particular location

  17. Hypercolumns Slide credit: Bharath Hariharan

  18. Hypercolumns § Term derived from Hubel and Wiesel § Re-imagines old ideas: § Jets (Koenderink and van Doorn) § Pyramids (Burt and Adelson) § Filter Banks (Malik and Perona) Slide credit: Bharath Hariharan

  19. Computing the Hypercolumn Representation § Upsample each feature map F to f § f_i: feature vector at location i, f_i = Σ_k α_ik F_k § α_ik depends on the positions of i and k in the box § Concatenate the features from every layer at a location into one long vector
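The construction on this slide can be sketched in a few lines of NumPy: upsample every layer's feature map to a common box-sized grid, then stack them so that each spatial location holds one long concatenated feature vector. This is a toy sketch with made-up shapes; the paper uses bilinear interpolation, while nearest-neighbor is used here to keep the code short.

```python
import numpy as np

def upsample_nn(fmap, out_h, out_w):
    """Nearest-neighbor upsampling of a (C, h, w) feature map to (C, out_h, out_w).
    (The paper uses bilinear interpolation; nearest-neighbor keeps this sketch short.)"""
    c, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows][:, :, cols]

def hypercolumns(feature_maps, out_h, out_w):
    """Stack upsampled maps from several layers; each spatial location then
    holds one long concatenated feature vector -- its hypercolumn."""
    ups = [upsample_nn(f, out_h, out_w) for f in feature_maps]
    return np.concatenate(ups, axis=0)   # (sum of channel counts, out_h, out_w)

# toy example: three "layers" with different resolutions and channel counts
maps = [np.random.rand(2, 50, 50),   # early layer: fine resolution, few channels
        np.random.rand(4, 12, 12),   # middle layer
        np.random.rand(8, 3, 3)]     # top layer: coarse, many channels
hc = hypercolumns(maps, 50, 50)
print(hc.shape)                      # (14, 50, 50): a 2+4+8 = 14-dim vector per pixel
```

The hypercolumn of pixel i is then the column `hc[:, y, x]`, exactly the "one long vector" the slide describes.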

  20. Interpolating into a grid of classifiers § Fully connected layers contribute a global instance-specific bias § A different classifier for each location contributes a separate instance-specific bias § Simplest way to get a location-specific classifier: § train a separate classifier at each of the 50×50 locations § What would be the problems of this approach?

  21. Interpolating into a grid of classifiers § 1. Computationally expensive § 2. Classifiers vary with location § 3. Reduces the amount of data for each classifier during training: risk of overfitting § How can we escape from these problems?

  22. Interpolate into a coarse grid of classifiers § Train a coarse K×K grid of classifiers and interpolate between them § Interpolate a grid of functions instead of values § Each classifier in the grid is a function g_k(·) § g_k(feature vector) = probability § Score of the i-th pixel: h_i = Σ_k α_ik g_k(f_i)
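A minimal sketch of the interpolated score h_i = Σ_k α_ik g_k(f_i) from this slide, assuming a K×K grid of linear-logistic classifiers with toy random weights `W, b` (stand-ins, not learned values) and triangular (linear-interpolation) weights as one simple way to obtain the coefficients α_ik:

```python
import numpy as np

def grid_scores(features, positions, W, b, K):
    """Interpolated K x K grid of logistic classifiers:
    score_i = sum_k alpha_ik * sigmoid(w_k . f_i + b_k).
    features:  (N, D) hypercolumn vectors
    positions: (N, 2) pixel coordinates normalized to [0, 1] within the box
    W, b:      (K*K, D) weights and (K*K,) biases of the grid classifiers"""
    centers = (np.arange(K) + 0.5) / K                  # grid-cell centers in [0, 1]
    # triangular weight along each axis -> bilinear interpolation between cells
    wy = np.maximum(0, 1 - K * np.abs(positions[:, :1] - centers))   # (N, K)
    wx = np.maximum(0, 1 - K * np.abs(positions[:, 1:] - centers))   # (N, K)
    alpha = (wy[:, :, None] * wx[:, None, :]).reshape(len(features), -1)  # (N, K*K)
    alpha /= alpha.sum(axis=1, keepdims=True)           # coefficients sum to 1
    probs = 1 / (1 + np.exp(-(features @ W.T + b)))     # (N, K*K): every g_k(f_i)
    return (alpha * probs).sum(axis=1)                  # interpolated per-pixel score

rng = np.random.default_rng(0)
K, D, N = 5, 16, 100
f = rng.standard_normal((N, D))
pos = rng.random((N, 2))
scores = grid_scores(f, pos, rng.standard_normal((K*K, D)), rng.standard_normal(K*K), K)
print(scores.shape)   # (100,) -- one probability-like score per pixel
```

Because the α_ik are non-negative and sum to 1, each score is a convex combination of classifier probabilities and so stays in (0, 1), which is why interpolating functions rather than raw scores behaves smoothly across the box.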

  23. Training classifiers § Interpolation is not used at training time § Divide each box into a K×K grid § Training data for the k-th classifier consists only of pixels from the k-th grid cell across all training instances § Train with logistic regression
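The training-time assignment above can be sketched as a small helper (`cell_index` is a hypothetical name, not from the paper) that maps each pixel's normalized position inside the box to the grid cell whose classifier it will train:

```python
import numpy as np

def cell_index(positions, K):
    """Map normalized pixel positions in [0, 1]^2 inside the box to a K x K
    grid-cell index; during training, classifier k is fit only on the pixels
    that fall in cell k (interpolation is used only at test time)."""
    cells = np.minimum((positions * K).astype(int), K - 1)   # clamp p == 1.0
    return cells[:, 0] * K + cells[:, 1]

pos = np.array([[0.05, 0.05],    # near the top-left corner of the box
                [0.5,  0.5 ],    # center of the box
                [1.0,  1.0 ]])   # bottom-right edge (clamped into the last cell)
print(cell_index(pos, 5))        # [ 0 12 24]
```

Gathering `features[cell_index(pos, K) == k]` across all training instances then yields exactly the per-cell training set the slide describes, fed to a standard logistic regression.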

  24. Hypercolumns Slide credit:Bharath Hariharan

  25. Efficient pixel classification § Upsampling large feature maps is expensive! § If classification and upsampling are linear: § classification ∘ upsampling = upsampling ∘ classification § Linear classification = 1×1 convolution § Extension: use an n×n convolution § Classification = convolve, upsample, sum, sigmoid
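The commutation trick on this slide can be checked numerically. A sketch with toy shapes: a linear classifier (a 1×1 convolution, i.e. a per-pixel dot product over channels) commutes with any linear upsampling, so we can classify the small map first and upsample only the single-channel score map. Nearest-neighbor upsampling stands in here for the bilinear upsampling the method uses; the equivalence holds for either, and the sigmoid, being nonlinear, must still come last.

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbor upsampling (a linear operator) along the spatial axes."""
    return x.repeat(factor, axis=-2).repeat(factor, axis=-1)

rng = np.random.default_rng(1)
C, H, W, factor = 8, 6, 6, 4
fmap = rng.standard_normal((C, H, W))
w = rng.standard_normal(C)                      # linear classifier = 1x1 convolution

# route 1 (cheap): classify the small map, then upsample the 1-channel scores
small_scores = np.tensordot(w, fmap, axes=1)    # (H, W)
route1 = upsample(small_scores, factor)

# route 2 (expensive): upsample every channel, then classify per pixel
route2 = np.tensordot(w, upsample(fmap, factor), axes=1)

print(np.allclose(route1, route2))              # True: the two orders agree
```

Route 1 upsamples one channel instead of C channels, which is exactly the saving the slide is after; with several layers, each layer's scores are upsampled, summed, and only then passed through the sigmoid ("convolve, upsample, sum, sigmoid").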

  26. Efficient pixel classification Slide credit: Bharath Hariharan

  27. Efficient pixel classification Slide credit: Bharath Hariharan

  28. Efficient pixel classification Slide credit: Bharath Hariharan

  29. Representation as a neural network

  30. Training classifiers § Use MCG candidates that overlap the ground truth by 70% or more § For each candidate, find the most-overlapping ground-truth instance § Crop the ground truth to the expanded bounding box of the candidate § Label locations positive or negative according to the task

  31. Experiments

  32. Evaluation Metric § Similar to the bounding box detection metric § Box overlap = area(box ∩ GT) / area(box ∪ GT) § If box overlap > threshold, correct. Slide credit: Bharath Hariharan

  33. Evaluation Metric § Similar to the bounding box detection metric § But with segments instead of bounding boxes § Each detection/GT comes with a segment § Segment overlap = area(S ∩ GT) / area(S ∪ GT) § If segment overlap > threshold, correct. Slide credit: Bharath Hariharan
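The segment-overlap metric above is plain intersection-over-union on boolean masks. A minimal sketch (`segment_overlap` is an illustrative helper name, not from the paper):

```python
import numpy as np

def segment_overlap(pred, gt):
    """Intersection-over-union of two boolean masks -- the slide's overlap metric."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

pred = np.zeros((10, 10), bool); pred[2:8, 2:8] = True   # 6x6 = 36 pixels
gt   = np.zeros((10, 10), bool); gt[4:10, 4:10] = True   # 6x6 = 36 pixels
iou = segment_overlap(pred, gt)
print(iou)            # 16 / 56, about 0.286
print(iou > 0.5)      # False: below a 0.5 threshold this detection is not correct
```

The box metric on the previous slide is the same computation with the masks replaced by filled rectangles, which is why the two slides share the ∩/∪ formula.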

  34. Task 1: SDS § System 1: § Refinement step with the hypercolumn representation § Features: § Top-level fc7 features § Conv4 features § Pool2 features § 1/0 according to whether the location was inside the original region candidate or not § Coarse 10×10 discretization of the original candidate into a 100-dimensional vector § 10×10 grid of classifiers § Project predictions onto superpixels and average

  35. Task 1: SDS System 1

  36. Task 1: SDS § System 2: § MCG instead of selective search § Expand the set of boxes by adding nearby high-scoring boxes after NMS

  37. Task 1:SDS

  38. Hypercolumns vs Top Layer

  39. Hypercolumns vs Top Layer Slide credit: Bharath Hariharan

  40. Task 2: Part Labeling Slide credit: Bharath Hariharan

  41. Task 2:Part Labeling

  42. Task 2:Part Labeling

  43. Task 3: Keypoint Prediction

  44. Task 3: Keypoint Prediction

  45. Task 3: Keypoint Prediction

  46. Conclusion § A general framework for fine-grained localization that: § Leverages information from multiple CNN layers § Achieves state-of-the-art performance on SDS and part labeling, and accurate results on keypoint prediction. Slide credit: Bharath Hariharan

  47. Future Work § Applying the hypercolumn representation to other fine-grained tasks: § Attribute classification § Action classification § …

  48. Questions?

  49. THANK YOU ☺
