
Regionlet Object Detector with Hand-crafted and CNN Feature



  1. Regionlet Object Detector with Hand-crafted and CNN Feature
     Xiaoyu Wang, Snapchat Research
     Xiaoyu Wang, Shenghuo Zhu, Ming Yang, Yuanqing Lin
     Snapchat Research, Horizon Robotics, Alibaba Group, Baidu

  2. Overview of this section
     • Regionlet Object Detector
     • Regionlet Localizer (re-localization)
     • Regionlet with Deep CNN Feature
       • CNN Feature Extraction
       • Support Pixel Integral Image
     • Application Examples
       • Car Detection for Fine-grained Image Classification
       • Pedestrian, Car, Cyclist Detection for Autonomous Driving

  3. What is the Regionlet Object Detector?
     • A significant extension of the traditional boosting object detector
     • Together with OverFeat and R-CNN, the Regionlet detector is one of the first detectors to successfully adopt deep CNN features for generic object detection.

  4. How does the Regionlet detector connect to past and future work?
     [Figure: timeline running from "Past" through 2013 to "future". Labels: RealBoost [1], Boosting Feature Selection, Segmentation as Selective Search [2], Object Proposal, Low-level Feature, Deep CNN [3], Generalized Spatial Pyramid Feature Pooling, CNN-based Object Detection, Spatial Pyramid Pooling in SPP-Net [4], RoI Pooling in Fast R-CNN [5].]
     [1] C. Huang et al. Boosting nested cascade detector for multi-view face detection. ICPR, 2004.
     [2] K. E. A. Van de Sande et al. Segmentation as selective search for object recognition. ICCV, 2011.
     [3] Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
     [4] He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ECCV, 2014.
     [5] Ross Girshick. Fast R-CNN. ICCV, 2015.

  5. Boosting Object Detector
     $f(X) = \sum_{i=0}^{N} \beta_i h_i(x_i)$
     • $h_i$: weak classifier
     • $x_i$: a sub-region on which a weak classifier is built
     • $X$: a detection window

  6. Traditional Boosting Detection Framework
     • Use multiple components (Model 1, Model 2) to detect objects with various aspect ratios
     • Operate on multiple scales to detect objects at different scales
     • How about a single model that is flexible during testing, with no feature pyramids and no multiple components?

  7. What the Regionlet Detector Proposed
     • A boosting classifier that can take inputs of different scales
     • A boosting classifier that can take inputs of different viewpoints
     • A boosting classifier that includes learned feature pooling

  8. Regionlet: Definition
     • Region ($R$): feature extraction region
     • Regionlet ($r_1$, $r_2$, $r_3$): a sub-region in a feature extraction region whose position and resolution are relative and normalized to a detection window
     [Figure labels: Region, Regionlet]

  9. Regionlet: Definition (cont.)
     • Regionlet coordinates are normalized
     • Traditional: $(l, t, r, b)$, e.g. (50, 50, 180, 180)
     • Normalized: $(l/w, t/h, r/w, b/h)$, e.g. (.25, .25, .90, .90)
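
The normalization on slide 9 can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_box` is hypothetical, coordinates are assumed to be (left, top, right, bottom) measured from the detection window's top-left corner, and the 200 x 200 window size is an assumption inferred from the slide's example numbers.

```python
def normalize_box(box, win_w, win_h):
    """Convert absolute (left, top, right, bottom) into window-relative coordinates."""
    left, top, right, bottom = box
    # Horizontal coordinates are divided by the window width,
    # vertical coordinates by the window height.
    return (left / win_w, top / win_h, right / win_w, bottom / win_h)


# Assumed 200 x 200 detection window (not stated on the slide).
print(normalize_box((50, 50, 180, 180), 200, 200))  # (0.25, 0.25, 0.9, 0.9)
```

Because the result no longer depends on the window size, the same regionlet definition can be reused for any candidate window.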

  10. Regionlet: Definition (cont.)
      • Regionlet definition = Generalized Spatial Pyramid
      • Similarity
        • Both use relative coordinates
      • Differences
        • Regionlet: coordinates are relative to the detection window (not the image)
        • Regionlet: coordinates are flexible (they do not have to evenly divide the image/window)
      • Regionlet feature extraction = Generalized Spatial Pyramid Pooling
      [Figure: rectangles in a Spatial Pyramid vs. rectangles in a Generalized Spatial Pyramid]

  11. Connection to other methods in pooling design
      [Figure labels: Object Proposal, Generalized Spatial Pyramid Feature Pooling, CNN-based Object Detection, Spatial Pyramid Pooling in SPP-Net [1], RoI Pooling in Fast R-CNN [2].]
      [1] He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ECCV, 2014.
      [2] Ross Girshick. Fast R-CNN. ICCV, 2015.

  12. Regionlet: Feature extraction
      • Non-local pooling
      • Could be hand-crafted features or deep CNN features, whatever feature you like!

  13. Regionlet Classifier
      • Each weak classifier is based on a 1-D feature extracted from a region
      • Regionlets -> feature extraction -> feature $x$
      • Weak classifier: $h(x) = \sum_{o=1}^{n-1} v_o \mathbb{1}(B(x) = o)$
      • Strong classifier: $H(X) = \sum_{i=1}^{T} \beta_i h_i(x_i)$

  14. Detection Framework
      • (a) Input image
      • (b) Generate object regions [1, 2, 3]
      • (c) Feature extraction and pooling
        • Generalized Spatial Pyramid Pooling inside Regionlet
        • Low-level features
        • CNN features (discussed later)
        • Max-pooling among Regionlets
      [Figure labels: Region, Regionlet]
      [1] K. E. A. Van de Sande et al. Segmentation as selective search for object recognition. ICCV, 2011.
      [2] B. Alexe et al. Measuring the objectness of image windows. T-PAMI, 2012.
      [3] S. Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS, 2015.
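
Step (c) of slide 14 ends with max-pooling among Regionlets, which might look roughly like the sketch below. It is not the authors' implementation: the helper names are hypothetical, the feature map is a plain nested list, and taking the mean inside each regionlet is an assumption standing in for whatever low-level or CNN feature is actually extracted.

```python
def regionlet_feature(fmap, box):
    """Mean of a 2-D feature map inside box = (left, top, right, bottom).

    Right and bottom are exclusive. Averaging is an assumed stand-in
    for the real per-regionlet feature.
    """
    left, top, right, bottom = box
    values = [fmap[y][x] for y in range(top, bottom) for x in range(left, right)]
    return sum(values) / len(values)


def region_feature(fmap, regionlets):
    """Max-pool the 1-D features of all regionlets belonging to one region."""
    return max(regionlet_feature(fmap, box) for box in regionlets)


fmap = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 5],
    [0, 0, 7, 9],
]
regionlets = [(0, 0, 2, 2), (2, 2, 4, 4)]
print(region_feature(fmap, regionlets))  # 6.0
```

Only the strongest regionlet response survives the pooling, which is what lets a region tolerate a part appearing at any of several positions.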

  15. Multiple Scales & Viewpoints Handling
      • Adjusting the Regionlet model to a candidate bounding box
      • Result: not a motorbike

  16. Multiple Scales & Viewpoints Handling
      • Adjusting the Regionlet model to a candidate bounding box
      • Result: motorbike detected

  17. Weak Classifier Construction
      • Weak learner on each region
      • Regionlet feature $x_i$ (after pooling)
      • Eight-lot lookup table (LUT)
        • Lookup table is learned
        • Lot value is learned
      • Assign lot: one lot is activated for one feature
      • Example lot values: 0.01, -0.2, -0.5, -0.4, 0.02, 0.15, 0.5, 0.3; weak learner output: -0.5
      • $H(X) = \sum_{i=1}^{N} \mathrm{LUT}_i(x_i)$
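
The lookup-table weak learner on slide 17 could be sketched as follows. The eight lot values are the ones shown on the slide; everything else is an assumption for illustration: the function names are hypothetical, the feature is assumed to lie in [0, 1], and the lots are assumed to be equal-width bins (the slide does not say how a feature is assigned to a lot).

```python
# Lot values as shown on the slide.
LOT_VALUES = [0.01, -0.2, -0.5, -0.4, 0.02, 0.15, 0.5, 0.3]


def lot_index(feature, n_lots=8):
    """Assign a feature in [0, 1] to one of n_lots equal-width lots (assumed binning)."""
    return min(int(feature * n_lots), n_lots - 1)


def weak_output(feature, lot_values=LOT_VALUES):
    """Exactly one lot is activated; its learned value is the weak learner output."""
    return lot_values[lot_index(feature, len(lot_values))]


def strong_score(features, tables):
    """Sum of the lookup-table outputs, one table per weak learner."""
    return sum(weak_output(x, table) for x, table in zip(features, tables))


print(weak_output(0.3))  # -0.5
```

With this assumed binning, a feature of 0.3 falls into the third lot, matching the slide's example output of -0.5.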

  18. Regionlet Training
      • How to get regions and regionlets
      • Regions
        • Regions are randomly sampled
        • Effective regions are greedily selected to reduce learning cost
      • Regionlets
        • Region and regionlet configurations are randomly generated
        • A region and its regionlet configuration are selected simultaneously
        • The region and regionlet pool is fixed for each cascade learning

  19. Regionlet: Training
      • Constructing the regions/regionlets pool
        • Small region, fewer regionlets -> fine spatial layout
        • Large region, more regionlets -> robust to deformation
      • Learning RealBoost [1] cascades
        • 16K region/regionlets candidates for each cascade
        • Learning of each cascade stops when the target error rate is achieved (1% for positive, 37.5% for negative)
        • Last cascade stops after collecting 5000 weak classifiers
        • Results in 4-7 cascades
        • 2-3 hours to finish training one category on an 8-core machine
      [1] C. Huang et al. Boosting nested cascade detector for multi-view face detection. ICPR, 2004.
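
The per-cascade stopping rule on slide 19 can be illustrated with a short sketch. It reads "1% for positive, 37.5% for negative" as a maximum miss rate and a maximum false-positive rate per cascade, which is an interpretation rather than something the slide states. The names are hypothetical, and multiplying the rates across cascades assumes the stages err independently.

```python
MAX_MISS_RATE = 0.01        # assumed meaning: positives a cascade may reject (1%)
MAX_FALSE_POS_RATE = 0.375  # assumed meaning: negatives a cascade may accept (37.5%)


def stage_done(miss_rate, false_pos_rate):
    """True once a cascade meets both error targets."""
    return miss_rate <= MAX_MISS_RATE and false_pos_rate <= MAX_FALSE_POS_RATE


def cumulative_rates(n_stages):
    """Detection and false-positive rates if every stage just meets its targets.

    Assumes stage errors are independent, so the rates multiply.
    """
    return (1 - MAX_MISS_RATE) ** n_stages, MAX_FALSE_POS_RATE ** n_stages


for n in (4, 7):
    detection, false_pos = cumulative_rates(n)
    print(n, round(detection, 4), round(false_pos, 6))
```

This does not model the last cascade, which the slide says stops after 5000 weak classifiers rather than on an error target.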

  20. Regionlet: Testing
      • No image resizing
      • Any scale, any aspect ratio
      • Adapt the model size to the same size as the object candidate bounding box
      [Figure: one model + resized image; multiple models + original image; ours: one model + original image]
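
Adapting the model to a candidate bounding box, as described on slide 20, amounts to mapping each window-relative regionlet into the pixel coordinates of that box. A minimal sketch, assuming boxes are (left, top, right, bottom) tuples; the function name `place_regionlet` is hypothetical.

```python
def place_regionlet(norm_box, candidate):
    """Map a window-relative regionlet into pixel coordinates of a candidate box."""
    n_left, n_top, n_right, n_bottom = norm_box
    c_left, c_top, c_right, c_bottom = candidate
    width = c_right - c_left
    height = c_bottom - c_top
    # Scale by the candidate's size, then shift by its top-left corner.
    return (
        c_left + n_left * width,
        c_top + n_top * height,
        c_left + n_right * width,
        c_top + n_bottom * height,
    )


regionlet = (0.25, 0.25, 0.75, 0.75)
print(place_regionlet(regionlet, (100, 40, 300, 140)))  # wide candidate
print(place_regionlet(regionlet, (10, 10, 60, 210)))    # tall candidate
```

The same normalized regionlet lands on a wide box and a tall box without resizing the image or switching models.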

  21. Overview of this section
      • Regionlet Object Detector
      • Regionlet Localizer
      • Regionlet with Deep CNN Feature
        • CNN Feature Extraction
        • Support Pixel Integral Image
      • Application Examples
        • Car Detection
        • Pedestrian, Car, Cyclist Detection for Autonomous Driving

  22. Regionlet Localizer (object re-localization)
      • Why a localizer is needed (classification & localization precision dilemma)
        • Data augmentation during training to accommodate inaccurate localization
        • vs. as accurate a location as possible during testing

  23. Regionlet Localizer
      • Regionlet feature can be reused for localization
      • Each Regionlet feature is associated with a spatial location
      • The location is learned during classifier training

  24. Regionlet Localizer
      • Regionlet feature can be reused for localization
      • Regionlet classifier 1: 0 0 0 0 0 0 1 0
      • Regionlet classifier N: 0 0 0 1 0 0 0 0
      • Together: an 8N-dimensional binary vector

  25. Regionlet Localizer
      [0 0 0 1 0 0 0 0 ⋯ 0 0 0 0 0 0 1 0] $\times\, W = (\Delta l, \Delta t, \Delta r, \Delta b)$
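
Slides 24 and 25 describe the localizer as an 8N-dimensional binary vector multiplied by a matrix to give four offsets (left, top, right, bottom). A minimal sketch of that step, under stated assumptions: the function names are hypothetical, the matrix `W` is taken to be 8N x 4, and the values in `W` below are made up purely to make the arithmetic checkable.

```python
def encode_lots(active_lots, n_lots=8):
    """Concatenate one one-hot block per weak classifier into an 8N binary vector."""
    vec = [0] * (n_lots * len(active_lots))
    for k, lot in enumerate(active_lots):
        vec[k * n_lots + lot] = 1
    return vec


def predict_offsets(vec, W):
    """Multiply the binary row vector (length 8N) by W (8N x 4)."""
    return [sum(v * row[c] for v, row in zip(vec, W)) for c in range(4)]


# Two classifiers whose activated lots match the pattern on slide 24.
vec = encode_lots([6, 3])
print(vec)  # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

# Placeholder regression matrix, NOT learned values.
W = [[k, 2 * k, -k, 0] for k in range(16)]
print(predict_offsets(vec, W))  # [17, 34, -17, 0]
```

Since each block has exactly one active lot, the product reduces to summing one row of `W` per weak classifier.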

  26. Regionlet Localizer Training
      • Randomly sample examples that have > 0.6 overlap with the ground truth
        • Less overlap gives poor results
      • The regression task learns the location difference
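
The training-set construction on slide 26 might be sketched as below. Two assumptions are made: "overlap" is taken to be intersection-over-union, and the "location difference" is taken to be the raw per-coordinate difference between the ground-truth box and the sampled box (the slide does not say whether it is normalized). All function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def regression_target(box, gt):
    """Per-coordinate difference the localizer should learn (assumed unnormalized)."""
    return tuple(g - c for c, g in zip(box, gt))


def training_pairs(candidates, gt, min_overlap=0.6):
    """Keep only candidates whose overlap with the ground truth exceeds the threshold."""
    return [(box, regression_target(box, gt))
            for box in candidates if iou(box, gt) > min_overlap]


gt = (0, 0, 10, 10)
candidates = [(1, 1, 11, 11), (5, 5, 15, 15)]
print(training_pairs(candidates, gt))  # [((1, 1, 11, 11), (-1, -1, -1, -1))]
```

The second candidate overlaps the ground truth too little and is dropped, in line with the slide's note that less overlap gives poor results.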

  27. Regionlet Localizer
      • Experiment result on our car dataset for autonomous driving
        • 17501 cars for training
        • 12546 cars for testing

      Detection performance (% AP):

      | Method | 0.5 overlap | 0.7 overlap |
      | --- | --- | --- |
      | Regionlet | 62.7% | 34.6% |
      | Regionlet + localization | 65.3% | 43.9% |
      | Improvement | 2.6% | 9.1% |

  28. Overview of this section
      • Regionlet Object Detector
      • Regionlet Localizer
      • Regionlet with Deep CNN Feature
        • CNN Feature Extraction
        • Support Pixel Integral Image
      • Application Examples
        • Car Detection
        • Pedestrian, Car, Cyclist Detection for Autonomous Driving

  29. Regionlet with DCNN
      • Deep CNN
        • Deep structure learns high-level information
        • Max-pooling is robust to part misalignment
        • Information is jointly learned
      • How to establish a bridge between the DCNN and the Regionlet object detection framework?

  30. Regionlet with DCNN
      • Deep CNN structure
      • Features from convolution layers retain spatial information
      [Figure label: Convolutional layers]

  31. Regionlet with DCNN
      • Deep CNN structure
      • Features from convolution layers retain spatial information
      [Figure label: A feature vector]
