Regionlet Object Detector with Hand-crafted and CNN Feature

Presenter: Xiaoyu Wang, Snapchat Research
Authors: Xiaoyu Wang, Shenghuo Zhu, Ming Yang, Yuanqing Lin
Affiliations: Snapchat Research, Horizon Robotics, Alibaba Group, Baidu

Overview of this section
• Regionlet Object Detector
• Regionlet Localizer (re-localization)
• Regionlet with Deep CNN Feature
  • CNN Feature Extraction
  • Support Pixel Integral Image
• Application Examples
  • Car Detection for Fine-grained Image Classification
  • Pedestrian, Car, Cyclist Detection for Autonomous Driving

What is the Regionlet Object Detector?
• A significant extension of the traditional boosting object detector
• Together with OverFeat and R-CNN, the Regionlet detector is one of the first detectors to successfully adopt deep CNN features for generic object detection.

How does the Regionlet detector connect to past/future work?
[Diagram: timeline running from past, through 2013, to future]
• Past: RealBoost [1] (boosting feature selection); Segmentation as Selective Search [2] (object proposal); low-level features; deep CNN [3]
• 2013: Generalized Spatial Pyramid for feature pooling
• Future: CNN-based object detection, with Spatial Pyramid Pooling in SPP-Net [4] and RoI Pooling in Fast R-CNN [5]
1. C. Huang et al. Boosting nested cascade detector for multi-view face detection. ICPR, 2004.
2. K. E. A. van de Sande et al. Segmentation as selective search for object recognition. ICCV, 2011.
3. A. Krizhevsky et al. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
4. K. He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ECCV, 2014.
5. R. Girshick. Fast R-CNN. ICCV, 2015.

Boosting Object Detector
$f(X) = \sum_{i=0}^{N} \beta_i h_i(x_i)$
• $h_i$: weak classifier
• $x_2, x_3, \dots, x_N$: sub-regions on which the weak classifiers are built
• $X$: a detection window

Traditional Boosting Detection Framework
• Use multiple components (Model 1, Model 2) to detect objects with various aspect ratios
• Operate on multiple scales to detect objects at different scales
How about a single model that is flexible during testing, with no feature pyramids and no multiple components?

What the Regionlet Detector Proposed
• A boosting classifier that can take inputs of different scales
• A boosting classifier that can take inputs of different viewpoints
• A boosting classifier containing feature pooling learning

Regionlet: Definition
• Region ($R$): feature extraction region
• Regionlet ($r_1, r_2, r_3$): a sub-region of a feature extraction region whose position and resolution are relative and normalized to a detection window
[Figure: a Region and its Regionlets]

Regionlet: Definition (cont.)
• Regionlet coordinates are normalized
• Traditional: $(l, t, r, b)$, e.g. (50, 50, 180, 180)
• Normalized: $(l/w, t/h, r/w, b/h)$, e.g. (.25, .25, .90, .90), where $w$ and $h$ are the width and height of the detection window
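The normalization on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the 200×200 window anchored at the origin is an assumption chosen so that (50, 50, 180, 180) maps to (.25, .25, .90, .90).

```python
def normalize_box(box, window):
    """Express an absolute (l, t, r, b) box relative to a detection window."""
    l, t, r, b = box
    wl, wt, wr, wb = window
    w, h = wr - wl, wb - wt  # window width and height
    return ((l - wl) / w, (t - wt) / h, (r - wl) / w, (b - wt) / h)


def project_box(norm_box, window):
    """Map a normalized (l, t, r, b) box onto any candidate window."""
    l, t, r, b = norm_box
    wl, wt, wr, wb = window
    w, h = wr - wl, wb - wt
    return (wl + l * w, wt + t * h, wl + r * w, wt + b * h)


# Assumed 200x200 training window anchored at the origin.
norm = normalize_box((50, 50, 180, 180), (0, 0, 200, 200))

# The same normalized regionlet lands proportionally inside a candidate
# window of a different size and aspect ratio.
placed = project_box(norm, (100, 40, 500, 240))
```

Because only the normalized coordinates are stored, the same model can be stretched onto a candidate box of any scale or aspect ratio without resizing the image.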

Regionlet: Definition (cont.)
• Regionlet definition = Generalized Spatial Pyramid
  • Similarity
    • Both use relative coordinates
  • Differences
    • Regionlet: coordinates are relative to the detection window (not the image)
    • Regionlet: coordinates are flexible (they do not have to evenly divide the image/window)
• Regionlet feature extraction = Generalized Spatial Pyramid Pooling
[Figure: rectangles in a Spatial Pyramid vs. rectangles in a Generalized Spatial Pyramid]

Connection to other methods in pooling design
[Diagram: the Generalized Spatial Pyramid for feature pooling connects to CNN-based object detection (object proposal + CNN): Spatial Pyramid Pooling in SPP-Net [1] and RoI Pooling in Fast R-CNN [2]]
1. K. He et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. ECCV, 2014.
2. R. Girshick. Fast R-CNN. ICCV, 2015.

Regionlet: Feature extraction
• Non-local pooling
• Features can be hand-crafted or deep CNN features, whichever you like!

Regionlet Classifier
• Each weak classifier is based on a 1-D feature extracted from a region
[Diagram: feature extraction over the Regionlets yields the feature $x$]
Weak classifier: $h(x) = \sum_{o=1}^{n-1} v_o \, \mathbb{1}(B(x) = o)$
Strong classifier: $H(X) = \sum_{i=1}^{T} \beta_i h_i(x_i)$

Detection Framework
[Figure: panels (a), (b), (c), with a Region and its Regionlets marked]
(a) Input image
(b) Generate object regions [1, 2, 3]
(c) Feature extraction and pooling
  • Generalized Spatial Pyramid Pooling inside each Regionlet
    • Low-level features
    • CNN features (discussed later)
  • Max-pooling among Regionlets
1. K. E. A. van de Sande et al. Segmentation as selective search for object recognition. ICCV, 2011.
2. B. Alexe et al. Measuring the objectness of image windows. T-PAMI, 2012.
3. S. Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS, 2015.
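Step (c) can be sketched as follows. This is an illustrative sketch only: it assumes a single feature channel stored as a 2-D list, and it assumes the feature inside each regionlet is the channel average, which the slides do not specify. Only the final max over regionlets comes from the slide. All names are hypothetical.

```python
def regionlet_feature(channel, window, regionlet):
    """Average one feature channel inside a window-normalized regionlet."""
    wl, wt, wr, wb = window
    w, h = wr - wl, wb - wt
    l, t, r, b = regionlet
    x0, y0 = int(wl + l * w), int(wt + t * h)
    # Guarantee at least one cell even for very small regionlets.
    x1 = max(int(wl + r * w), x0 + 1)
    y1 = max(int(wt + b * h), y0 + 1)
    cells = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(cells) / len(cells)


def region_feature(channel, window, regionlets):
    """Max-pool the per-regionlet features into one 1-D region feature."""
    return max(regionlet_feature(channel, window, r) for r in regionlets)


channel = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 5],
    [0, 0, 5, 7],
]
regionlets = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0)]
pooled = region_feature(channel, (0, 0, 4, 4), regionlets)
```

Taking the max lets the strongest regionlet response represent the region, which is what makes the feature tolerant to local deformation.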

Multiple Scales & Viewpoints Handling
[Figure: the Regionlet model is adjusted to a candidate bounding box. Result: not a motorbike]

Multiple Scales & Viewpoints Handling
[Figure: the Regionlet model is adjusted to a candidate bounding box. Result: motorbike detected]

Weak Classifier Construction
• One weak learner on each Region
• Input: Regionlet feature $x_i$ (after pooling)
• Lookup table with eight lots (bins)
  • Lookup table is learned
  • Lot value is learned
• Assign lot: one lot is activated for one feature
Lot values: 0.01, -0.2, -0.5, -0.4, 0.02, 0.15, 0.5, 0.3
$H(X) = \sum_{i=1}^{N} \mathrm{LUT}_i(x_i)$
Weak learner output: -0.5
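A lookup-table weak learner of this kind can be sketched as below. The eight lot values are taken from the slide; everything else is an assumption for illustration: the features are taken to lie in [0, 1], the lots are uniform-width bins, and the function names are hypothetical.

```python
# Lot values as shown on the slide.
LOT_VALUES = [0.01, -0.2, -0.5, -0.4, 0.02, 0.15, 0.5, 0.3]


def lot_index(x, lo=0.0, hi=1.0, n_lots=8):
    """Quantize a 1-D feature into a lot index (assumed uniform bins)."""
    x = min(max(x, lo), hi)  # clamp to the assumed feature range
    return min(int((x - lo) / (hi - lo) * n_lots), n_lots - 1)


def weak_classifier(x, lot_values, lo=0.0, hi=1.0):
    """Exactly one lot is activated; its learned value is the output."""
    return lot_values[lot_index(x, lo, hi, len(lot_values))]


def strong_classifier(features, tables):
    """Sum the lookup-table outputs over all selected regions."""
    return sum(weak_classifier(x, t) for x, t in zip(features, tables))


# A feature of 0.3 falls into the third lot, giving -0.5 as on the slide.
output = weak_classifier(0.3, LOT_VALUES)
score = strong_classifier([0.3, 0.95], [LOT_VALUES, LOT_VALUES])
```

In this sketch any per-classifier weight is folded into the lot values, so the strong classifier is a plain sum of table lookups.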

Regionlet Training
• How to get regions and regionlets
  • Regions
    • Regions are randomly sampled
    • Effective Regions are greedily selected to reduce learning cost
  • Regionlets
    • Each Region and its Regionlet configuration are randomly generated
    • A Region and its Regionlet configuration are selected simultaneously
    • The Region & Regionlet pool is fixed while each cascade is learned

Regionlet: Training
• Constructing the regions/regionlets pool
  • Small region, fewer regionlets -> fine spatial layout
  • Large region, more regionlets -> robust to deformation
• Learning RealBoost [1] cascades
  • 16K region/regionlet candidates for each cascade
  • Learning of each cascade stops when the target error rates are reached (1% for positives, 37.5% for negatives)
  • The last cascade stops after collecting 5000 weak classifiers
  • Results in 4-7 cascades
  • 2-3 hours to finish training one category on an 8-core machine
1. C. Huang et al. Boosting nested cascade detector for multi-view face detection. ICPR, 2004.
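As a rough back-of-the-envelope check on these stopping rates, the snippet below compounds them over several stages. It assumes that each stage rejects 1% of positives and lets 37.5% of negatives through, and that stages act independently; neither assumption is stated on the slide. It also ignores that the last cascade stops on a weak-classifier count rather than an error rate. The function name is hypothetical.

```python
def cascade_rates(n_stages, pos_miss=0.01, neg_pass=0.375):
    """Return (fraction of positives kept, fraction of negatives passed)."""
    keep_pos = (1.0 - pos_miss) ** n_stages
    keep_neg = neg_pass ** n_stages
    return keep_pos, keep_neg


for k in (4, 7):
    pos, neg = cascade_rates(k)
    print(f"{k} stages: keep {pos:.3f} of positives, pass {neg:.5f} of negatives")
```

Under these assumptions a handful of stages is enough to discard most negatives while retaining the large majority of positives.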

Regionlet: Testing
• No image resizing
• Any scale, any aspect ratio
• Adapt the model size to the size of the object candidate bounding box
[Figure: one model + resized image; multiple models + original image; ours: one model + original image]

Regionlet Localizer (object re-localization)
• Why a localizer is needed (classification & localization precision dilemma)
  • Data augmentation during training to accommodate inaccurate localization
  • vs. as accurate a location as possible during testing

Regionlet Localizer
• Regionlet feature can be reused for localization
• Each Regionlet feature is associated with a spatial location
• The location is learned during classifier training

Regionlet Localizer
• Regionlet feature can be reused for localization
[Diagram: Regionlet classifier 1 through Regionlet classifier N each contribute an 8-bit one-hot code, e.g. 0 0 0 0 0 0 1 0 and 0 0 0 1 0 0 0 0; the codes are concatenated into an 8N-dimensional binary vector]

Regionlet Localizer
$[\,\cdots\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 0\ \ 0\ 0\ 0\ 0\ 0\ 0\ 1\ 0\,] \times W = (\Delta l, \Delta t, \Delta r, \Delta b)$
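The localizer's prediction step can be sketched as a one-hot encoding followed by a linear map to four box offsets. This is a minimal sketch under assumptions: the weight values below are made up for illustration, the offsets are treated as additive pixel corrections, and all function names are hypothetical.

```python
def one_hot_code(lot_indices, n_lots=8):
    """Concatenate one 8-bit one-hot code per weak classifier (8N bits)."""
    code = [0] * (n_lots * len(lot_indices))
    for k, lot in enumerate(lot_indices):
        code[k * n_lots + lot] = 1
    return code


def predict_offsets(code, weights):
    """Multiply the binary code by an (8N x 4) weight matrix."""
    return tuple(
        sum(c * row[j] for c, row in zip(code, weights)) for j in range(4)
    )


def relocalize(box, offsets):
    """Apply the predicted offsets to an (l, t, r, b) box."""
    return tuple(v + d for v, d in zip(box, offsets))


# Two weak classifiers whose activated lots are 6 and 3.
code = one_hot_code([6, 3])

# Hypothetical regression weights: only the two active rows are non-zero.
weights = [[0.0] * 4 for _ in range(16)]
weights[6] = [1.0, 0.0, -2.0, 0.5]
weights[11] = [0.5, 1.0, 0.0, 0.5]

offsets = predict_offsets(code, weights)
refined = relocalize((50, 50, 180, 180), offsets)
```

Because the code is one-hot per classifier, the product simply sums the weight rows of the activated lots, so re-localization costs almost nothing on top of classification.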

Regionlet Localizer Training
• Randomly sample examples that have > 0.6 overlap with the ground truth
  • Less overlap gives poor results
• The regression task learns the location difference
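The sampling rule can be sketched as below, assuming "overlap" means intersection-over-union and that the regression target is the coordinate-wise difference between the ground truth and the sampled box. Both are assumptions rather than details given on the slide, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (l, t, r, b) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def training_pairs(samples, gt, min_overlap=0.6):
    """Keep well-overlapping samples and pair each with its location difference."""
    return [
        (s, tuple(g - v for g, v in zip(gt, s)))
        for s in samples
        if iou(s, gt) > min_overlap
    ]


gt = (50, 50, 180, 180)
samples = [(55, 45, 185, 175), (120, 120, 250, 250)]
pairs = training_pairs(samples, gt)
```

Only the first sample survives the threshold here; the second overlaps the ground truth too little to give a useful regression target.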

Regionlet Localizer
• Experiment result on our car dataset for autonomous driving
  • 17501 cars for training
  • 12546 cars for testing

Detection performance (% AP)
Method                     0.5 overlap   0.7 overlap
Regionlet                  62.7%         34.6%
Regionlet + localization   65.3%         43.9%
Improvement                2.6%          9.1%

Regionlet with DCNN
• Deep CNN
  • Deep structure learns high-level information
  • Max-pooling is robust to part misalignment
  • Information is jointly learned
• How do we bridge the DCNN and the Regionlet object detection framework?

Regionlet with DCNN
• Deep CNN structure
  • Features from convolutional layers retain spatial information
[Figure: convolutional layers; a feature vector]
