Object Detection in Recent 3 Years Beyond RetinaNet and Mask R-CNN - PowerPoint PPT Presentation

Object Detection in Recent 3 Years Beyond RetinaNet and Mask R-CNN Gang Yu (Megvii Research)

Schedule of Tutorial • Lecture 1: Beyond RetinaNet and Mask R-CNN (Gang Yu) • Lecture 2: AutoML for Object Detection (Xiangyu Zhang) • Lecture 3: Finegrained Visual Analysis (Xiu-shen Wei)

Outline • Introduction to Object Detection • Modern Object detectors • One Stage detector vs Two-stage detector • Challenges • Backbone • Head • Pretraining • Scale • Batch Size • Crowd • NAS • Fine-Grained • Conclusion

What is object detection?

Detection - Evaluation Criteria • Average Precision (AP) and mAP • Figures are from Wikipedia
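A minimal sketch of how AP for one class could be computed, assuming each detection has already been marked as a true or false positive against the ground truth and that num_gt is greater than zero. It uses all-point interpolation, which is only one of several conventions; the function names are hypothetical and not taken from the slides.

```python
def average_precision(scores, is_tp, num_gt):
    """Area under the interpolated precision-recall curve for one class."""
    # Rank detections by descending confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at rank k becomes the max precision at any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the plain mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, three detections ranked TP, FP, TP against two ground-truth boxes give AP = 0.5 * 1.0 + 0.5 * (2/3).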

Detection - Evaluation Criteria • mmAP • Figures are from http://cocodataset.org
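The COCO-style metric averages mAP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. A rough sketch follows; `map_at_threshold` is a hypothetical callable standing in for a full evaluation at one threshold, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 used by the COCO primary metric.
COCO_IOU_THRESHOLDS = [0.5 + 0.05 * k for k in range(10)]


def mmap(map_at_threshold):
    """Average a per-threshold mAP (hypothetical callable) over the COCO thresholds."""
    values = [map_at_threshold(t) for t in COCO_IOU_THRESHOLDS]
    return sum(values) / len(values)
```

A detector whose boxes are only loosely aligned scores well at IoU 0.5 but poorly at 0.9, so this average rewards precise localization.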

How to perform a detection? • Sliding window: enumerate all the windows (up to millions of windows) • VJ detector: cascade chain • Fully convolutional network • Shared computation • Reference: Robust Real-time Object Detection; Viola, Jones; IJCV 2001, http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
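To give a feel for why sliding-window enumeration gets expensive, here is a small sketch that counts and enumerates square windows at a single scale. The image size, window size, and stride are illustrative assumptions, not values from the slides; repeating this over many scales and aspect ratios multiplies the count.

```python
def count_windows(img_w, img_h, win, stride):
    """Number of win x win windows that fit in the image at the given stride."""
    if win > img_w or win > img_h:
        return 0
    return ((img_w - win) // stride + 1) * ((img_h - win) // stride + 1)


def sliding_windows(img_w, img_h, win, stride):
    """Yield every window as an (x1, y1, x2, y2) box, row by row."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)


if __name__ == "__main__":
    # Single scale only; a real detector would repeat this across scales.
    print(count_windows(640, 480, 24, 4))
```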

General Detection Before Deep Learning • Feature + classifier • Feature • Haar Feature • HOG (Histogram of Oriented Gradients) • LBP (Local Binary Pattern) • ACF (Aggregated Channel Feature) • … • Classifier • SVM • Boosting • Random Forest

Traditional Hand-crafted Feature: HoG

General Detection Before Deep Learning • Traditional Methods • Pros • Efficient to compute (e.g., Haar, ACF) on CPU • Easy to debug and to analyze bad cases • Reasonable performance on limited training data • Cons • Limited performance on large datasets • Hard to accelerate on GPU

Deep Learning for Object Detection • Categorized by whether the detector follows the “proposal and refine” paradigm • One Stage • Example: DenseBox, YOLO (YOLO v2), SSD, RetinaNet • Keyword: Anchor, Divide and conquer, loss sampling • Two Stage • Example: RCNN (Fast RCNN, Faster RCNN), RFCN, FPN, Mask RCNN • Keyword: speed, performance

A bit of History • One stage detector (Image -> Feature Extractor -> classification + localization (bbox)): OverFeat (2013), MultiBox (2014), DenseBox (2015), UnitBox (2016), EAST (2017), YOLO (2015), SSD (2015), YOLOv2 (2016), RON (2017), RetinaNet (2017), DSSD (2017) • Diagram labels: Anchor Free, Anchor imported • Two stage detector (Image -> Feature Extractor -> Proposal -> classification + localization (bbox) -> Refine -> classification + localization (bbox)): RCNN (2014), Fast RCNN (2015), Faster RCNN (2015), RFCN (2016), FPN (2017), Mask RCNN (2017), RFCN++ (2017)

Modern Object detectors • Pipeline: Backbone -> Head -> Postprocess (NMS) • Modern object detectors • RetinaNet • f1-f7 for backbone, f3-f7 with 4 convs for head • FPN with ROIAlign • f1-f6 for backbone, two fcs for head • Recall vs localization • One stage detector: Recall is high, at the cost of localization ability • Two stage detector: Strong localization ability

One Stage detector: RetinaNet • FPN Structure • Focal loss • Focal Loss for Dense Object Detection, Lin et al., ICCV 2017 (Best Student Paper)
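For reference, the focal loss can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The sketch below evaluates it for a single binary prediction, assuming p is a probability strictly between 0 and 1 and using gamma = 2 and alpha = 0.25 as defaults; the function names are illustrative only.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy for predicted foreground probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss: cross entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With these defaults, a well-classified positive (p = 0.9) keeps only 0.25 * 0.1^2 = 0.0025 of its cross-entropy loss, so the many easy examples contribute far less to the total than the few hard ones.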

Two-Stage detector: FPN/Mask R-CNN • FPN Structure • ROIAlign • Mask R-CNN, He et al., ICCV 2017 (Best Paper)
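ROIAlign avoids quantizing RoI coordinates by sampling the feature map with bilinear interpolation. The following is a simplified single-channel sketch under stated assumptions: one sample at the center of each bin (implementations commonly average several samples per bin), RoI coordinates already expressed in feature-map units, and integer coordinates taken to index pixel centers. All names are hypothetical.

```python
def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp so that samples on or beyond the border stay defined.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, roi, out_size):
    """Pool an RoI (x1, y1, x2, y2) to an out_size x out_size grid, one sample per bin."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [
        [bilinear_sample(feat, x1 + (j + 0.5) * bin_w, y1 + (i + 0.5) * bin_h)
         for j in range(out_size)]
        for i in range(out_size)
    ]
```

Because no coordinate is rounded, a small shift of the RoI produces a small change in the pooled values, which is the property that matters for mask and box accuracy.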

What is next for object detection? • The pipeline seems to be mature • There is still a large gap between the existing state of the art and product requirements • The devil is in the details

Challenges Overview • Backbone • Head • Pretraining • Scale • Batch Size • Crowd • NAS • Fine-grained • Pipeline: Backbone -> Head -> Postprocess (NMS)

Challenges - Backbone • Backbone networks are designed for the classification task, not for the localization task • Receptive field vs spatial resolution • Only f1-f5 are pretrained; f6 and f7 (if applicable) are randomly initialized

Backbone - DetNet • DetNet: A Backbone network for Object Detection, Li et al., 2018, https://arxiv.org/pdf/1804.06215.pdf

Backbone - DetNet

Challenges - Head • Speed has improved significantly for two-stage detectors • RCNN -> Fast RCNN -> Faster RCNN -> RFCN • How can a two-stage detector match the speed of one-stage detectors such as YOLO and SSD? • Small Backbone • Light Head

Head – Light head RCNN • Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017, https://arxiv.org/pdf/1711.07264.pdf Code: https://github.com/zengarden/light_head_rcnn

Head – Light head RCNN • Backbone • L: ResNet101 • S: Xception145 • Thin Feature map • L: C_{mid} = 256 • S: C_{mid} = 64 • C_{out} = 10 * 7 * 7 • R-CNN subnet • A fc layer is connected to the PS ROI pool/Align
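A quick arithmetic check of what "thin" means here. The first figure comes from the C_{out} formula on the slide; the comparison assumes an R-FCN-style score map with one channel group per class, including background, on an 81-class COCO setup, which is an assumption rather than something the slide states.

```python
def ps_score_map_channels(channels_per_bin, pool_size):
    """Channels of a position-sensitive score map: one group per pooling bin."""
    return channels_per_bin * pool_size * pool_size


# Thin feature map from the slide: C_out = 10 * 7 * 7.
light_head_channels = ps_score_map_channels(10, 7)

# Assumed class-dependent alternative: 80 classes + background, 7 x 7 bins.
class_dependent_channels = ps_score_map_channels(81, 7)

if __name__ == "__main__":
    print(light_head_channels, class_dependent_channels)
```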

Head – Light head RCNN

Head – Light head RCNN • Mobile Version • ThunderNet: Towards Real-time Generic Object Detection, Qin et al., arXiv 2019 • https://arxiv.org/abs/1903.11752

Pretraining – Objects365 • ImageNet pretraining is usually employed for backbone training • Training from Scratch • ScratchDet claims GN/BN is important • Rethinking ImageNet Pretraining validates that training time is important

Pretraining – Objects365 • Objects365 Dataset

Pretraining – Objects365 • Pretraining with Objects365 vs ImageNet vs from Scratch

Pretraining – Objects365 • Pretraining on Backbone or Pretraining on both backbone and head

Pretraining – Objects365 • Results on VOC Detection & VOC Segmentation

Pretraining – Objects365 • Summary • Pretraining is important to reduce the training time • Pretraining with a large dataset is beneficial for the performance

Challenges - Scale • Scale variation is extremely large for object detection

Challenges - Scale • Scale variation is extremely large for object detection • Previous works • Divide and Conquer: SSD, DSSD, RON, FPN, … • Limited Scale variation • Scale Normalization for Image Pyramids, Singh et al., CVPR 2018 • Slow inference speed • How to address extremely large scale variation without compromising inference speed?

Scale - SFace • SFace: An Efficient Network for Face Detection in Large Scale Variations, 2018, http://cn.arxiv.org/pdf/1804.06559.pdf • Anchor-based: • Good localization for the scales that are covered by anchors • Difficult to cover the full range of face scales • Anchor-free: • Able to cover various face scales • Weaker localization ability

Scale - SFace

Scale - SFace • Summary: • Integrate anchor-based and anchor-free for the scale issue • A new benchmark for face detection with large scale variations: 4K Face

Challenges - Batchsize • Small mini-batchsize for general object detection • 2 for R-CNN, Faster RCNN • 16 for RetinaNet, Mask RCNN • Problem with small mini-batchsize • Long training time • Insufficient BN statistics • Imbalanced pos/neg ratio

Batchsize – MegDet • MegDet: A Large Mini-Batch Object Detector, CVPR2018, https://arxiv.org/pdf/1711.07240.pdf

Batchsize – MegDet • Techniques • Learning rate warmup • Cross-GPU Batch Normalization
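A minimal sketch of learning rate warmup for large mini-batch training. It assumes a linear ramp and pairs it with the commonly used linear scaling rule (learning rate proportional to batch size); the slide names the technique but not the schedule, so the ramp shape, the scaling rule, and the numbers in the usage example are all illustrative assumptions.

```python
def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule (assumed): grow the learning rate with the batch size."""
    return base_lr * batch / base_batch


def warmup_lr(step, target_lr, warmup_steps, start_lr=0.0):
    """Ramp linearly from start_lr to target_lr, then hold target_lr."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


if __name__ == "__main__":
    # Illustrative values only: scale a batch-16 rate up to batch 256.
    target = scaled_lr(0.02, 16, 256)
    for step in (0, 50, 100, 500):
        print(step, warmup_lr(step, target, warmup_steps=100))
```

Starting low avoids the instability that a large learning rate can cause in the first iterations, before the weights and BN statistics have settled.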

Challenges - Crowd • NMS is a post-processing step to eliminate multiple responses on one object instance • Reasonable for mildly crowded scenes such as COCO and VOC • Fails when the objects are in a crowd
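A small sketch of greedy NMS, assuming boxes are (x1, y1, x2, y2) tuples and an IoU threshold of 0.5 (a typical but arbitrary choice). It also shows the failure mode: two different objects that overlap heavily are indistinguishable from duplicate detections of one object.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best box, drop everything overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes whose overlap with the kept box is too high.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    # Boxes 0 and 1 could be two people standing close together.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(nms(boxes, [0.9, 0.8, 0.7]))  # box 1 is suppressed
```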

Challenges - Crowd • A few works have been devoted to this topic • Soft-NMS, Bodla et al., ICCV 2017, http://www.cs.umd.edu/~bharat/snms.pdf • Relation Networks, Hu et al., CVPR 2018, https://arxiv.org/pdf/1711.11575.pdf • The literature lacks a good benchmark for evaluation
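As a rough illustration of the Soft-NMS idea, the sketch below decays the scores of overlapping boxes with a Gaussian penalty, exp(-IoU^2 / sigma), instead of deleting them. The values sigma = 0.5 and the final score cutoff are assumed defaults, and the function names are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: returns (index, rescored confidence) in selection order."""
    scores = list(scores)  # work on a copy; scores are decayed in place
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_thresh:
            break  # everything left scores even lower
        keep.append((best, scores[best]))
        # Decay, rather than remove, the neighbours of the selected box.
        for i in remaining:
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
    return keep
```

A heavily overlapped box survives with a reduced score, so a second person in a crowd can still be recovered at a lower confidence.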

Crowd - CrowdHuman • CrowdHuman: A Benchmark for Detecting Human in a Crowd, 2018, https://arxiv.org/pdf/1805.00123.pdf, http://www.crowdhuman.org/ • A benchmark with Head, Visible Human, and Full body bounding boxes • Generalization ability for other head/pedestrian datasets • Crowdedness

Crowd - CrowdHuman

Crowd - CrowdHuman

Crowd - CrowdHuman • Generalization • Head • Pedestrian • COCO
