R-CNN minus R
Karel Lenc, Andrea Vedaldi

Visual Geometry Group, Department of Engineering Science

Object detection
Goal: tightly enclose objects of a certain type in a bounding box
Example classes: “bikes”, “planes”, “horses”, “birds”

Top performer: Region proposals + CNN
[Figure: proposal generation (WHERE) supplies candidate regions; a CNN (WHAT) labels each region, e.g. chair, background, potted plant.]
[Girshick et al. 2013, He et al. 2014, 2015]

Top performer: Region proposals + CNN
WHAT: Convolutional neural networks
E.g. region classification with AlexNet, VGG VD [Krizhevsky et al. 2012, Simonyan Zisserman 2014]
WHERE: Segmentation algorithm
E.g. region proposal from selective search [Uijlings et al. 2013]

Can a CNN understand where as well as what?
[Figure: the convolutional neural network handles WHAT; can it also replace proposal generation for WHERE?]

Approaches to object detection

Scanning windows
▶ Sliding windows
  ▶ HOG detector [Dalal Triggs 2005]
  ▶ DPM [Felzenszwalb et al. 2008]
▶ Cascaded windows
  ▶ AdaBoost [Viola Jones 2004]
  ▶ MKL [Vedaldi et al. 2009]
  ▶ Branch and Bound [Lampert et al. 2009]
▶ Jumping windows [Sivic et al. 2008]
▶ Selective windows [Endres and Hoiem 2010, Uijlings et al. 2011, Alexe et al. 2012, Gu et al. 2012]

Hough voting
▶ Implicit shape models [Amit Geman 1997, Leibe et al. 2003]
▶ Max margin [Maji Berg 2009], Random Forests [Gall Lempitsky 2009]

Classifiers & features
▶ linear SVMs, kernel SVM, Fisher Vectors, … [Cinbis et al. 2013, …]
▶ convolutional neural networks [Sermanet et al. 2014, Girshick et al. 2014, …]
▶ HOG, SIFT, C-SIFT, … [van de Sande et al. 2010, …]
▶ Segmentation cues, … [Shotton et al. 2008, Cinbis et al. 2013, …]

Evolution of object detection
[Figure: mAP [%] on PASCAL VOC 2007 data by year, 2008 to 2015. Methods plotted: DPM [Felzenszwalb et al.], MKL [Vedaldi et al.], DPMv5 [Girshick et al.], Regionlet [Wang et al.], RCNN-Alex [Girshick et al.], RCNN-VGG [Girshick et al.].]

R-CNN [Girshick et al. 2013]
Pros: simple and effective
Cons: slow, as the CNN is re-evaluated for each tested region
[Figure: each region (chair, background, potted plant) is passed separately through the full CNN: c1, c2, c3, c4, c5, f6, f7, f8, then label (SVM).]

SPP R-CNN [He et al. 2014]
[Figure: the convolutional layers c1 to c5 produce local features for the image; each region (chair, bowl, potted plant) is then pooled by an encoder and passed through f6, f7, f8 to its own label (SVM).]
Convolutional features = local features
Region descriptor = pooled local features
▶ Spatial pyramid + max pooling [He et al. 2014]
▶ Bag of words, Fisher vector, VLAD, … [Cimpoi et al. 2015]
Orders of magnitude speedup
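A minimal sketch of "region descriptor = pooled local features" with spatial pyramid max pooling. The function name, the pyramid levels (1x1 and 2x2), and the bin rounding are illustrative assumptions, not the settings of [He et al. 2014].

```python
import numpy as np

def spp_region_descriptor(fmap, box, levels=(1, 2)):
    """Max-pool the features inside `box` over an n x n grid per pyramid level.

    fmap: (H, W, C) convolutional feature map, computed once per image.
    box:  (x1, y1, x2, y2) in feature-map cells, end-exclusive (assumed convention).
    """
    x1, y1, x2, y2 = box
    region = fmap[y1:y2, x1:x2, :]
    h, w, _ = region.shape
    parts = []
    for n in levels:
        ys = np.linspace(0, h, n + 1)
        xs = np.linspace(0, w, n + 1)
        for i in range(n):
            for j in range(n):
                # Round outwards so that no bin is ever empty.
                r0, r1 = int(np.floor(ys[i])), int(np.ceil(ys[i + 1]))
                c0, c1 = int(np.floor(xs[j])), int(np.ceil(xs[j + 1]))
                parts.append(region[r0:r1, c0:c1, :].max(axis=(0, 1)))
    return np.concatenate(parts)

# Hypothetical feature map and two regions of different sizes.
fmap = np.random.default_rng(0).standard_normal((13, 13, 4))
desc_a = spp_region_descriptor(fmap, (2, 3, 9, 11))
desc_b = spp_region_descriptor(fmap, (0, 0, 5, 4))
```

Both descriptors have the same length whatever the region size, which is what lets a single pass over the convolutional layers serve every region.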

Computational cost
[Figure: detection time, avg. time per image [ms] (axis 0 to 12000), for R-CNN and SPP-CNN, split into Sel. Search and CNN evaluation.]
SPP-CNN results in a significant test-time speedup
However, region proposal extraction is the new bottleneck
R-CNN minus R: can we get rid of region proposal extraction?

Streamlining R-CNN and SPP-CNN
Dropping proposal generation

A complex learning pipeline
(SPP) R-CNN training comprises many steps
[Figure: c1 to c5, f6, f7, f8 → label (fine tuning); SVM → label (ranking); linear bounding box regression.]
1. Pre-train a large CNN (on ImageNet)
2. Extract region proposals (on PASCAL VOC)
3. Use pre-processed regions to:
   1. Fine-tune the CNN
   2. Learn an SVM to rank regions
   3. Learn a bounding-box regressor to refine localization

A complex learning pipeline
(SPP) R-CNN training comprises many steps
[Figure: the same pipeline (c1 to c5, f6, f7, f8 → label; SVM → label; linear bounding box regression), with part of the network marked frozen.]
With SPP R-CNN of [He et al. 2014], fine-tuning is limited to the fully connected layers

Streamlining R-CNN
Removing the SVM phase

fine tuning (mAP 38.1)
  score: $S_c = \exp(\langle w_c, \phi(x) \rangle + b_c)$
  learning loss: $-\log \frac{S_c}{S_0 + S_1 + S_2 + \dots + S_C}$

region ranking (mAP 59.8)
  scores: $Q_c = \langle w_c, \phi(x) \rangle + b_c$, for $c = 1, \dots, C$
  learning loss: $\max\{0, 1 - y Q_c\}$, for $c = 1, \dots, C$

region ranking from fine-tuning (mAP 58.4)
  score: $Q_c = \log \frac{S_c}{S_0}$

Up to a simple transformation, softmax is just as good as hinge loss for box ranking.
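A minimal sketch of the "simple transformation": the ranking score of each class is the log-ratio of its softmax score to the background score, which reduces to a difference of pre-softmax activations. Treating index 0 as background follows the slide; the function names and the example numbers are hypothetical.

```python
import numpy as np

def softmax_probs(logits):
    """Softmax over class activations; column 0 is taken to be background."""
    z = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ranking_scores(logits):
    """Log-ratio of each foreground score to the background score.

    Because the log of a softmax score is the activation minus a shared
    normalizer, the ratio is simply activation_c - activation_0.
    """
    return logits[..., 1:] - logits[..., :1]

# Hypothetical activations for 3 regions: background + 2 object classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 3.0, 1.0],
                   [1.0, 1.0, 1.0]])
probs = softmax_probs(logits)
q = ranking_scores(logits)
q_from_probs = np.log(probs[:, 1:] / probs[:, :1])  # same scores via probabilities
```

Regions can then be ranked per class by `q` without training a separate SVM.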

Streamlining R-CNN and SPP-CNN
See also [Fast R-CNN and Faster R-CNN]
[Figure: before: c1 to c5 (frozen), f6, f7, f8 → label, with a separate linear bounding box regressor. After: c1 to c5, SPP, f6, f7, f8 → label, plus a bounding box regression output (f reg) in the same network.]
SPP and bounding box regression can be easily implemented in a CNN (with a DAG topology) and trained jointly in one step

Streamlining R-CNN and SPP-CNN
Dropping proposal generation

A constant-time region proposal generator
Algorithm
Preprocessing
▶ Collect all the training bounding boxes (x1, y1, x2, y2)
▶ Use K-means to extract K clusters in (x1, y1, x2, y2) space
Proposal generation
▶ Regardless of the image, return the same K cluster centers
Proposals are now very fast but very inaccurate
We let the CNN compensate with the bounding box regressor
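The algorithm above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: plain Lloyd iterations stand in for whatever K-means variant was used, box coordinates are assumed normalized to [0, 1], and the training boxes are synthetic.

```python
import numpy as np

def fit_box_clusters(boxes, k, iters=20, seed=0):
    """Lloyd's K-means on rows of (x1, y1, x2, y2); returns k cluster centers."""
    rng = np.random.default_rng(seed)
    boxes = np.asarray(boxes, dtype=float)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every box to every center.
        d = ((boxes[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers

def propose(image, centers):
    """Image-independent proposals: the same K boxes for every image."""
    return centers

# Synthetic stand-in for the training bounding boxes (hypothetical data).
rng = np.random.default_rng(1)
top_left = rng.uniform(0.0, 0.5, size=(500, 2))
size = rng.uniform(0.1, 0.5, size=(500, 2))
train_boxes = np.hstack([top_left, top_left + size])

centers = fit_box_clusters(train_boxes, k=8)
proposals = propose(image=None, centers=centers)
```

Clustering happens once, offline; at test time `propose` ignores the image entirely, which is why its cost is constant.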

Proposal statistics on PASCAL VOC
[Figure: box statistics for ground truth, selective search (2K), sliding windows (7K), and clustering (3K).]

Information pathways
[See also Lenc Vedaldi CVPR 2015]
[Figure: shared local features from c1 to c5 feed two paths. What path (invariant representation): f6, f7, f8 → label. Where path (equivariant representation): linear bounding box regression.]

CNN-based bounding box regression
[Figure: example detections. Dashed line: proposals. Solid line: corrected by the CNN.]
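A sketch of how a predicted correction could turn a dashed proposal into a solid box. The slides do not give the regression targets, so the (dx, dy, dw, dh) parameterization below is an assumption borrowed from common R-CNN-style practice, and the function name is hypothetical.

```python
import numpy as np

def apply_box_deltas(boxes, deltas):
    """Refine (x1, y1, x2, y2) boxes with (dx, dy, dw, dh) corrections.

    Assumed parameterization: the center shifts by a fraction of the box
    size, and width and height are rescaled by exp(dw) and exp(dh).
    """
    boxes = np.asarray(boxes, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    new_cx = cx + deltas[:, 0] * w
    new_cy = cy + deltas[:, 1] * h
    new_w = w * np.exp(deltas[:, 2])
    new_h = h * np.exp(deltas[:, 3])
    return np.stack([new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
                     new_cx + 0.5 * new_w, new_cy + 0.5 * new_h], axis=1)

proposal = np.array([[10.0, 20.0, 50.0, 100.0]])
unchanged = apply_box_deltas(proposal, np.zeros((1, 4)))
shifted = apply_box_deltas(proposal, np.array([[0.5, 0.0, 0.0, 0.0]]))
```

A zero correction returns the proposal as is; `dx = 0.5` moves the box right by half its width.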

Performance
[Figure: mAP (VOC07), axis 0.42 to 0.6, Baseline vs BBR, for Sel. Search (2K boxes), Slid. Win. (7K boxes), Clusters (2K boxes), Clusters (7K boxes).]
Observations
▶ Selective search is much better than fixed generators
▶ However, bounding box regression almost eliminates the difference
▶ Clustering allows using significantly fewer boxes than sliding windows

Timings
Finding (1): Streamlining accelerates SPP
[Figure: avg. time per image [ms] (axis 0 to 450) for SPP and Streamlined SPP, split into GPU↔CPU, Im. Prep., CONV Layers, Spat. Pooling, FC Layers, Bbox Regr.]

Timings
Finding (2): Dropping selective search is a huge benefit
[Figure: avg. time per image [ms] (axis 0 to 3000) for SPP, Streamlined SPP, and Minus R, split into GPU↔CPU, Sel. Search, Im. Prep., CONV Layers, Spat. Pooling, FC Layers, Bbox Regr.]

Timings
Finding (2): Dropping selective search is a huge benefit
[Figure: avg. time per image [ms] (axis 0 to 12000) for RCNN, SPP, Streamlined SPP, and Minus R, split into GPU↔CPU, Sel. Search, Im. Prep., CONV Layers, Spat. Pooling, FC Layers, Bbox Regr.]

Timings
Test-time speedups (times faster than R-CNN)
Minus R: 67.5
Streamlined SPP: 5.0
SPP: 4.5
RCNN: 1.0

Conclusions
Current CNNs can localize objects well
▶ External segmentation cues bring only a minor benefit at a great expense
Benefits of CNN-only solutions
▶ Much faster, particularly at test time
▶ Much simpler and streamlined implementations
Future steps
▶ Eliminate the remaining accuracy gap
  ▶ Essentially achieved in [Faster R-CNN, Ren et al. 2015]
▶ Beyond bounding boxes
▶ Beyond detection
