CS6501: Deep Learning for Visual Recognition Object Detection: RCNN, Fast-RCNN, Faster-RCNN
Today’s Class • Object Detection • The RCNN Object Detector (2014) • The Fast RCNN Object Detector (2015) • The Faster RCNN Object Detector (2015) • YOLO (CVPR 2016) • SSD (ECCV 2016)
Object Detection deer cat
Object Detection
• Class scores (fully connected: 4096 to k): Deer: 0.9, Cat: 0.05, Umbrella: 0.01, …
• Box coordinates (fully connected: 4096 to 4): (x, y, w, h)
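The two output heads on this slide can be sketched in plain Python. This is only an illustration, assuming random weights in place of trained ones and a feature of length 8 in place of the 4096-dimensional one; `linear`, `softmax`, and `make_layer` are hypothetical helper names, not part of any library.

```python
import math
import random

def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_layer(n_out, n_in, rng):
    """Random weights, standing in for trained parameters (assumption)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
FEATURE_DIM = 8  # stands in for the 4096-dimensional feature
K = 3            # number of classes, e.g. deer, cat, umbrella

feature = [rng.uniform(0, 1) for _ in range(FEATURE_DIM)]
cls_w, cls_b = make_layer(K, FEATURE_DIM, rng)
box_w, box_b = make_layer(4, FEATURE_DIM, rng)

class_scores = softmax(linear(feature, cls_w, cls_b))  # k class probabilities
box = linear(feature, box_w, box_b)                    # (x, y, w, h)
```

Both heads read the same feature vector; only the output size differs (k versus 4).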
Object Detection Deer: (x, y, w, h) 4096 Cat: (x, y, w, h)
Object Detection Penguin: (x, y, w, h) Penguin: (x, y, w, h) 4096 Penguin: (x, y, w, h) Penguin: (x, y, w, h) …
Object Detection as Classification deer? CNN cat? background?
Object Detection as Classification with Sliding Window deer? CNN cat? background?
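A minimal sketch of the sliding-window idea: enumerate every window position, then hand each crop to a classifier that answers deer / cat / background. The window size, stride, and image size below are illustrative assumptions, and the classifier itself is left out.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x, y, w, h) for every window that fits inside the image."""
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

# Toy numbers: an 8 x 6 image scanned with a 4 x 4 window and stride 2.
windows = list(sliding_windows(image_w=8, image_h=6, win_w=4, win_h=4, stride=2))
for window in windows:
    print(window)  # each window would be cropped and classified by the CNN
```

Even this toy setup yields 6 windows for a single window size; covering many sizes and aspect ratios multiplies the number of CNN evaluations, which is what motivates box proposals.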
Object Detection as Classification with Box Proposals
RCNN https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf Rich feature hierarchies for accurate object detection and semantic segmentation. Girshick et al. CVPR 2014.
RCNN First stage: generate category-independent region proposals. • 2000 region proposals for every image. Selective Search: combine the strength of both an exhaustive search and segmentation. Uijlings et al. IJCV 2013.
RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image.
Second stage: extract a fixed-length feature vector from each region.
• A 4096-dimensional feature vector from each region proposal.
• Arbitrary rectangles? A fixed-size input? Warp each proposal to 227 x 227.
• CNN: 5 conv layers + 2 fully connected layers.
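Warping an arbitrary rectangle to a fixed-size input can be sketched as a crop followed by a resize. The version below uses nearest-neighbor sampling on a toy image stored as a list of rows; that sampling choice and the name `warp_region` are assumptions for illustration, not the interpolation used in the paper.

```python
def warp_region(image, box, out_size):
    """Crop box = (x, y, w, h) from image (a list of rows) and resize the
    crop to out_size x out_size with nearest-neighbor sampling."""
    x, y, w, h = box
    warped = []
    for i in range(out_size):
        src_row = y + (i * h) // out_size      # source row for output row i
        row = []
        for j in range(out_size):
            src_col = x + (j * w) // out_size  # source column for output column j
            row.append(image[src_row][src_col])
        warped.append(row)
    return warped

# Toy 4 x 6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]

# A 4-wide, 2-tall proposal is stretched vertically into a 4 x 4 patch.
patch = warp_region(image, box=(1, 0, 4, 2), out_size=4)
```

The aspect ratio of the proposal is not preserved; every proposal ends up with the same shape so the CNN can accept it.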
RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image.
Second stage: extract a fixed-length feature vector from each region.
• A 4096-dimensional feature vector from each region proposal.
Third stage: a set of class-specific linear SVMs.
• Feature vector to linear SVM: people? horse? background?
• Output: object category and location.
• Bounding box regression refines the proposal location (x, y, w, h).
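Bounding box regression can be sketched with the parameterization described in the R-CNN paper: offsets of the center scaled by the proposal size, and log-ratios of width and height. The sketch assumes (x, y) is the box center, which the slide does not state; `encode` and `decode` are hypothetical names.

```python
import math

def encode(proposal, target):
    """Regression targets that map a proposal onto a ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode(proposal, deltas):
    """Apply predicted deltas to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

proposal = (50.0, 40.0, 20.0, 10.0)      # (x, y, w, h), center-based (assumption)
ground_truth = (54.0, 41.0, 40.0, 10.0)

deltas = encode(proposal, ground_truth)  # what the regressor is trained to output
refined = decode(proposal, deltas)       # recovers the ground-truth box
```

Scaling by the proposal size keeps the targets comparable across small and large boxes.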
RCNN:
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.
Fast-RCNN:
• ?
Fast-RCNN Idea: No need to recompute features for every box independently. https://arxiv.org/abs/1504.08083 Fast R-CNN. Girshick. ICCV 2015.
Fast-RCNN
• Process the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map.
• A region of interest (RoI) pooling layer extracts a fixed-length feature vector from the region feature map.
• The feature vector feeds two heads: FC + softmax over K + 1 categories, and FC + regressor giving four real-valued numbers for each of the K object classes.
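RoI pooling can be sketched as max pooling over a grid of bins whose size adapts to the region, so regions of any size produce the same output shape. The bin boundaries below (floor for the start, ceiling for the end) are one reasonable choice and an assumption of this sketch; `roi_max_pool` is a hypothetical name.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x0, y0, x1, y1), end-exclusive, of a 2D
    feature map (list of rows) into an out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        r0 = y0 + (i * roi_h) // out_h
        r1 = y0 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * roi_w) // out_w
            c1 = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 4 x 6 feature map; the value encodes position (10 * row + column).
feature_map = [[10 * r + c for c in range(6)] for r in range(4)]

big = roi_max_pool(feature_map, roi=(0, 0, 6, 4), out_h=2, out_w=2)
small = roi_max_pool(feature_map, roi=(1, 1, 4, 3), out_h=2, out_w=2)
```

Both regions come out as 2 x 2 grids even though one covers the whole map and the other a 3 x 2 patch. The conv feature map is computed once per image and reused for every region.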
RCNN vs Fast-RCNN Figure adapted from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
RCNN:
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.
Fast-RCNN:
• Higher mAP.
• Single-stage, end-to-end training.
• No disk storage is required for feature caching.
• Proposals are the computational bottleneck in detection systems.
Faster-RCNN:
• ?
Faster-RCNN Idea: Integrate the bounding box proposals as part of the CNN predictions. https://arxiv.org/abs/1506.01497 Ren et al. NIPS 2015.
Faster-RCNN Region Proposal Network (RPN):
• Shared conv layers produce the feature map that Fast-RCNN also uses.
• An n x n conv layer slides over the feature map (sliding window, n x n).
• Each sliding-window position has k anchor boxes.
• cls layer (1x1 conv layer): 2k scores, object or not object.
• reg layer (1x1 conv layer): 4k coordinates, the bounding box proposals.
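The k anchor boxes at one sliding-window position can be sketched as every combination of a scale and an aspect ratio around the same center. The 3 scales and 3 ratios below (k = 9) follow the defaults reported in the Faster R-CNN paper and are an assumption here, since the slide only says k; `anchors_at` is a hypothetical name.

```python
import math

def anchors_at(cx, cy, scales, ratios):
    """Return (cx, cy, w, h) anchors centered at one position.
    Each anchor has area scale**2 and height/width equal to ratio."""
    boxes = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            boxes.append((cx, cy, w, h))
    return boxes

scales = (128, 256, 512)  # assumption: scales from the paper
ratios = (0.5, 1.0, 2.0)  # assumption: height/width ratios from the paper

anchors = anchors_at(cx=300, cy=200, scales=scales, ratios=ratios)
k = len(anchors)
cls_outputs = 2 * k  # object / not-object score per anchor
reg_outputs = 4 * k  # four box coordinates per anchor
print(k, cls_outputs, reg_outputs)
```

The cls and reg layers therefore emit 2k and 4k values at every position of the feature map, one group per anchor.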
RCNN vs Fast-RCNN Figure adapted from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
RCNN:
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.
Fast-RCNN:
• Higher mAP.
• Single-stage, end-to-end training.
• No disk storage is required for feature caching.
• Proposals are the computational bottleneck in detection systems.
Faster-RCNN:
• Compute proposals with a deep convolutional neural network: the Region Proposal Network (RPN).
• Merge RPN and Fast R-CNN into a single network, enabling nearly cost-free region proposals.
• ?
YOLO: You Only Look Once
Idea: No bounding box proposal. A single regression problem, straight from image pixels to bounding box coordinates and class probabilities.
• Extremely fast.
• Reasons globally.
• Learns generalizable representations.
https://arxiv.org/abs/1506.02640 Redmon et al. CVPR 2016.
YOLO: You Only Look Once
• Divide the image into a 7x7 grid of cells.
• Each cell acts as a detector.
• The detector predicts the object’s class distribution.
• The detector has 2 bounding-box predictors, each of which predicts a bounding box and a confidence score.
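The size of the YOLO output follows from the grid layout and can be sketched with a little arithmetic. The 7x7 grid and 2 predictors per cell come from the slide; the 20 classes (PASCAL VOC) and the rule that the cell containing an object's center handles that object are taken from the YOLO paper and are assumptions here. The function names are hypothetical.

```python
S = 7   # grid size, from the slide
B = 2   # bounding-box predictors per cell, from the slide
C = 20  # number of classes (assumption: PASCAL VOC, as in the paper)

def values_per_cell(num_boxes, num_classes):
    """Each predictor outputs x, y, w, h and a confidence score (5 values);
    the cell adds one class distribution shared by its predictors."""
    return num_boxes * 5 + num_classes

def responsible_cell(cx, cy, image_w, image_h, grid=S):
    """Row and column of the grid cell that contains the center (cx, cy)."""
    col = min(grid - 1, int(cx / image_w * grid))
    row = min(grid - 1, int(cy / image_h * grid))
    return row, col

depth = values_per_cell(B, C)  # values predicted by one cell
total = S * S * depth          # values predicted for the whole image
cell = responsible_cell(cx=224, cy=100, image_w=448, image_h=448)
print(depth, total, cell)
```

One forward pass produces all of these values at once, which is why no separate proposal stage is needed.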
SSD: Single Shot Detector Idea: Similar to YOLO, but with a denser grid map and multiscale grid maps. + Data augmentation + Hard negative mining + Other design choices in the network. Liu et al. ECCV 2016.
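Hard negative mining can be sketched as keeping only the background boxes with the highest loss, so negatives do not swamp the few positives. The 3:1 cap on negatives to positives follows the SSD paper and is an assumption here; the loss values are made up for illustration and `hard_negative_mining` is a hypothetical name.

```python
def hard_negative_mining(losses, is_positive, neg_pos_ratio=3):
    """Return indices of the negatives to keep: those with the highest loss,
    at most neg_pos_ratio times the number of positives."""
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: losses[i], reverse=True)  # hardest first
    return sorted(negatives[: neg_pos_ratio * num_pos])

# Toy example: 8 default boxes, only box 0 matches an object.
losses = [0.9, 0.1, 0.8, 0.05, 0.7, 0.3, 0.6, 0.2]
is_positive = [True, False, False, False, False, False, False, False]

kept = hard_negative_mining(losses, is_positive)
print(kept)
```

Only the kept negatives, together with the positives, contribute to the classification loss for that image.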
Questions?