You Only Look Once: Unified, Real-Time Object Detection • Redmon et al., CVPR 2016 • Mincheul Kang
Image Retrieval using Scene Graphs • Develop novel framework for semantic image retrieval based on the notion of a scene graph • Use scene graphs as query • Introduce a novel dataset of 5K human-generated scene graphs grounded to images [Figure labels: Query, Output, Measure, Score, Object & Attribute, Relationship]
Contents 1. Background 2. Related work 3. Overview 4. Approach 5. Results 6. Conclusion 7. Q&A
Background • Object detection • Localization: where? • Recognition: what? (Figure source: Fast R-CNN slides, Ross Girshick)
Background • Object detection in applications • Image retrieval • Robotics • Self-driving cars • These applications need fast and accurate algorithms http://www.nvidia.com/object/drive-px.html http://kitschthingoftheday.blogspot.com/2011/06/breakfast-making-robots-at-tum.html
Background • Progress of object detection • [Figure: mean Average Precision (mAP) on PASCAL VOC by year, 2006 to 2016, on a 0% to 80% axis; methods labeled in ascending order: DPM, R-CNN, Fast R-CNN, Faster R-CNN; annotation "After CNN"] • Machine learning + Computer vision
Related work • R-CNN (Region proposals + CNN) • Selective search • CNN that extracts a fixed-length feature vector from each region • Binary linear SVMs (Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Ross Girshick et al., CVPR 2014)
Related work • Problems in R-CNN • Training proceeds in several separate stages • Training and detection are slow • Requires a large amount of storage space
Related work • Fast R-CNN • Training is single-stage, using a multi-task loss • Training can update all network layers • No disk storage is required for feature caching (Fast R-CNN, Ross Girshick, ICCV 2015)
Related work • Faster R-CNN • Selective search takes a long time to compute • Replaces it with a Region Proposal Network (Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Shaoqing Ren et al., NIPS 2015 and Slides)
Related work • Summary • Speed and mAP have improved since the introduction of CNNs • But these detectors are still not fast enough to run in real time • YOLO • Enables real-time speeds while maintaining high average precision
Overview • YOLO detection system 1) Resizes the input image to 448 × 448 2) Runs a single convolutional network on the image 3) Thresholds the resulting detections by the model’s confidence (You only look once: Unified, real-time object detection, J. Redmon et al., CVPR 2016)
Approach • Divide the input image into an S × S grid
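As a minimal sketch of the grid division, the function below maps a point in the image (for example, an object's center) to the grid cell that contains it. In the cited paper, the cell containing an object's center is the one responsible for detecting that object. The function name `responsible_cell` and the default S = 7 (the value the paper uses on PASCAL VOC) are illustrative choices, not part of the original slides.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell containing the point (cx, cy).

    Hypothetical helper for illustration; coordinates are in pixels.
    """
    # Scale the coordinate to [0, S] and truncate to a cell index.
    # Clamp so a point on the right/bottom edge still falls inside the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# An object centered at (224, 100) in a 448 x 448 image
print(responsible_cell(224, 100, 448, 448))  # → (1, 3)
```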
Approach • Each grid cell predicts B bounding boxes and confidence scores for those boxes • IOU: intersection over union between the predicted box and the ground truth • Confidence: $\Pr(\text{Object}) \times \mathrm{IOU}^{\text{truth}}_{\text{pred}}$
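The IOU term can be sketched in a few lines. This example assumes boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)`; that format is an assumption made for readability (YOLO itself predicts a center, width, and height), and the function name `iou` is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this corner format is an
    assumption for illustration.
    """
    # Corners of the overlapping rectangle (if any)
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width/height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 2 x 2 boxes overlapping in a 1 x 1 square: IOU = 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IOU of 1 means the predicted box matches the ground truth exactly; an IOU of 0 means the boxes do not overlap.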
Approach • Each grid cell also predicts conditional class probabilities • Class probability: $\Pr(\text{Class}_i \mid \text{Object})$
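At test time, the cited paper multiplies the conditional class probabilities by each box's confidence to obtain class-specific confidence scores, since Pr(Class_i | Object) × Pr(Object) × IOU = Pr(Class_i) × IOU. A minimal sketch, with the illustrative name `class_scores` and made-up input values:

```python
def class_scores(box_confidence, class_probs):
    """Class-specific confidence scores for one predicted box.

    box_confidence: Pr(Object) * IOU for the box
    class_probs:    Pr(Class_i | Object) for the box's grid cell
    (Hypothetical helper; the input values below are made up.)
    """
    return [box_confidence * p for p in class_probs]


# A box with confidence 0.5 in a cell that favors the second class
scores = class_scores(0.5, [0.2, 0.8])
best_class = max(range(len(scores)), key=lambda i: scores[i])
print(scores, best_class)
```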
Approach • Thresholds the resulting detections by the model’s confidence
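The thresholding step can be sketched as a simple filter. The dictionary layout, the name `filter_detections`, and the threshold value 0.2 are assumptions for illustration; the slides do not state a specific threshold.

```python
def filter_detections(detections, threshold=0.2):
    """Keep only detections whose score reaches the threshold.

    Each detection is a dict with a "score" key; this layout and the
    default threshold are illustrative assumptions.
    """
    return [d for d in detections if d["score"] >= threshold]


detections = [
    {"label": "dog", "score": 0.85},
    {"label": "bicycle", "score": 0.10},
    {"label": "car", "score": 0.40},
]
kept = filter_detections(detections)
print([d["label"] for d in kept])  # → ['dog', 'car']
```

Raising the threshold trades recall for precision: fewer boxes survive, but those that do are more likely to be correct.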
Approach • YOLO • Enables end-to-end training and real-time speeds • Predicts all bounding boxes across all classes for an image simultaneously
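Predicting everything simultaneously means the network emits one fixed-size tensor per image. In the cited paper, each of the S × S cells predicts B boxes (5 values each: x, y, w, h, confidence) plus C class probabilities, giving an S × S × (B·5 + C) output. The sketch below uses the paper's PASCAL VOC settings (S = 7, B = 2, C = 20) as defaults; the function names are illustrative.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: S x S x (B*5 + C)."""
    return (S, S, B * 5 + C)


def total_boxes(S=7, B=2):
    """Number of bounding boxes predicted per image."""
    return S * S * B


print(output_shape())  # → (7, 7, 30)
print(total_boxes())   # → 98
```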
Approach • Training • Cost function: a multi-part sum-squared error over box coordinates, confidence, and class probabilities
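For reference, the multi-part loss from the cited paper (Redmon et al., CVPR 2016) can be written as below. Here $\mathbb{1}_{ij}^{\text{obj}}$ indicates that box predictor $j$ in cell $i$ is responsible for an object, $\mathbb{1}_{i}^{\text{obj}}$ indicates that an object appears in cell $i$, $C$ is the confidence, $p(c)$ is the class probability, and hatted and unhatted symbols pair each prediction with its target. The paper sets $\lambda_{\text{coord}} = 5$ and $\lambda_{\text{noobj}} = 0.5$.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
       + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
  \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
  \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The square roots on width and height make the same absolute error count more for small boxes than for large ones, and the two $\lambda$ weights raise the importance of localization while reducing the influence of the many cells that contain no object.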
Result • Qualitative results on sample artwork and natural images from the internet
Result • Real-time speeds while maintaining high average precision 69.0
Conclusion • Because YOLO uses a single network, it can be optimized end-to-end directly on detection performance • Predicts all bounding boxes across all classes for an image simultaneously • Real-time speeds while maintaining high average precision • Limitations • Struggles with small objects that appear in groups, such as flocks of birds • Incorrect localizations
Q & A