SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Microsoft Research Asia, Visual Computing Group

Highlights
• ILSVRC 2014 (all provided-data tracks)
  • DET - 2nd
  • CLS - 3rd
  • LOC - 5th
• ECCV 2014 paper
  • Published 2 months ago (arXiv: 1406.4729v1, June 18)
  • Details disclosed (arXiv: 1406.4729v2)
“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

Overview
• SPP-net - a new network structure
• Classification - improves all CNNs tested
• Detection - 20-60x faster than R-CNN, with similar accuracy

Spatial Pyramid Matching
• SPM: very successful in traditional computer vision
  [Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”
  [Lazebnik et al., CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”
Traditional pipeline: dense SIFT -> encoded (VQ, SC, FV) -> SPM -> SVM -> prediction
CNN counterparts: “conv layers” -> simply pooling? -> “fc layers” -> prediction

SPP-net: SPM in CNN
Traditional CNN: fixed-size input -> conv -> fc (4096) -> fc (4096) -> 1000
SPP-net: any-size input -> conv -> spatial pyramid pooling -> fc (4096) -> fc (4096) -> 1000
• Fix the number of bins
• DO NOT fix the bin size

SPP-net
Diagram (bottom to top): input image -> conv layers -> conv feature maps -> spatial pyramid pooling layer (concatenate) -> fc layers
• variable input size/scale
  • multi-size training
  • multi-scale testing
  • full-image view
• multi-level pooling
  • robust to deformation
• operates on feature maps
  • pooling in regions

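The pooling layer on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), and the use of max pooling inside each bin are assumptions made for the example. It shows how fixing the number of bins while letting the bin size follow the input gives a fixed-length vector for any feature-map size.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool a (C, H, W) feature map into a vector of length C * sum(n * n).

    `levels` is a hypothetical pyramid: each entry n means an n x n grid of bins.
    The number of bins is fixed; the bin size adapts to H and W.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor for the start, ceil for the end,
            # so every bin is non-empty even when height < n.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                # Max pooling inside the bin (an assumption for this sketch).
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    # Concatenate all levels into one fixed-length vector for the fc layers.
    return np.concatenate(pooled)

# Two inputs of different spatial size produce vectors of the same length.
small = spatial_pyramid_pool(np.random.rand(3, 10, 17))
large = spatial_pyramid_pool(np.random.rand(3, 13, 13))
print(small.shape, large.shape)
```

Because the output length depends only on the channel count and the pyramid levels, the fc layers that follow never see the input resolution, which is what makes multi-size training and multi-scale testing possible.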
ILSVRC top-5 val (10-view)
[Bar chart: top-5 validation error (%) with 10-view testing for 4 architectures (ZF-5, Convnet*-5, Overfeat-5, Overfeat-7). Series: no-SPP baselines, multi-level pooling, + multi-size training. Annotation: All CNNs improved!]

ILSVRC 2014 CLS Results
team            top-5 test (%)
GoogLeNet       6.66
Oxford VGG      7.32
ours            8.06
Howard          8.11
DeeperVision    9.50
NUS-BST         9.79
TTIC_ECP        10.22
…

our models                          top-5 (%)
7-conv SPP-net, 10-view             10.95
7-conv SPP-net, 96-view+2-full      9.08
7-conv SPP-net, multi-scale/view    9.08
multiple SPP-nets                   8.06

• “shallow”
  • 7-conv, 1 Titan GPU, 3 weeks
• but potential
  • SPP can improve deeper nets: >1% gain post-competition

Detection: SPP on Regions
Diagram (bottom to top): input image -> conv layers -> conv feature maps -> SPP on each region -> fc layers

R-CNN vs. SPP
• image regions vs. feature map regions
R-CNN: 2000 nets on image regions (each image region -> net -> feature)
SPP-net: 1 net on full image (image -> net -> one feature per feature-map region)

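The difference on this slide can be illustrated with a short NumPy sketch: the feature maps are computed once per image, and each candidate region is pooled from the corresponding window of those maps. Everything named here is an assumption for illustration: the helper `pool_region`, the effective stride of 16 pixels, the 2 x 2 bin grid, the simplified floor/ceil projection of box corners, and the random array standing in for real conv features.

```python
import math
import numpy as np

def pool_region(feature_map, box, stride=16, bins=2):
    """Pool one image-space box (x0, y0, x1, y1) from a shared (C, H, W) feature map.

    `stride` is an assumed ratio between image pixels and feature-map cells.
    Returns a vector of length C * bins * bins regardless of the box size.
    """
    channels, height, width = feature_map.shape
    x0, y0, x1, y1 = box
    # Project the box onto feature-map cells (simplified floor/ceil mapping),
    # clamped so the window always contains at least one cell.
    c0 = min(int(x0 // stride), width - 1)
    r0 = min(int(y0 // stride), height - 1)
    c1 = min(max(math.ceil(x1 / stride), c0 + 1), width)
    r1 = min(max(math.ceil(y1 / stride), r0 + 1), height)
    window = feature_map[:, r0:r1, c0:c1]
    h, w = window.shape[1:]
    pooled = []
    for i in range(bins):
        rs, re = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        for j in range(bins):
            cs, ce = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            pooled.append(window[:, rs:re, cs:ce].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Stand-in for conv features of one full image, computed a single time.
rng = np.random.default_rng(0)
feature_map = rng.random((256, 38, 50))

# Hypothetical region proposals in image pixels; each reuses the same feature map.
boxes = [(0, 0, 120, 90), (200, 100, 640, 480)]
region_features = [pool_region(feature_map, b) for b in boxes]
print([f.shape for f in region_features])
```

The per-region cost is only slicing and pooling, so the expensive conv layers run once per image instead of once per region.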
• With regional features, we can do everything R-CNN does
  • fine-tune, SVM, bbox regression…
• similar accuracy, much faster

VOC 2007          SPP-net 1-scale   SPP-net 5-scale   R-CNN
mAP               58.0              59.2              58.5
GPU time / img    0.14s             0.38s             9s
speed-up          64x               24x               -

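The speed-up figures follow directly from the GPU times per image reported on this slide. A small check, using only those numbers (the variable names are illustrative):

```python
# GPU time per image in seconds, as reported for VOC 2007 on the slide.
rcnn_seconds = 9.0
spp_seconds = {"1-scale": 0.14, "5-scale": 0.38}

# Speed-up is the R-CNN time divided by the SPP-net time.
speedups = {name: rcnn_seconds / t for name, t in spp_seconds.items()}

for name, s in speedups.items():
    print(f"SPP-net {name}: {s:.1f}x")
# SPP-net 1-scale: 64.3x
# SPP-net 5-scale: 23.7x
```

Rounded to whole numbers these give the 64x and 24x shown in the table.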
ILSVRC 2014 DET Results (“provided data” track)
team                    mAP
NUS                     37.2
ours, multi SPP-nets    35.1
UvA                     32.0
ours, 1 SPP-net         31.8
Southeast-CASIA         30.4
1-HKUST                 28.8
CASIA_CRIPAC_2          28.6

cost of a single model   SPP-net    R-CNN
GPU time / img           0.6s       32s
40k test imgs            8 hours    15 days

• Conclusion
  • SPM in CNNs
  • CLS: improves all CNNs in the literature
  • DET: practical, fast, and accurate
• Future work
  • SPP on advanced networks
• Resources
  • code, config, tech report… http://research.microsoft.com/en-us/um/people/kahe/
• Acknowledgement
  • We thank NVIDIA for the GPU donation.