An Overview of Deep Residual Learning
Semih Yagcioglu
01.03.2016
Deep Residual Learning
• Microsoft Research Asia (MSRA)
• Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. arXiv 2015.
• Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.
• ILSVRC & COCO 2015 competitions
MSRA @ ILSVRC & COCO 2015 Competitions
• 1st places in all five main tracks
• ImageNet Classification: “Ultra-deep” 152-layer nets
• ImageNet Detection: 16% better than 2nd
• ImageNet Localization: 27% better than 2nd
• COCO Detection: 11% better than 2nd
• COCO Segmentation: 12% better than 2nd
*improvements are relative numbers
Slide Credit: He et al. (MSRA)
Revolution of Depth
[Chart: ImageNet Classification top-5 error (%)]
• ILSVRC'10: 28.2 (shallow)
• ILSVRC'11: 25.8 (shallow)
• ILSVRC'12, AlexNet: 16.4 (8 layers)
• ILSVRC'13: 11.7 (8 layers)
• ILSVRC'14, VGG: 7.3 (19 layers)
• ILSVRC'14, GoogleNet: 6.7 (22 layers)
• ILSVRC'15, ResNet: 3.57 (152 layers)
Slide Credit: He et al. (MSRA)
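The deck notes that its reported improvements are relative numbers. As a minimal sketch of that kind of calculation, the snippet below computes the relative error reduction between the ILSVRC'14 GoogleNet and ILSVRC'15 ResNet top-5 errors shown on this chart. The function name `relative_reduction` is a hypothetical helper, not something from the slides.

```python
def relative_reduction(old_error, new_error):
    """Relative error reduction, as a percentage of the old error."""
    return 100.0 * (old_error - new_error) / old_error

# Top-5 errors (%) from the chart: GoogleNet (ILSVRC'14) and ResNet (ILSVRC'15).
googlenet_top5 = 6.7
resnet_top5 = 3.57

print(round(relative_reduction(googlenet_top5, resnet_top5), 1))  # 46.7
```

The absolute gap is only about 3 points, but relative to the previous error it is a reduction of nearly half, which is why the distinction matters when reading the competition results.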
Revolution of Depth
Engines of visual recognition
[Chart: PASCAL VOC 2007 Object Detection mAP (%)]
• HOG, DPM: 34 (shallow)
• AlexNet (RCNN): 58 (8 layers)
• VGG (RCNN): 66 (16 layers)
• ResNet (Faster RCNN)*: 86 (101 layers)
*w/ other improvements & more data
Slide Credit: He et al. (MSRA)
Residual learning reformulates the learning procedure and redirects the information flow in deep neural networks.
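To make the reformulation concrete: in He et al.'s formulation, a block learns a residual function F(x) and outputs F(x) + x, with the input carried forward by an identity shortcut. The sketch below illustrates this with plain Python lists. The two-layer fully connected form of F, the helper names, and the tiny weights are all assumptions chosen for illustration; the actual networks use convolutions and batch normalization.

```python
# Minimal sketch of a residual block: y = relu(F(x) + x).
# The two-layer form of F and all helper names are illustrative assumptions.

def relu(v):
    return [max(0.0, a) for a in v]

def linear(weights, v):
    """Multiply a square weight matrix (list of rows) by the vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """Return relu(F(x) + x), where F(x) = w2 * relu(w1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))         # residual branch F(x)
    return relu([f + a for f, a in zip(fx, x)])  # identity shortcut adds x back

zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]

# With all-zero weights F(x) = 0, so the block passes x through unchanged.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]
```

The zero-weight case shows the design choice: a block that has learned nothing still behaves as an identity mapping, so stacking more blocks does not have to degrade what earlier layers computed.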
Revolution of Depth
AlexNet, 8 layers (ILSVRC 2012)
• 11x11 conv, 96, /4, pool/2
• 5x5 conv, 256, pool/2
• 3x3 conv, 384
• 3x3 conv, 384
• 3x3 conv, 256, pool/2
• fc, 4096
• fc, 4096
• fc, 1000
Slide Credit: He et al. (MSRA)
Revolution of Depth
[Figure: side-by-side layer diagrams of AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); and GoogleNet, 22 layers (ILSVRC 2014)]
Slide Credit: He et al. (MSRA)
Revolution of Depth
[Figure: side-by-side layer diagrams of AlexNet, 8 layers (ILSVRC 2012); VGG, 19 layers (ILSVRC 2014); and ResNet, 152 layers (ILSVRC 2015)]
ResNet, 152 layers (ILSVRC 2015)
• 7x7 conv, 64, /2, pool/2
• 3 x [1x1 conv, 64; 3x3 conv, 64; 1x1 conv, 256]
• 8 x [1x1 conv, 128; 3x3 conv, 128; 1x1 conv, 512], first block /2
• 36 x [1x1 conv, 256; 3x3 conv, 256; 1x1 conv, 1024], first block /2
• 3 x [1x1 conv, 512; 3x3 conv, 512; 1x1 conv, 2048], first block /2
• ave pool, fc 1000
Slide Credit: He et al. (MSRA)
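As a quick check on the "152 layers" label, the sketch below counts weighted layers (convolutions plus the final fc) in the ResNet column. It assumes the layer listing on this slide gives 3, 8, 36, and 3 three-convolution blocks across the four stages, plus the initial 7x7 conv and the final fc; pooling layers are not counted. The constant and function names are hypothetical.

```python
# Count weighted layers (conv + fc) from the stage layout on the slide.
# Stage sizes are read off the slide's layer listing; names are illustrative.

STEM_CONVS = 1        # 7x7 conv, 64, /2
FC_LAYERS = 1         # fc 1000
CONVS_PER_BLOCK = 3   # 1x1 conv, 3x3 conv, 1x1 conv

# (number of blocks, output channels of the last 1x1 conv) per stage
RESNET152_STAGES = [(3, 256), (8, 512), (36, 1024), (3, 2048)]

def count_layers(stages):
    """Total weighted layers: stem conv + 3 convs per block + final fc."""
    blocks = sum(n for n, _ in stages)
    return STEM_CONVS + blocks * CONVS_PER_BLOCK + FC_LAYERS

print(count_layers(RESNET152_STAGES))  # 152
```

The 50 blocks contribute 150 convolutions, and the stem conv and fc layer bring the total to 152, matching the slide's label.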