
Fast R-CNN, Ross Girshick, Facebook AI Research (FAIR) - PowerPoint PPT Presentation



  1. Fast R-CNN
      Ross Girshick, Facebook AI Research (FAIR); work done at Microsoft Research
      Reproducible research: get the code! http://git.io/vBqm5

  2. Fast Region-based ConvNets (R-CNNs) for Object Detection
      Localization (Where?) + Recognition (What?)
      [Figure: example detections with scores: person 0.992, horse 0.993, car 1.000, person 0.979, dog 0.997. Figure adapted from Kaiming He.]

  3. Object detection renaissance (2013-present)
      [Plot: PASCAL VOC mean Average Precision (mAP, 0-80%) by year, 2006-2016, before deep convnets vs. using deep convnets.]

  4. Object detection renaissance (2013-present)
      [Plot: PASCAL VOC mAP by year, 2006-2016, before deep convnets vs. using deep convnets, with R-CNNv1 marked.]

  5. Object detection renaissance (2013-present)
      [Plot: PASCAL VOC mAP by year, 2006-2016, with R-CNNv1 and Fast R-CNN marked.]
      • R-CNNv1: + Accurate; - Slow; - Inelegant
      • Fast R-CNN: + Fast; + Streamlined; + Accurate

  6. Region-based convnets (R-CNNs)
      • R-CNN (aka “slow R-CNN”) [Girshick et al. CVPR14]
      • SPP-net [He et al. ECCV14]

  7. Slow R-CNN
      Input image
      Girshick et al. CVPR14.

  8. Slow R-CNN
      Input image → Regions of Interest (RoIs) from a proposal method (~2k)
      Girshick et al. CVPR14.

  9. Slow R-CNN
      Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions
      Girshick et al. CVPR14.

  10. Slow R-CNN
      Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions → Forward each region through the ConvNet (a separate pass per region)
      Girshick et al. CVPR14.

  11. Slow R-CNN
      Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions → Forward each region through the ConvNet → Classify regions with SVMs
      Post hoc component: SVMs
      Girshick et al. CVPR14.

  12. Slow R-CNN
      Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions → Forward each region through the ConvNet → Classify regions with SVMs → Apply bounding-box regressors
      Post hoc components: SVMs, bounding-box regressors
      Girshick et al. CVPR14.
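The "warped image regions" step of slow R-CNN can be sketched as follows. This is a minimal illustration, not the R-CNN code: it assumes a single-channel image stored as a list of rows, inclusive pixel boxes, and plain nearest-neighbor resizing with no context padding (the helper `warp_region` and its sampling scheme are hypothetical). The 224x224 default matches the slow R-CNN input size used in the cost model on slide 41.

```python
def warp_region(image, box, out_size=224):
    """Crop box = (x1, y1, x2, y2) (inclusive pixel coords) from a 2D image
    given as a list of rows, then resize the crop to out_size x out_size.

    Hypothetical simplification: nearest-neighbor sampling, no context
    padding; aspect ratio is ignored, so the region is anisotropically warped.
    """
    x1, y1, x2, y2 = box
    crop_h = y2 - y1 + 1
    crop_w = x2 - x1 + 1
    warped = []
    for oy in range(out_size):
        # Map each output row back to a source row inside the crop.
        sy = y1 + (oy * crop_h) // out_size
        warped.append([image[sy][x1 + (ox * crop_w) // out_size]
                       for ox in range(out_size)])
    return warped


# Toy 6x8 "image" whose pixel value encodes its position (10 * row + col).
image = [[10 * r + c for c in range(8)] for r in range(6)]
proposals = [(0, 0, 3, 1), (2, 2, 7, 5)]  # hypothetical proposal boxes
# Every proposal becomes its own fixed-size input, hence one ConvNet pass each.
warped = [warp_region(image, box, out_size=4) for box in proposals]
print(warped[1][0])  # [22, 23, 25, 26]
```

With ~2k proposals per image, this crop-and-forward work is repeated ~2000 times per image, which is why slow R-CNN inference is expensive.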

  13. What’s wrong with slow R-CNN?

  14. What’s wrong with slow R-CNN?
      • Ad hoc training objectives
        • Fine-tune network with softmax classifier (log loss)
        • Train post-hoc linear SVMs (hinge loss)
        • Train post-hoc bounding-box regressors (squared loss)

  15. What’s wrong with slow R-CNN?
      • Ad hoc training objectives
        • Fine-tune network with softmax classifier (log loss)
        • Train post-hoc linear SVMs (hinge loss)
        • Train post-hoc bounding-box regressors (squared loss)
      • Training is slow (84h), takes a lot of disk space

  16. What’s wrong with slow R-CNN?
      • Ad hoc training objectives
        • Fine-tune network with softmax classifier (log loss)
        • Train post-hoc linear SVMs (hinge loss)
        • Train post-hoc bounding-box regressors (least squares)
      • Training is slow (84h), takes a lot of disk space
      • Inference (detection) is slow
        • 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
        • ~2000 ConvNet forward passes per image
        • Fixed by SPP-net [He et al. ECCV14]

  17. SPP-net
      Input image
      He et al. ECCV14.

  18. SPP-net
      Input image → Forward whole image through ConvNet → “conv5” feature map of image
      He et al. ECCV14.

  19. SPP-net
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method
      He et al. ECCV14.

  20. SPP-net
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → Spatial Pyramid Pooling (SPP) layer
      He et al. ECCV14.

  21. SPP-net
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → Spatial Pyramid Pooling (SPP) layer → Fully-connected layers (FCs) → Classify regions with SVMs
      Post hoc component: SVMs
      He et al. ECCV14.

  22. SPP-net
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → Spatial Pyramid Pooling (SPP) layer → Fully-connected layers (FCs) → Classify regions with SVMs → Apply bounding-box regressors
      Post hoc components: SVMs, bounding-box regressors
      He et al. ECCV14.
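The SPP layer turns an RoI of any size on the conv5 feature map into a fixed-length vector that the FCs can consume. Below is a minimal single-channel sketch, assuming the RoI is already given in inclusive feature-map coordinates; the default pyramid levels (1, 2, 3, 6), i.e. 50 bins, the floor/ceil bin-boundary rule, and the function names are assumptions for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def max_pool_bins(fmap, roi, n):
    """Max-pool an RoI (x1, y1, x2, y2), inclusive feature-map coords,
    into an n x n grid of bins; returns n * n values in row-major order."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    out = []
    for by in range(n):
        # Floor the bin start and ceil the bin end, so no bin is ever empty.
        ys, ye = y1 + by * h // n, y1 + ceil_div((by + 1) * h, n)
        for bx in range(n):
            xs, xe = x1 + bx * w // n, x1 + ceil_div((bx + 1) * w, n)
            out.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
    return out


def spp(fmap, roi, levels=(1, 2, 3, 6)):
    """Spatial pyramid pooling: concatenate the bins of every pyramid level."""
    feats = []
    for n in levels:
        feats.extend(max_pool_bins(fmap, roi, n))
    return feats


fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 conv5 map
print(spp(fmap, (0, 0, 3, 3), levels=(1, 2)))  # [15, 5, 7, 13, 15]
print(len(spp(fmap, (0, 0, 3, 3))), len(spp(fmap, (1, 1, 2, 3))))  # 50 50
```

Both RoIs, though different in size, produce 50-dimensional outputs. That fixed length is what lets the conv layers run once per image while the FCs run once per region.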

  23. What’s good about SPP-net?
      • Fixes one issue with R-CNN: makes testing fast
      [Diagram: image-wise computation (shared): ConvNet; region-wise computation: FCs, then SVMs and bounding-box regressors (post hoc components).]

  24. What’s wrong with SPP-net?
      • Inherits the rest of R-CNN’s problems
        • Ad hoc training objectives
        • Training is slow (25h), takes a lot of disk space

  25. What’s wrong with SPP-net?
      • Inherits the rest of R-CNN’s problems
        • Ad hoc training objectives
        • Training is slow (though faster), takes a lot of disk space
      • Introduces a new problem: cannot update parameters below the SPP layer during training

  26. SPP-net: the main limitation
      [Diagram: ConvNet (13 layers) frozen; FCs (3 layers) trainable; SVMs and bounding-box regressors are post hoc components.]
      He et al. ECCV14.

  27. Fast R-CNN
      • Fast test-time, like SPP-net

  28. Fast R-CNN
      • Fast test-time, like SPP-net
      • One network, trained in one stage

  29. Fast R-CNN
      • Fast test-time, like SPP-net
      • One network, trained in one stage
      • Higher mean average precision than slow R-CNN and SPP-net

  30. Fast R-CNN (test time)
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method

  31. Fast R-CNN (test time)
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → “RoI Pooling” (single-level SPP) layer

  32. Fast R-CNN (test time)
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → “RoI Pooling” (single-level SPP) layer → Fully-connected layers (FCs) → Linear + softmax classifier

  33. Fast R-CNN (test time)
      Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → “RoI Pooling” (single-level SPP) layer → Fully-connected layers (FCs) → Linear + softmax classifier, and linear bounding-box regressors
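RoI pooling at test time is SPP with a single pyramid level. Each RoI is projected from image coordinates onto the conv5 map and max-pooled into one fixed H x W grid. The sketch below assumes a total stride of 16 (VGG16's conv5), round-to-nearest projection, non-negative boxes that land inside the feature map, a single channel, and a 7x7 default grid; `project_roi` and `roi_pool` are hypothetical names. It also records each output's argmax input index (the max pooling "switch"), which is what the backward pass needs.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def project_roi(box, stride=16):
    """Map a non-negative image-space box (x1, y1, x2, y2) onto the conv5
    feature map by dividing by the total stride and rounding to nearest."""
    return tuple(int(v / stride + 0.5) for v in box)


def roi_pool(fmap, roi, out_h=7, out_w=7):
    """Single-level SPP: max-pool one RoI into a fixed out_h x out_w grid.

    Returns (pooled, switches); switches[j] is the flattened index
    y * width + x of the input that won the max for output j.
    """
    width = len(fmap[0])
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    pooled, switches = [], []
    for by in range(out_h):
        ys, ye = y1 + by * h // out_h, y1 + ceil_div((by + 1) * h, out_h)
        for bx in range(out_w):
            xs, xe = x1 + bx * w // out_w, x1 + ceil_div((bx + 1) * w, out_w)
            value, index = max(
                ((fmap[y][x], y * width + x)
                 for y in range(ys, ye) for x in range(xs, xe)),
                key=lambda pair: pair[0],
            )
            pooled.append(value)
            switches.append(index)
    return pooled, switches


fmap = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 conv5 map
roi = project_roi((0, 0, 48, 48))  # image box -> (0, 0, 3, 3) on conv5
pooled, switches = roi_pool(fmap, roi, out_h=2, out_w=2)
print(roi, pooled, switches)  # (0, 0, 3, 3) [5, 7, 13, 15] [5, 7, 13, 15]
```

Using one level instead of a pyramid keeps the layer simple. The stored switches make the layer's gradient a direct lookup.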

  34. Fast R-CNN (training)
      ConvNet → FCs → Linear + softmax; Linear

  35. Fast R-CNN (training)
      ConvNet → FCs → Linear + softmax; Linear → Multi-task loss: log loss + smooth L1 loss

  36. Fast R-CNN (training)
      ConvNet (trainable) → FCs → Linear + softmax; Linear → Multi-task loss: log loss + smooth L1 loss
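The multi-task loss combines a log loss over classes with a smooth L1 loss on the box regression outputs. A minimal per-RoI sketch, following the form L = L_cls + λ[u ≥ 1]·L_loc as we understand it from the Fast R-CNN paper. Class 0 is taken to be background (no box loss), λ = 1 by default, and the argument names are hypothetical.

```python
import math


def smooth_l1(x):
    """0.5 * x^2 when |x| < 1, otherwise |x| - 0.5 (gentler than L2 on outliers)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def multitask_loss(class_scores, true_class, box_pred, box_target, lam=1.0):
    """Log loss (softmax cross-entropy) plus a smooth L1 box-regression loss.

    class_scores: raw scores for K + 1 classes, index 0 = background (assumed).
    box_pred: the 4 regression outputs for true_class.
    box_target: the 4 ground-truth regression targets.
    """
    m = max(class_scores)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(s - m) for s in class_scores))
    cls_loss = log_sum - class_scores[true_class]  # -log softmax prob of true class
    loc_loss = 0.0
    if true_class >= 1:  # background RoIs contribute no box loss
        loc_loss = sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return cls_loss + lam * loc_loss


print(smooth_l1(0.5), smooth_l1(-3.0))  # 0.125 2.5
```

Because both terms are differentiable, one backward pass trains the classifier, the box regressors, and (once RoI pooling is differentiable) the conv layers together, replacing the three separately trained objectives of slow R-CNN.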

  37. Obstacle #1: Differentiable RoI pooling
      Region of Interest (RoI) pooling must be (sub-)differentiable to train conv layers

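  38. Obstacle #1: Differentiable RoI pooling
      [Diagram: two RoIs (r = 0 and r = 1) pool from the same input x_23. For each output, RoI pooling records a max pooling “switch” (i.e., an argmax back-pointer): i*(0, 2) = 23 and i*(1, 0) = 23, so x_23 is pooled into both y_{0,2} and y_{1,0}.]
      $\frac{\partial L}{\partial x_i} = \sum_{r} \sum_{j} \left[ i = i^*(r, j) \right] \frac{\partial L}{\partial y_{rj}}$
      • Sum over regions r and pooled locations j
      • [i = i*(r, j)] is 1 if i*(r, j) “pooled” input i; 0 otherwise
      • ∂L/∂y_rj is the partial from the next layer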
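The gradient formula above routes each upstream partial ∂L/∂y_rj to the single input that won the max, summing over every RoI and location that selected it. A minimal sketch, assuming the switches i*(r, j) were stored as flattened input indices during the forward pass; the function name and the toy values are hypothetical.

```python
def roi_pool_backward(num_inputs, switches, grad_out):
    """Backward pass of RoI max pooling.

    switches[r][j] = i*(r, j): the argmax input index recorded in the forward
    pass for output j of RoI r. grad_out[r][j] = dL/dy_rj from the next layer.
    Returns dL/dx_i for every input i in range(num_inputs).
    """
    grad_in = [0.0] * num_inputs
    for r, roi_switches in enumerate(switches):
        for j, i_star in enumerate(roi_switches):
            # [i = i*(r, j)] is 1 only for the winning input, so the whole
            # upstream partial goes there; inputs shared by RoIs accumulate.
            grad_in[i_star] += grad_out[r][j]
    return grad_in


# Two hypothetical RoIs with 4 pooled outputs each; as on the slide,
# i*(0, 2) = 23 and i*(1, 0) = 23.
switches = [[10, 11, 23, 12], [23, 30, 31, 32]]
grad_out = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
grad_in = roi_pool_backward(40, switches, grad_out)
print(grad_in[23])  # 8.0 = dL/dy_{0,2} + dL/dy_{1,0}
```

Inputs that no RoI selected receive zero gradient, which is why max pooling is only sub-differentiable.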

  39. Obstacle #2: efficient SGD steps
      Slow R-CNN and SPP-net use region-wise sampling to make mini-batches
      • Sample 128 example RoIs uniformly at random
      • Examples will come from different images with high probability
      [Diagram: SGD mini-batch]

  40. Obstacle #2: efficient SGD steps
      Note that the receptive field for one example RoI is often very large
      • Worst case: the receptive field is the entire image
      [Diagram: example RoIs and one RoI’s receptive field]

  41. Obstacle #2: efficient SGD steps
      Worst-case cost per mini-batch (crude model of computational complexity):
      (input size for Fast R-CNN) / (input size for slow R-CNN) = 128*600*1000 / (128*224*224) ≈ 12x more computation than slow R-CNN

  42. Obstacle #2: efficient SGD steps
      Solution: use hierarchical sampling to build mini-batches

  43. Obstacle #2: efficient SGD steps
      Solution: use hierarchical sampling to build mini-batches
      • Sample a small number of images (2)

  44. Obstacle #2: efficient SGD steps
      Solution: use hierarchical sampling to build mini-batches
      • Sample a small number of images (2)
      • Sample many examples from each image (64)
      [Diagram: SGD mini-batch]
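The two mini-batch strategies can be compared directly. A minimal sketch, assuming each proposal is identified by a hypothetical (image index, proposal index) pair and every image has at least 64 proposals. It ignores any class balancing within the mini-batch, and the function names and dataset sizes are made up for illustration.

```python
import random


def region_wise_batch(rois_per_image, batch_size=128, rng=random):
    """Slow R-CNN / SPP-net style: draw RoIs uniformly from the whole dataset."""
    all_rois = [(img, k) for img, n in enumerate(rois_per_image) for k in range(n)]
    return rng.sample(all_rois, batch_size)


def hierarchical_batch(rois_per_image, num_images=2, batch_size=128, rng=random):
    """Fast R-CNN style: sample N images, then batch_size / N RoIs from each."""
    per_image = batch_size // num_images
    batch = []
    for img in rng.sample(range(len(rois_per_image)), num_images):
        rois = rng.sample(range(rois_per_image[img]), per_image)
        batch.extend((img, k) for k in rois)
    return batch


rng = random.Random(0)
rois_per_image = [100] * 500  # hypothetical: 500 images, 100 proposals each
region_wise = region_wise_batch(rois_per_image, rng=rng)
hierarchical = hierarchical_batch(rois_per_image, rng=rng)
print(len({img for img, _ in region_wise}))   # many distinct images (varies with seed)
print(len({img for img, _ in hierarchical}))  # 2
```

Region-wise batches touch many distinct images, so little conv computation can be shared. Hierarchical batches touch only two images, so all 128 RoIs reuse two forward passes.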

  45. Obstacle #2: efficient SGD steps
      Use the test-time trick from SPP-net during training
      • Share computation between overlapping examples from the same image
      [Diagram: example RoIs 1-3 from one image; the union of the RoIs’ receptive fields is computed once (shared computation)]

  46. Obstacle #2: efficient SGD steps
      Cost per mini-batch compared to slow R-CNN (same crude cost model):
      • (input size for Fast R-CNN) / (input size for slow R-CNN) = 2*600*1000 / (128*224*224) ≈ 0.19x the computation of slow R-CNN (about 5x less)
      [Diagram: example RoIs 1-3 from one image; the union of the RoIs’ receptive fields is computed once (shared computation)]
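Both mini-batch cost estimates (worst-case region-wise sampling and hierarchical sampling) reduce to the same ratio of input pixels. A sketch of that crude cost model; `cost_ratio` is a hypothetical helper and, like the slides, it ignores everything except input area.

```python
def cost_ratio(num_inputs, input_h, input_w, base_inputs=128, base_h=224, base_w=224):
    """Crude cost model: input pixels processed per mini-batch,
    relative to slow R-CNN's 128 warped 224x224 regions."""
    return (num_inputs * input_h * input_w) / (base_inputs * base_h * base_w)


# Region-wise sampling, worst case: each of the 128 RoIs needs a whole 600x1000 image.
worst_case = cost_ratio(128, 600, 1000)
# Hierarchical sampling: 2 images of 600x1000 shared by all 128 RoIs.
hierarchical = cost_ratio(2, 600, 1000)
print(f"{worst_case:.2f}x {hierarchical:.2f}x")  # 11.96x 0.19x
```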
