Weakly and deeply supervised visual learning

  1. Huazhong University of Science and Technology, Xinggang Wang, CSIG Young Scientists Forum. Weakly and deeply supervised visual learning. www.xinggangw.info

  2. Annotation time of manual supervision. Annotation time: 1 / 2.4 / 10 / 78 seconds per instance, depending on the type of supervision. [Bearman et al., What’s the Point: Semantic Segmentation with Point Supervision, ECCV 16] Slide credit: Hakan Bilen
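A back-of-the-envelope calculation of what these per-instance times imply at dataset scale. Mapping 1 / 2.4 / 10 / 78 seconds to image-label, point, scribble, and full-mask annotation is an assumption made here only for illustration (Bearman et al. report the measured timings), and the dataset size is hypothetical.

```python
# Hypothetical supervision-to-time mapping and dataset size, used only to
# illustrate the scale of the labeling-cost gap between supervision types.
seconds_per_instance = {"image label": 1.0, "point click": 2.4, "scribble": 10.0, "full mask": 78.0}
num_instances = 100_000

for kind, sec in seconds_per_instance.items():
    hours = num_instances * sec / 3600
    print(f"{kind:>11}: {hours:8.1f} annotator-hours")
```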

  3. Image labels. Person, Horse • Supervision: image (category) labels • Target: object detection, semantic segmentation, etc. [Verbeek CVPR 07, Pandey ICCV 11, Cinbis CVPR 14, Wang ECCV 14, Papandreou ICCV 15, Bilen CVPR 15, Tang CVPR 17, Wei CVPR 17, Singh ICCV 17, Huang CVPR 18, etc.]

  4. Video labels • Supervision: video (category) labels • Target: object detection, semantic segmentation, etc. [Papazoglou ICCV 13] [Tokmakov ECCV 16]

  5. Clicks on objects. [Bearman ECCV 16] • Supervision: one point per instance/category • Target: object detection, semantic segmentation, etc.

  6. Extreme points. DEXTR [Maninis CVPR 18] • Supervision: object extreme points • Target: instance segmentation [Papadopoulos ICCV 17, Maninis CVPR 18]

  7. Scribbles on objects. MILCut [Wu CVPR 14] [Bearman ECCV 16] • Supervision: scribbles/lines per instance • Target: instance segmentation

  8. Object bbox. BoxSup [Dai ICCV 15] • Supervision: object bounding boxes • Target: instance segmentation [Rother SIGGRAPH 04, Dai ICCV 15, Khoreva CVPR 17]

  9. Webly supervision. [Hou et al., arXiv 18] • Supervision: keywords & search engines • Target: semantic segmentation

  10. Hashtag. [Mahajan ECCV 18] • Supervision: 3.5 billion images with Instagram tags • Target: a good pre-trained model

  11. Mixing full & weak supervision • Supervision: COCO (has bbox) + ImageNet (has image labels) • Target: object detection for 9000 classes. (Figure legend: blue = COCO classes, dark = ImageNet classes.) YOLO9000, CVPR 2017 Best Paper Honorable Mention [Redmon CVPR 17]

  12. Full + weak supervision + domain adaptation. [Inoue CVPR 18] • Supervision: bbox in source domain + image labels in target domain • Target: bbox in target domain

  13. Object counts • Supervision: object counts per class • Target: object detection. C-WSL [Wang ECCV 18]

  14. Only the number of classes • Supervision: only the number of classes • Target: object bbox. bMCL [Zhu CVPR 12, PAMI 15]

  15. Polygon-RNN. Polygon-RNN, CVPR 2017 Honorable Mention Best Paper Award [Castrejon CVPR 17]. Polygon-RNN cuts down the number of required annotation clicks by a factor of 4.74. • Supervision: bbox + interactive key points • Target: object polygon

  16. From the perspective of machine learning. WSL [Zhou, 2018, National Science Review]: • Incomplete supervision • Inaccurate supervision • Inexact supervision. (Figure: a "Person, Dog" example annotated under full, incomplete, inaccurate, and inexact supervision.)

  17. Next • Weakly supervised object detection • Weakly supervised semantic segmentation

  18. Standard MIL pipeline. 1. Window space (usually, using object proposals) 2. Initialization 3. Re-localization & re-training [Chum CVPR 07, Deselaers ECCV 10, Siva ICCV 11, Wang ICCV 15, Bilen CVPR 15] Slide credit: Vittorio Ferrari
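A minimal sketch of this re-localization / re-training loop on toy data. The random proposal features, the logistic-regression scorer, and the whole-image initialization are illustrative assumptions, not the setup of any particular cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_scorer(pos_feats, neg_feats, lr=0.1, epochs=200):
    """Re-training step: fit a logistic-regression scorer on the currently selected windows."""
    X = np.vstack([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_feats))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)          # gradient ascent on the log-likelihood
    return w

# 1. Window space: each positive image is a bag of proposal features (toy random features).
pos_bags = [rng.normal(size=(20, 16)) for _ in range(10)]
neg_feats = rng.normal(loc=-0.5, size=(200, 16))  # proposals from negative images

# 2. Initialization: start from the whole image (here: the mean proposal feature of each bag).
selected = np.array([bag.mean(axis=0) for bag in pos_bags])

# 3. Alternate re-training and re-localization.
for _ in range(5):
    w = train_scorer(selected, neg_feats)                               # re-training
    selected = np.array([bag[np.argmax(bag @ w)] for bag in pos_bags])  # re-localization: top window per bag
```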

  19. Weakly-supervised deep detection network (WSDDN) [Bilen CVPR 16] + End-to-end region CNN for WSOD − Normalization over classes hurts performance. Slide credit: Hakan Bilen
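A minimal sketch of WSDDN's two-stream head in PyTorch (assumed framework here): a softmax over classes multiplied by a softmax over proposals, summed over proposals to give an image-level score trained with the image labels. The feature dimension, proposal count, and toy label are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSDDNHead(nn.Module):
    """Two-stream head applied to per-proposal features pooled by a region CNN."""
    def __init__(self, feat_dim=4096, num_classes=20):
        super().__init__()
        self.fc_cls = nn.Linear(feat_dim, num_classes)   # classification stream
        self.fc_det = nn.Linear(feat_dim, num_classes)   # detection stream

    def forward(self, proposal_feats):                   # (num_proposals, feat_dim)
        cls = F.softmax(self.fc_cls(proposal_feats), dim=1)  # normalize over classes
        det = F.softmax(self.fc_det(proposal_feats), dim=0)  # normalize over proposals
        proposal_scores = cls * det                      # per-proposal, per-class detection scores
        image_scores = proposal_scores.sum(dim=0)        # image-level prediction for the MIL loss
        return image_scores, proposal_scores

head = WSDDNHead()
feats = torch.randn(300, 4096)                           # 300 proposals from one image
image_scores, proposal_scores = head(feats)
labels = torch.zeros(20); labels[14] = 1.0               # toy image label: one class present
loss = F.binary_cross_entropy(image_scores.clamp(0, 1), labels)  # multi-label image classification loss
```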

  20. Online instance classifier refinement (OICR) network [Tang CVPR 17] • Additional blocks (instance classifiers) for score propagation • In-network supervision + Positive proposals in one image do not share scores + Performance improves significantly − The instance-level in-network supervision may not be correct
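A hedged sketch of OICR-style in-network supervision: for each class present in the image, the previous stage's top-scoring proposal and its highly overlapping neighbours become pseudo positives for the next instance classifier. The IoU threshold and the box utility are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def pairwise_iou(a, b):
    """IoU between box sets a (P, 4) and b (Q, 4) in (x1, y1, x2, y2) format."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def propagate_labels(prev_scores, boxes, image_classes, iou_thresh=0.5):
    """prev_scores: (P, C) scores from the previous refinement stage; boxes: (P, 4);
    image_classes: classes known to be present from the image label."""
    pseudo_labels = torch.zeros_like(prev_scores)
    for c in image_classes:
        top = prev_scores[:, c].argmax()                          # top-scoring proposal for class c
        ious = pairwise_iou(boxes, boxes[top].unsqueeze(0)).squeeze(1)
        pseudo_labels[ious >= iou_thresh, c] = 1.0                # overlapping neighbours become positives
    return pseudo_labels                                          # supervises the next instance classifier
```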

  21. Proposal cluster learning (PCL) [Tang, arXiv:1807.03342v1, under revision at TPAMI] + In-network supervision over proposal clusters is more robust + MIL within a MIL network (bag-in-bag MIL) − It still relies on hand-crafted object proposals. (Figure: comparison of WSDDN, OICR, and PCL.)

  22. Weakly supervised region proposal network [Tang ECCV 18] + Generates object proposals from neural activations + Confirms that a CNN contains rich localization information even under weak supervision + The first weakly supervised region proposal network (wsRPN)
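A hedged sketch of the underlying idea of turning neural activations into box proposals: threshold an activation (or CAM) map and emit the bounding boxes of its connected components. This is only an illustrative approximation; the actual wsRPN is a trained, multi-stage proposal network.

```python
import numpy as np
from scipy import ndimage

def activations_to_proposals(act_map, thresh_ratio=0.5):
    """act_map: (H, W) activation map upsampled to image resolution; returns (x1, y1, x2, y2) boxes."""
    mask = act_map >= thresh_ratio * act_map.max()
    labeled, _ = ndimage.label(mask)                     # connected components of the thresholded map
    boxes = []
    for ys, xs in ndimage.find_objects(labeled):
        boxes.append((xs.start, ys.start, xs.stop, ys.stop))
    return boxes

act = np.zeros((64, 64)); act[10:30, 20:40] = 1.0        # toy activation blob
print(activations_to_proposals(act))                     # [(20, 10, 40, 30)]
```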

  23. Generative adversarial learning [Shen CVPR 18] + Trains SSD for WSOD using a GAN loss + Fast inference speed thanks to SSD + Accurate WSOD via adversarial learning

  24. Performance. WSOD performance (mAP on PASCAL VOC 2007 test): Faster RCNN (PAMI 17) 69.9; WSRPN (ECCV 18) 50.4; GAL-FWSD512 (CVPR 18) 47.5; OICR (CVPR 17) 47.0; HCP+ (CVPR 17) 43.7; WCCN (CVPR 17) 42.8; WSDDN (CVPR 16) 39.3

  25. Class activation maps [Zhou CVPR 16] + Finds discriminative regions via global average pooling in a CNN trained with image labels + A very insightful work for understanding CNNs
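A minimal sketch of computing a class activation map for a GAP-based classifier in PyTorch (assumed framework): the map for a class is the last conv feature maps weighted by that class's final-layer weights. The one-layer "backbone" and the tensor shapes are stand-in assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAPClassifier(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.features = nn.Conv2d(3, 512, 3, padding=1)  # stand-in for a real conv backbone
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        feats = self.features(x)                         # (N, 512, H, W)
        logits = self.fc(feats.mean(dim=(2, 3)))         # global average pooling, then linear classifier
        return logits, feats

def class_activation_map(model, feats, class_idx):
    w = model.fc.weight[class_idx]                       # (512,) classifier weights for the class
    return torch.einsum("c,nchw->nhw", w, feats)         # weighted sum of the feature maps

model = GAPClassifier()
logits, feats = model(torch.randn(1, 3, 224, 224))
cam = class_activation_map(model, feats, logits.argmax(dim=1).item())
cam = F.interpolate(cam[None], size=(224, 224), mode="bilinear", align_corners=False)[0, 0]
```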

  26. Adversarial erasing network [Wei CVPR 17] + Adversarial erasing finds dense and complete object regions + Very impressive WSSS results
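A hedged sketch of the adversarial-erasing idea: repeatedly locate the most discriminative region with a class activation map, erase it from the image so the classifier is forced onto new regions, and accumulate the erased regions as a pseudo object mask. The `cam_fn` callable, threshold, and step count are illustrative assumptions (in the paper, the classifier is re-trained between erasing steps).

```python
import torch

def adversarial_erasing(cam_fn, image, steps=3, thresh=0.6):
    """cam_fn: any callable mapping a (1, 3, H, W) image to an (H, W) class heat map
    (e.g. the CAM sketch above); returns a boolean (H, W) pseudo object mask."""
    image = image.clone()
    mask = torch.zeros(image.shape[-2:], dtype=torch.bool)
    for _ in range(steps):
        cam = cam_fn(image)
        region = cam >= thresh * cam.max()               # current most discriminative region
        mask |= region
        image[:, :, region] = 0.0                        # erase it; later passes must respond elsewhere
    return mask
```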

  27. Seed, Expand and Constrain (SEC) [Kolesnikov ECCV 16] + Seed with weak localization cues + Expand with image labels + Constrain to object boundaries using a CRF
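A minimal sketch of the seeding term only: cross-entropy evaluated solely at pixels where the weak localization cues provide a seed label, with all other pixels ignored. The expand (pooling against image labels) and constrain (CRF consistency) terms are omitted; the shapes and ignore index are assumptions.

```python
import torch
import torch.nn.functional as F

def seeding_loss(seg_logits, seed_labels, ignore_index=255):
    """seg_logits: (N, C, H, W) segmentation scores; seed_labels: (N, H, W) long tensor holding
    a class id at seed pixels and ignore_index everywhere else."""
    return F.cross_entropy(seg_logits, seed_labels, ignore_index=ignore_index)

# The full SEC objective combines three terms:
# total_loss = seeding_loss(...) + expand_loss(...) + constrain_loss(...)
```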

  28. Deep seeded region growing (DSRG) network [Huang CVPR 18] (Figure: a classification network produces seeds; seeded region growing expands them; a segmentation network is trained with seeding and CRF boundary losses.) + Region growing yields complete and dense object regions + The segmentation network generates new pixel labels by itself
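A hedged sketch of the seeded-region-growing step over the segmentation network's softmax output: each seed region grows into 4-connected neighbours whose predicted probability for the seed's class is high enough, producing the denser pixel labels used to supervise the network. The threshold and connectivity are illustrative assumptions.

```python
from collections import deque
import numpy as np

def grow_seeds(probs, seeds, thresh=0.85, unlabeled=255):
    """probs: (C, H, W) softmax output of the segmentation network;
    seeds: (H, W) array with class ids at seed pixels and `unlabeled` elsewhere."""
    H, W = seeds.shape
    grown = seeds.copy()
    queue = deque((y, x) for y in range(H) for x in range(W) if seeds[y, x] != unlabeled)
    while queue:
        y, x = queue.popleft()
        c = grown[y, x]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):   # 4-connected neighbours
            if 0 <= ny < H and 0 <= nx < W and grown[ny, nx] == unlabeled and probs[c, ny, nx] > thresh:
                grown[ny, nx] = c                        # the neighbour joins the seed's region
                queue.append((ny, nx))
    return grown                                         # denser pseudo labels for the segmentation loss
```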

  29. Iteratively mining common object features [Wang CVPR 18] + Mines common features between a region (superpixel)-level classification network and a pixel-level segmentation network

  30. Performance. WSSS performance (mIoU on PASCAL VOC 2012 test): FCN (ResNet-101) 70.0; DSRG (CVPR 18) 63.2; MCOF (CVPR 18) 60.3; DCSP (BMVC 17) 59.2; AE-PSL (CVPR 17) 55.7; AF-SS (ECCV 16) 52.7; SEC (ECCV 16) 51.7; STC (PAMI 16) 51.2; DCSM (ECCV 16) 45.1; EM-Adapt (ICCV 15) 39.6; CCNN (CVPR 15) 35.6; MIL-FCN (CVPRW 14) 24.9

  31. Takeaways • There are many different kinds of weak supervision for visual recognition. • WSVL significantly reduces human labeling effort. • Deep learning enables effective WSVL; however, performance is still far from that of fully supervised models. • WSVL is a rising research area; there are lots of interesting ideas to explore.

  32. Resources • CVPR tutorial: Weakly Supervised Learning for Computer Vision, by Hakan Bilen, Rodrigo Benenson, Jasper Uijlings, https://hbilen.github.io/wsl-cvpr18.github.io • Source codes • WSDDN: https://github.com/hbilen/WSDDN • CAM: http://cnnlocalization.csail.mit.edu • OICR/PCL: https://github.com/ppengtang/oicr/tree/pcl • SEC: https://github.com/kolesman/SEC • DSRG: https://github.com/speedinghzl/DSRG

  33. Questions? Thanks a lot for your attention! www.xinggangw.info
