Disentanglement of Visual Concepts from Classifying and Synthesizing Scenes


  1. Disentanglement of Visual Concepts from Classifying and Synthesizing Scenes. Bolei Zhou, The Chinese University of Hong Kong.

  2. Representation Learning. The purpose of representation learning: “To identify and disentangle the underlying explanatory factors hidden in the observed milieu of low-level sensory data.” (Bengio et al., Representation Learning: A Review and New Perspectives)

  3. Sources of Deep Representations: image classification (object recognition, scene recognition), self-supervised learning (audio prediction, ECCV’16; colorization, ECCV’16 and CVPR’17), and image generation.

  4. Outline • Disentanglement of Concepts from Classifying Scenes • Sanity Check Experiment: Mixture of MNIST • Disentanglement of Visual Concepts from Synthesizing Scenes • Future Directions

  5. My Previous Talks • On the importance of single units, CVPR’18 tutorial talk: https://www.youtube.com/watch?v=1aSS5GEH58U • Interpretable representation learning for visual intelligence, MIT thesis defense: https://www.youtube.com/watch?v=J7Zz_33ZeJc

  6. Neural Networks for Scene Classification http://places2.csail.mit.edu/demo.html https://github.com/CSAILVision/places365
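As a hedged illustration of how such a Places365 scene classifier can be queried, the sketch below loads a ResNet-18 with 365 output classes and runs one image through it. The checkpoint and image file names are placeholders, and the exact weight format should be taken from the linked repo.

```python
# Minimal sketch: scene classification with a Places365-trained CNN.
# Checkpoint/category/image file names are placeholders (see the places365 repo).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet18(num_classes=365)
ckpt = torch.load("resnet18_places365.pth.tar", map_location="cpu")
state = {k.replace("module.", ""): v for k, v in ckpt["state_dict"].items()}
model.load_state_dict(state)
model.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("cafeteria.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)
top_prob, top_idx = probs.topk(5)
print(top_idx.tolist(), top_prob.tolist())  # indices map to the repo's category list
```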

  7. What are the internal units doing when classifying scenes? A convolutional neural network (CNN) classifies an image as cafeteria (0.9); its internal units act as concept detectors: unit 22 at layer 5 detects faces, unit 2 at layer 4 detects lamps, unit 42 at layer 3 detects trademarks, and unit 57 at layer 4 detects windows.

  8. What is a unit doing? Visualize the unit via back-propagation, image synthesis, or deconvolution. [Simonyan et al., ICLR’15] [Springenberg et al., ICLR’15] [Selvaraju et al., ICCV’17] [Nguyen et al., NIPS’16] [Dosovitskiy et al., CVPR’16] [Zeiler et al., ECCV’14] [Mahendran et al., CVPR’15] [Girshick et al., CVPR’14]

  9. Data-Driven Visualization: for each unit at layer 5, show its top activated images (unit 1, unit 2, unit 3, ...). https://github.com/metalbubble/cnnvisualizer
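A minimal sketch of this data-driven visualization, assuming a trained model and a dataloader that yields image tensors together with their file paths: record each unit's maximum activation per image with a forward hook, then keep the top-scoring images per unit. The hooked layer name is an assumption.

```python
# Minimal sketch of data-driven unit visualization.
import torch

activations = {}

def hook(module, inputs, output):
    # output: (batch, units, H, W) -> per-image max activation per unit
    activations["feat"] = output.detach().amax(dim=(2, 3))

handle = model.layer4.register_forward_hook(hook)  # a "conv5"-like layer (assumed)

scores = []  # list of (num_units,) score vectors, one per image
paths = []
with torch.no_grad():
    for imgs, img_paths in dataloader:   # assumed to yield (tensor, path) pairs
        model(imgs)
        scores.append(activations["feat"])
        paths.extend(img_paths)
scores = torch.cat(scores)               # (num_images, num_units)

unit = 0
top_vals, top_imgs = scores[:, unit].topk(10)  # 10 most-activating images for unit 0
print([paths[i] for i in top_imgs])
handle.remove()
```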

  10. Annotating the Interpretation of Units on Amazon Mechanical Turk. Workers give a word or description summarizing the top activated images (e.g., “lamp”) and pick which category the description belongs to: scene, region or surface, object, object part, texture or material, or simple elements or colors. [Zhou, Khosla, Lapedriza, Oliva, Torralba, ICLR 2015]

  11. Interpretable Representations for Objects and Scenes: 59 units at conv5 of AlexNet trained on ImageNet align with objects, versus 151 units at conv5 of AlexNet trained on Places (example concepts: dog, building, windows, bird, baseball field, face, tie).

  12. Quantify the Interpretability of Networks: Network Dissection [Zhou*, Bau*, et al., CVPR 2017, TPAMI’18]. [Figure: interpretable conv5 units matched to concepts, e.g., units 41, 79, 88, 107, 144, 191, 229, and 252, covering objects and textures with IoU scores of roughly 0.12-0.16; in total 32 objects, 6 scenes, 6 parts, 2 materials, 25 textures, and 1 color.]

  13. Evaluate Units for Semantic Segmentation. Testing dataset: 60,000 images annotated with 1,200 concepts. Unit 1: top activated images from the testing dataset; top concept: lamp, Intersection over Union (IoU) = 0.23.
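A minimal sketch of the IoU score driving this evaluation, with simplified assumptions about how the activation threshold is chosen: upsample a unit's activation maps, threshold them, and intersect the resulting masks with the concept's segmentation masks.

```python
# Minimal sketch of a Network-Dissection-style IoU score.
import torch
import torch.nn.functional as F

def unit_concept_iou(act_maps, concept_masks, quantile=0.995):
    # act_maps: (N, H, W) activations of one unit; concept_masks: (N, Hi, Wi) binary
    up = F.interpolate(act_maps.unsqueeze(1), size=concept_masks.shape[-2:],
                       mode="bilinear", align_corners=False).squeeze(1)
    thresh = torch.quantile(up.flatten(), quantile)     # dataset-wide threshold (assumed)
    unit_mask = up > thresh
    inter = (unit_mask & concept_masks.bool()).sum().float()
    union = (unit_mask | concept_masks.bool()).sum().float()
    return (inter / union.clamp(min=1)).item()

# Example: unit_concept_iou(unit_acts, lamp_masks) -> e.g. ~0.23 for a lamp unit
```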

  14. Layer 5 unit 79: car (object), IoU = 0.13. Layer 5 unit 107: road (object), IoU = 0.15. Overall, 118/256 units cover 72 unique concepts.

  15. [Figure: unit visualizations for house and airplane detectors across AlexNet, ResNet, GoogLeNet, and VGG.]

  16. More results in the TPAMI extension paper: comparison of different network architectures and of supervisions (supervised vs. self-supervised). Interpreting Deep Visual Representations via Network Dissection, https://arxiv.org/pdf/1711.05611.pdf

  17. Sanity Check Experiment for Disentanglement • How can we quantitatively evaluate the solution reached by a CNN? • What are the hidden factors in object recognition and scene recognition?

  18. Sanity Check Experiment for Disentanglement. A controlled classification experiment, Mixture of MNIST: take the 10 MNIST digits and define each class as a pairwise combination of digits, e.g., class 1 = (3, 6), class 2 = (0, 2), class 3 = (4, 5), ..., class N. With Wentao Zhu (PKU).
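A minimal sketch of how such a mixture dataset could be built, under the assumption that the two digits of a class are placed side by side (the slide does not specify the compositing):

```python
# Minimal sketch of a Mixture-of-MNIST dataset: each class is an unordered pair
# of digits, and a sample contains one instance of each digit.
import itertools
import random
import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
by_digit = {d: [i for i, t in enumerate(mnist.targets) if t == d] for d in range(10)}
pairs = list(itertools.combinations(range(10), 2))   # 45 classes

def sample(class_idx):
    d1, d2 = pairs[class_idx]
    x1, _ = mnist[random.choice(by_digit[d1])]
    x2, _ = mnist[random.choice(by_digit[d2])]
    if random.random() < 0.5:                        # random left/right order
        x1, x2 = x2, x1
    return torch.cat([x1, x2], dim=2), class_idx     # (1, 28, 56) image, class label

img, label = sample(0)   # an image containing the two digits of class 0
```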

  19. Solving Mixture of MNIST: classify a given image into one of 45 classes. Training data: 20,000 images. Accuracy on the validation set: 91.7%. A simple convnet for classification: layer 1 with 10 units, layer 2 with 20 units, layer 3 with 10 units, global average pooling, and a softmax over 45 classes.
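A hedged sketch of the described architecture; the kernel sizes, pooling, and the linear layer after global average pooling are assumptions not given on the slide.

```python
# Minimal sketch of the "simple convnet": conv layers with 10, 20, 10 units,
# global average pooling, and a softmax over 45 classes.
import torch.nn as nn

class MixtureMNISTNet(nn.Module):
    def __init__(self, num_classes=45):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 10, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),  # conv1: 10 units
            nn.Conv2d(10, 20, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2), # conv2: 20 units
            nn.Conv2d(20, 10, 3, padding=1), nn.ReLU(),                  # conv3: 10 units
        )
        self.gap = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.classifier = nn.Linear(10, num_classes)

    def forward(self, x):
        h = self.features(x)                      # conv3 activations: digit detectors emerge here
        z = self.gap(h).flatten(1)
        return self.classifier(z)                 # logits; softmax is applied in the loss
```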

  20. Digit Detectors Emerge from Solving Mixture of MNIST. Unit 03 detects digit 0. Precision: @100 = 1.00, @300 = 1.00, @500 = 1.00, @700 = 0.99; @(recall = 0.25) = 0.99, @(recall = 0.50) = 0.98, @(recall = 0.75) = 0.90. [Figure: top activated images and activation maps.]
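The precision@k numbers can be computed as below; `unit_scores` and `contains_digit` are assumed to be collected over a held-out set, so this is a sketch rather than the exact evaluation code.

```python
# Minimal sketch of precision@k for a unit as a digit detector: rank images by
# the unit's activation and check how many of the top-k contain the target digit.
import torch

def precision_at_k(unit_scores, contains_digit, k):
    # unit_scores: (N,) max activation of the unit per image
    # contains_digit: (N,) bool, whether the image contains the target digit
    top = unit_scores.topk(k).indices
    return contains_digit[top].float().mean().item()

# Example: precision_at_k(scores_unit3, has_digit0, 100) -> 1.00 for a clean digit-0 detector
```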

  21. Digit Detectors Emerge from Solving Mixture of MNIST Two metrics for unit importance: alignment score and ablation effect
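As an illustration of the ablation-effect metric, the sketch below zeroes one conv3 channel during the forward pass and measures the resulting accuracy drop; the hooked layer index matches the sketch architecture above and is otherwise an assumption.

```python
# Minimal sketch of the "ablation effect": zero one unit at conv3 and
# compare validation accuracy with and without the unit.
import torch

def accuracy_with_unit_ablated(net, loader, unit=None):
    def zero_unit(module, inputs, output):
        if unit is not None:
            output[:, unit] = 0          # ablate one channel of the conv3 output
        return output
    handle = net.features[-2].register_forward_hook(zero_unit)  # conv3 in the sketch net
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (net(x).argmax(1) == y).sum().item()
            total += y.numel()
    handle.remove()
    return correct / total

# ablation_effect = accuracy_with_unit_ablated(net, val_loader) \
#                   - accuracy_with_unit_ablated(net, val_loader, unit=3)
```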

  22. Dropout Affects the Unit as Digit Detector. [Figure: baseline vs. baseline + dropout on conv3.]

  23. Dropout Affects the Unit as Digit Detector. [Figure: baseline vs. baseline + dropout on conv3.]

  24. Layer Width Affects the Unit as Digit Detector • A wider network performs better at disentanglement • Less reliance on single units. [Figure: baseline vs. baseline with the number of units at conv3 tripled.]

  25. Layer Width Affects the Unit as Digit Detector • A wider layer performs better at disentanglement • Less reliance on single units. [Figure: baseline vs. baseline with the number of units at conv3 tripled.]

  26. Wider Layer + Dropout. [Figure: baseline vs. baseline with a wider layer vs. baseline with a wider layer + dropout.]
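A small sketch of the two variants being compared, assuming a dropout rate of 0.5 and the same kernel size as before:

```python
# Minimal sketch of the compared variants: tripling the number of conv3 units
# and adding channel-wise dropout after conv3.
import torch.nn as nn

def make_conv3_block(width=10, dropout=False):
    layers = [nn.Conv2d(20, width, 3, padding=1), nn.ReLU()]
    if dropout:
        layers.append(nn.Dropout2d(0.5))       # dropout on conv3 (rate assumed)
    return nn.Sequential(*layers), width

baseline_conv3, w = make_conv3_block()                          # 10 units
wider_conv3, w = make_conv3_block(width=30)                     # tripled width
wider_dropout_conv3, w = make_conv3_block(width=30, dropout=True)
classifier = nn.Linear(w, 45)   # the layer after global average pooling must match the width
```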

  27. Usefulness Experiment • Take 8 and 9 as redundant digits (randomly shown in all classes) • Effective digits: 0-7 • Number of classes: 28

  28. Deep Neural Networks for Synthesizing Scenes: Generative Adversarial Networks. Goodfellow et al., NIPS’14; Radford et al., ICLR’16; Karras et al., 2017; Brock et al., 2018.

  29. [Figure: images synthesized by the GAN of Karras et al., 2017.]

  30. How to Add or Modify Content? Input: random noise; output: synthesized image. Goal: add trees or add domes to the synthesized scene.

  31. Understanding the Internal Units of GANs. Input: random noise; output: synthesized image. What are the internal units doing? David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, J. Tenenbaum, W. Freeman, A. Torralba. GAN Dissection: Visualizing and Understanding GANs. ICLR’19. https://arxiv.org/pdf/1811.10597.pdf

  32. Framework of GAN Dissection David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, J. Tenenbaum, W. Freeman, A. Torralba. GAN Dissection: Visualizing and Understanding GANs. https://arxiv.org/pdf/1811.10597.pdf

  33. Units Emerge That Draw Objects: unit 365 draws trees, unit 43 draws domes, unit 14 draws grass, and unit 276 draws towers.

  34. Manipulating the Images. Unit 4 draws lamps. [Figure: synthesized images vs. synthesized images with unit 4 removed.]
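A hedged sketch of this kind of unit ablation in a generator: zero out a chosen feature channel at an intermediate layer and re-synthesize. The `generator` object, its layer name, and the latent dimension are assumptions; the official implementation is linked on the next slide.

```python
# Minimal sketch of GAN unit ablation: silence selected feature channels
# at an intermediate generator layer and regenerate the images.
import torch

def ablate_units(layer, units):
    def hook(module, inputs, output):
        output[:, units] = 0             # zero the chosen feature channels
        return output
    return layer.register_forward_hook(hook)

z = torch.randn(4, 128)                  # latent codes (dimension assumed)
with torch.no_grad():
    original = generator(z)
    handle = ablate_units(generator.layer4, units=[4])   # e.g. a "lamp" unit (layer assumed)
    ablated = generator(z)
    handle.remove()
# Comparing `original` and `ablated` shows which objects the ablated unit was drawing.
```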

  35. Interactive Image Manipulation All the code and paper are available at http://gandissect.csail.mit.edu

  36. Latest Work on Using GANs to Manipulate Real Images • Challenge: invert the hidden code z for any given image. Input: hidden code z; output: synthesized image.
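A minimal sketch of such an inversion, assuming a differentiable `generator` and using only a pixel-wise reconstruction loss; real systems typically add perceptual losses and an encoder for initialization.

```python
# Minimal sketch of GAN inversion: optimize the latent code z so the
# generated image matches a given target image.
import torch

def invert(generator, target, latent_dim=128, steps=500, lr=0.05):
    z = torch.randn(1, latent_dim, requires_grad=True)   # latent size assumed
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(generator(z), target)
        loss.backward()
        opt.step()
    return z.detach()

# z_hat = invert(generator, real_image)   # then edit z_hat or its units and re-synthesize
```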

  37. Future Directions for Interpretable Deep Learning: defense and attack, generalization, adversarial samples and overfitting, GANs and deep RL, network compression, plasticity and transfer learning.

  38. Why Care About Interpretability? To move from the ‘alchemy’ of deep learning to the ‘chemistry’ of deep learning: a scientific understanding.
