Disentanglement of Visual Concepts from Classifying and Synthesizing Scenes
Bolei Zhou, The Chinese University of Hong Kong
Representation Learning
The purpose of representation learning: "To identify and disentangle the underlying explanatory factors hidden in the observed milieu of low-level sensory data."
Bengio et al., Representation Learning: A Review and New Perspectives.
Sources of Deep Representations
• Image Classification: Object Recognition, Scene Recognition
• Self-Supervised Learning: Audio prediction (ECCV'16), Colorization (ECCV'16 and CVPR'17)
• Image Generation
Outline • Disentanglement of Concepts from Classifying Scenes • Sanity Check Experiment: Mixture of MNIST • Disentanglement of Visual Concepts from Synthesizing Scenes • Future Directions
My Previous Talks • On the importance of single units CVPR’18 Tutorial talk: https://www.youtube.com/watch?v=1aSS5GEH58U • Interpretable representation learning for visual intelligence MIT thesis defense: https://www.youtube.com/watch?v=J7Zz_33ZeJc
Neural Networks for Scene Classification http://places2.csail.mit.edu/demo.html https://github.com/CSAILVision/places365
What are the internal units for classifying scenes?
Convolutional Neural Network (CNN): Cafeteria (0.9)
Units as concept detectors: Unit 22 at Layer 5: Face · Unit 2 at Layer 4: Lamp · Unit 42 at Layer 3: Trademark · Unit 57 at Layer 4: Windows
What is a unit doing? Visualize the unit
• Back-propagation [Simonyan et al., ICLR'15] [Springenberg et al., ICLR'15] [Selvaraju et al., ICCV'17]
• Image Synthesis [Nguyen et al., NIPS'16] [Dosovitskiy et al., CVPR'16]
• Deconvolution [Zeiler et al., ECCV'14] [Mahendran et al., CVPR'15] [Girshick et al., CVPR'14]
Data-Driven Visualization (Layer 5)
Unit 1 / Unit 2 / Unit 3: top activated images
https://github.com/metalbubble/cnnvisualizer
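The data-driven visualization above amounts to ranking the dataset by each unit's response. A minimal sketch, assuming activations have already been spatially pooled into a (num_images, num_units) matrix (the function name and toy data are illustrative, not from the talk's codebase):

```python
import numpy as np

def top_activated_images(activations, k=5):
    """For each unit, return the indices of the k images with the
    highest activation (hypothetical data-driven visualization)."""
    # activations: (num_images, num_units), e.g. max-pooled conv5 responses
    order = np.argsort(-activations, axis=0)  # sort images per unit, descending
    return order[:k].T                        # (num_units, k) image indices

# Toy example: 6 images, 2 units
acts = np.array([[0.1, 0.9],
                 [0.8, 0.2],
                 [0.3, 0.7],
                 [0.9, 0.1],
                 [0.5, 0.5],
                 [0.2, 0.6]])
print(top_activated_images(acts, k=3))  # unit 0 -> images [3, 1, 4]
```

In practice one would then crop each returned image to the unit's most activated receptive-field region before showing it to annotators.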
Annotating the Interpretation of Units (Amazon Mechanical Turk)
Word/description to summarize the images: Lamp
Which category the description belongs to: Scene · Region or surface · Object · Object part · Texture or material · Simple elements or colors
[Zhou, Khosla, Lapedriza, Oliva, Torralba. ICLR 2015]
Interpretable Representations for Objects and Scenes
59 units as objects at conv5 of AlexNet on ImageNet; 151 units as objects at conv5 of AlexNet on Places
[Figure: example unit concepts such as dog, bird, face, tie (ImageNet) and building, windows, baseball field (Places)]
Quantify the Interpretability of Networks: Network Dissection
[Zhou*, Bau*, et al. CVPR 2017, TPAMI'18]
[Figure: interpretable units at conv5 matched to concepts, e.g. object units for road, car, airplane and texture units for honeycombed, banded, waffled, each with IoU around 0.12-0.16; in total the interpretable units cover 32 objects, 6 scenes, 6 parts, 2 materials, 25 textures, and 1 color]
Evaluate Unit for Semantic Segmentation
Testing dataset: 60,000 images annotated with 1,200 concepts
Unit 1: top activated images from the testing dataset. Top concept: Lamp, Intersection over Union (IoU) = 0.23
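The IoU score above compares a unit's thresholded activation mask against a concept's segmentation mask. A minimal sketch of that computation (the function name and threshold choice are illustrative; Network Dissection picks the threshold from the unit's activation distribution over the dataset):

```python
import numpy as np

def unit_concept_iou(activation_map, concept_mask, threshold):
    """IoU between a unit's binarized activation map and a concept's
    segmentation mask, in the spirit of Network Dissection."""
    unit_mask = activation_map > threshold          # binarize the unit's response
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return intersection / union if union > 0 else 0.0

# Toy example: 2x2 activation map vs. a lamp mask
act = np.array([[0.9, 0.1],
                [0.8, 0.2]])
lamp = np.array([[True, False],
                 [False, False]])
print(unit_concept_iou(act, lamp, threshold=0.5))  # 1 / 2 = 0.5
```

A unit is then reported with the concept that maximizes this IoU across the annotated dataset.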
Layer 5 unit 79: car (object), IoU = 0.13 · Layer 5 unit 107: road (object), IoU = 0.15
118/256 units covering 72 unique concepts
[Figure: comparison across architectures (AlexNet, ResNet, GoogLeNet, VGG) of units detecting House and Airplane]
More results in the TPAMI extension paper: comparison of different network architectures, and comparison of supervisions (supervised vs. self-supervised).
Interpreting Deep Visual Representations via Network Dissection
https://arxiv.org/pdf/1711.05611.pdf
Sanity Check Experiment for Disentanglement • How to quantitatively evaluate the solution reached by CNN? • What are the hidden factors in object recognition and scene recognition? Object Recognition Scene Recognition
Sanity Check Experiment for Disentanglement A controlled classification experiment: Mixture of MNIST 10 digits from MNIST Pairwise combination of digits Class 1 (3,6) Class 2 (0,2) Class 3 (4,5) … Class N With Wentao Zhu (PKU)
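The Mixture-of-MNIST construction above defines one class per unordered pair of digits. A minimal sketch of building the label space and composing an image from two digits (the side-by-side layout is an assumption for illustration; the talk's exact composition may differ):

```python
import numpy as np
from itertools import combinations

def make_mixture_classes(digits=range(10)):
    """One class per unordered digit pair: C(10, 2) = 45 classes."""
    return list(combinations(digits, 2))

def compose_image(img_a, img_b):
    """Place two 28x28 digit images side by side (hypothetical layout)."""
    return np.concatenate([img_a, img_b], axis=1)  # -> 28x56

classes = make_mixture_classes()
print(len(classes))  # 45
```

Class 1 = (3, 6), Class 2 = (0, 2), and so on, matching the slide's enumeration.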
Solving Mixture of MNIST
To classify a given image into one of the 45 classes (all C(10, 2) = 45 digit pairs)
• Training data: ~20,000 images
• Accuracy on validation set: 91.7%
A simple convnet for classification: Layer 1 (10 units) → Layer 2 (20 units) → Layer 3 (10 units) → Global average pooling → Softmax (45 classes)
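The head of the simple convnet collapses each of the 10 conv3 feature maps to a scalar via global average pooling, then maps those 10 features to 45 class scores. A NumPy sketch of those two steps (a sketch, not the talk's actual implementation):

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse each unit's spatial map to one scalar: (units, H, W) -> (units,)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    """Numerically stable softmax over the 45 class scores."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Toy forward pass of the head: 10 conv3 maps -> 45 class probabilities
fm = np.ones((10, 4, 4))                 # pretend conv3 output
W = np.zeros((45, 10))                   # untrained linear classifier
probs = softmax(W @ global_average_pool(fm))
print(probs.shape)  # (45,)
```

Because each class score is a weighted sum of per-unit averages, a unit that fires only on one digit contributes a clean, interpretable signal, which is what makes the digit detectors on the next slide visible.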
Digit Detectors Emerge from Solving Mixture of MNIST
Unit 03 for detecting digit 0: top activated images and activation maps.
Precision: @100 = 1.00, @300 = 1.00, @500 = 1.00, @700 = 0.99; @(recall=0.25) = 0.99, @(recall=0.50) = 0.98, @(recall=0.75) = 0.90
Digit Detectors Emerge from Solving Mixture of MNIST Two metrics for unit importance: alignment score and ablation effect
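Of the two importance metrics above, the ablation effect is the simpler to state: zero out one unit and measure how much validation accuracy drops. A hedged sketch (the alignment score is not shown; `predict_fn`, and the toy data, are illustrative):

```python
import numpy as np

def ablation_effect(predict_fn, features, labels, unit):
    """Accuracy drop when a single unit's activation is zeroed out:
    one way to score how much the classifier relies on that unit."""
    base_acc = (predict_fn(features) == labels).mean()
    ablated = features.copy()
    ablated[:, unit] = 0.0                    # knock out one unit
    ablated_acc = (predict_fn(ablated) == labels).mean()
    return base_acc - ablated_acc

# Toy classifier: predict the argmax feature
predict = lambda f: f.argmax(axis=1)
feats = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
labels = np.array([0, 1])
print(ablation_effect(predict, feats, labels, unit=1))  # 0.5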
Dropout Affects the Unit as Digit Detector Baseline Baseline + Dropout on the conv3
Layer Width Affects the Unit as Digit Detector • Wider network performs better at disentanglement • Less reliance on single units Baseline Baseline with tripling the number of units at conv3
Wider Layer + Dropout
Baseline · Baseline with wider layer · Baseline with wider layer + dropout
Usefulness Experiment
• Take 8 and 9 as redundant digits (randomly shown in all classes)
• Effective digits: 0-7
• Number of classes: 28 (all C(8, 2) pairs of the effective digits)
Deep Neural Networks for Synthesizing Scenes: Generative Adversarial Networks
Goodfellow et al., NIPS'14 · Radford et al., ICLR'16 · T. Karras et al., 2017 · A. Brock et al., 2018
[Figure: progressively grown GAN samples, T. Karras et al., 2017]
How to Add or Modify Contents? Input: Output: Random noise Synthesized image Add trees Add domes
Understanding the Internal Units in GANs Output: Synthesized image Input: Random noise What are they doing? David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, J. Tenenbaum, W. Freeman, A. Torralba. GAN Dissection: Visualizing and Understanding GANs. ICLR’19. https://arxiv.org/pdf/1811.10597.pdf
Framework of GAN Dissection David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, J. Tenenbaum, W. Freeman, A. Torralba. GAN Dissection: Visualizing and Understanding GANs. https://arxiv.org/pdf/1811.10597.pdf
Units Emerge as Drawing Objects Unit 365 draws trees. Unit 43 draws domes. Unit 14 draws grass. Unit 276 draws towers.
Manipulating the Images
Unit 4 draws Lamp: synthesized images vs. synthesized images with Unit 4 removed
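Removing Unit 4 above is an intervention on an intermediate feature map: zero the chosen channels, then run the rest of the generator. A minimal sketch of that editing step (function name is illustrative; GAN Dissection also supports *inserting* activations to add objects):

```python
import numpy as np

def ablate_units(feature_maps, units):
    """Zero out selected channels of an intermediate GAN feature map.
    Re-running the remaining generator layers on the edited maps then
    removes the objects those units draw (GAN Dissection-style edit)."""
    edited = feature_maps.copy()          # leave the original activations intact
    edited[:, units] = 0.0                # (batch, channels, H, W)
    return edited

fm = np.ones((1, 8, 2, 2))                # pretend intermediate activations
out = ablate_units(fm, [4])
print(out[0, 4].sum(), out[0, 0].sum())   # 0.0 4.0
```

The converse edit, writing a tree unit's typical activation into a spatial region, is what drives the "Add trees / Add domes" manipulation shown earlier.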
Interactive Image Manipulation All the code and paper are available at http://gandissect.csail.mit.edu
Latest Work on Using GANs to Manipulate Real Images
• Challenge: invert the hidden code z for any given image
Input: Hidden code z → Output: Synthesized image
Future Directions
Interpretable Deep Learning connects to:
• Adversarial samples: defense and attack
• Generalization and overfitting
• Network compression
• GAN and deep RL
• Plasticity and transfer learning
Why Care About Interpretability? ‘Alchemy’ of Deep Learning ‘Chemistry’ of Deep Learning Scientific Understanding