  1. Generating Visual Explanations. Lisa Anne Hendricks et al. (Mar 2016), UC Berkeley. Presenter: Anurag Patil

  2. Outline 1. Motivation 2. The Problem and Importance 3. The Approach a. The Relevance Loss b. The Discriminative Loss 4. Dataset 5. Experiments and Results 6. Critique

  3. Motivation. Explainable AI: why should we care about it? Explainability is about trust: it is important to know why our self-driving car decided to slam on the brakes. Explanations are required for regulatory compliance in certain industries, e.g. medical diagnosis and the Equal Credit Opportunity Act in the US. Explanations can facilitate model validation and debugging: models learn associative (not necessarily causal) patterns in the training data, and explanations can reveal spurious associations. But there is a tradeoff between performance and explainability.

  4. Motivation: Explainable Models. Two broad ideas: 1. Introspection explanation systems, which explain how a model determines its final output (e.g. "This is a Western Grebe because filter 2 has a high activation"). 2. Justification explanation systems, which produce sentences detailing how the visual evidence is compatible with the system output (e.g. "This is a Western Grebe because it has red eyes"). Here, we look at justification explanation systems because they are better suited for non-experts, and we apply the idea of explainability to classification by visual systems.

  5. The Problem and Importance. Description: a sentence based only on visual information (image captioning systems). Visual explanation: a sentence that details why a certain category is appropriate for a given image, while mentioning only image-relevant features.

  6. The Approach. Condition language generation on both the image and the predicted class label; other captioning models condition only on visual features. To do this, use a fine-grained recognition pipeline plus a novel loss function that includes class-discriminative information. Challenge: class specificity is a global sentence property, i.e. words like "black" or "red eye" are not very class discriminative on their own, but the entire sentence "This is an all black bird with a bright red eye" is class specific to the Bronzed Cowbird. Typical loss functions optimize only for sentence alignment between the generated and ground-truth sentences.

  7. Note on LRCN

  8. Model Inputs : [image, category label, ground truth sentence]

  9. Proposed Loss: combines a relevance loss and a discriminative loss. - The relevance loss (L_R) corresponds to image relevance. - The discriminative loss, expressed through the expected reward E[R_D(w̃)], corresponds to class relevance.
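
A compact way to write the combined objective, reconstructed from the description above. The sign convention here is mine: L_R is taken as a negative log-likelihood so that the whole expression is minimized, and λ is the weight balancing the two terms.

```latex
% Combined objective: relevance loss minus the weighted expected
% discriminative reward over sentences sampled from the model.
\min_{W} \; L_R(W) \;-\; \lambda \, \mathbb{E}_{\tilde{w} \sim p(w \mid I, C)}\!\left[ R_D(\tilde{w}) \right]
```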

  10. Relevance Loss: N = the batch size, w_t = ground truth word, I = image, C = category. - Produces sentences that correspond to the image content. - Does not explicitly encourage generated sentences that are both image relevant and category specific. - Class labels: the average hidden state of another, separate LSTM that generates word sequences conditioned on images only (averaged across all sequences for each class in the train set).
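
The relevance-loss formula itself appeared as an image on the slide; a reconstruction from the variable definitions above, written as a negative average log-likelihood so it is minimized (the slides describe L_R as a log-likelihood, so the sign convention here is mine):

```latex
% Negative average log-likelihood of the ground-truth words, conditioned on
% the previously generated words, the image I, and the category C.
L_R = -\,\frac{1}{N} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1}
      \log p\!\left(w_{t+1} \mid w_{0:t}, I_n, C_n\right)
```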

  11. Discriminative Loss. Notation: p(w | I, C) = the model's estimated conditional distribution; w̃ = a description sampled from the LSTM (p(w | I, C)); R_D(w̃) = reward for the sampled description; E[R_D(w̃)] = estimate of the expected reward. - Based on a reinforcement learning paradigm: Agent = the LSTM; Environment = the previously generated words; Action = predict the next word based on the policy and the environment; Policy = defined by the LSTM weights W. - Reward: R_D(w̃) = p_D(C | w̃), where p_D(C | w̃) is a pretrained sentence classifier applied to sentences sampled from the LSTM (p_L(w)). - The accuracy of this pretrained classifier is not important (22%).
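
A minimal sketch of the reward computation under the RL framing above, assuming a PyTorch-style setup; the names (sentence_classifier, sampled_words, class_idx) are illustrative, not from the authors' code:

```python
import torch

@torch.no_grad()  # the reward is treated as a constant w.r.t. the LSTM weights W
def discriminative_reward(sentence_classifier, sampled_words, class_idx):
    """R_D(w~) = p_D(C | w~) for a batch of sampled sentences.

    sampled_words: (N, T) tensor of word ids sampled from p(w | I, C)
    class_idx:     (N,) tensor with the target category index C per example
    """
    logits = sentence_classifier(sampled_words)            # (N, num_classes)
    probs = torch.softmax(logits, dim=-1)
    # Probability that the pretrained sentence classifier assigns to class C.
    return probs[torch.arange(probs.size(0)), class_idx]   # shape (N,)
```

The no_grad decorator reflects the point on the next slide: R_D(w̃) itself is not differentiated with respect to W; only log p(w̃ | I, C) is, via REINFORCE.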

  12. Minimizing the loss. - Since the expectation over descriptions, E[R_D(w̃)], is intractable, use Monte Carlo sampling from the LSTM, p(w | I, C). - p(w | I, C) is a discrete distribution. - To avoid differentiating R_D(w̃) w.r.t. W, use the REINFORCE property. - The final gradient to update the weights W combines both terms, where log p(w̃) = log likelihood of the sampled description and L_R = log likelihood of the ground truth description.
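
The REINFORCE (score-function) identity referenced above, written out for the expected-reward term; with a single Monte Carlo sample w̃, the expectation is replaced by one draw (reconstruction; notation follows the slides):

```latex
% Score-function trick: R_D is treated as a constant, so only
% log p(w~ | I, C) is differentiated with respect to the weights W.
\nabla_W \, \mathbb{E}_{\tilde{w} \sim p(w \mid I, C)}\!\left[ R_D(\tilde{w}) \right]
  = \mathbb{E}_{\tilde{w}}\!\left[ R_D(\tilde{w}) \, \nabla_W \log p(\tilde{w} \mid I, C) \right]
  \approx R_D(\tilde{w}) \, \nabla_W \log p(\tilde{w} \mid I, C)
```

Under the minimization convention used earlier, the final gradient for W is then the gradient of L_R minus λ times R_D(w̃) times the gradient of log p(w̃ | I, C).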

  13. Dataset. Caltech-UCSD Birds: 200 classes of North American bird species | 11,788 images | 5 captions per image. - Every image belongs to a class, so each sentence and image is associated with a single label. - The captions give descriptive details about each bird class. - They do not explain why an image belongs to a certain class.

  14. Experiments. Baseline and ablation models: - Description model: generates sentences conditioned only on images (equivalent to LRCN). - Definition model: generates sentences using only the image label as input. - Explanation-label: not trained with the discriminative loss. - Explanation-discriminative: not conditioned on the predicted class. Metrics: - Image relevance: METEOR, CIDEr. - Class relevance: class similarity score, class rank.

  15. Results. Small gains in the automatic evaluation metrics for image relevance, but huge gains in the class-relevance metrics.

  16. Results: Comparison of Explanations, Baselines, and Ablations. - Green: correct, Yellow: mostly correct, Red: incorrect. - 'Red eye' is a class-relevant attribute.

  17. Results: Comparison of Explanations and Definitions. - The definition model can produce sentences that are not image relevant.

  18. Results: Comparison of Explanations and Descriptions. - Both models generate visually correct sentences. - 'Black head' is one of the most prominent distinguishing properties of this vireo type.

  19. Critique – The Good ● Motivation: ○ Novel motivation of making models more explainable to non-experts. ● Explanation model: ○ Novel loss function that includes a global sentence property. ○ The loss function also has wide, generic applicability. ● Ablation study: ○ Performed an ablation study of all the important model components and gives reasoning behind the model design decisions.

  20. Critique – The not so good ● Motivation: ○ What if the underlying feature in the network was not identifying the red eye, but instead identifying that there is a bird flying over water? There is no way you would know. ● Dataset: ○ Every image belongs to a class, so each sentence and image is associated with a single label. ● Explanation model: ○ Could the variance of the REINFORCE gradient estimate be reduced by including a baseline? ○ Could other reward functions, based on class similarity or class rank, be used? ○ Could attention layers be used to combine text and image features? ● Missing details: ○ Why didn't the accuracy of the LSTM sentence classifier matter? ● Evaluation methodology: ○ No comparison with other SOTA image captioning models. ● Human evaluation improvements: ○ Ask evaluators to include a reason for why a given sentence was ranked higher.

  21. References - Lisa Anne Hendricks, Zeynep Akata, Marcus Rohrbach, Jeff Donahue, Bernt Schiele, Trevor Darrell, Generating Visual Explanations, European Conference on Computer Vision (ECCV), 2016

  22. Additional Examples
