Generating Visual Explanations (Hendricks et al.)
이 종 진, Seoul National University, ga0408@snu.ac.kr
Nov 15, 2018
Explainable AI; Generating Visual Explanations
◮ Deep classification methods have had tremendous success in visual recognition.
◮ Most of them cannot provide a consistent justification of why they made a certain prediction.
Explainable AI; Generating Visual Explanations
◮ The proposed model predicts a class label (CNN) and explains why the predicted label is appropriate for the image (RNN).
◮ It is the first method to produce deep visual explanations using language justifications.
◮ It provides an explanation, not a description.
Visual Explanation
Description: This is a large bird with a white neck and a black back in the water.
Class Definition: The Western Grebe is a waterbird with a yellow pointy beak, white neck and belly, and black back.
Explanation: This is a Western Grebe because this bird has a long white neck, pointy yellow beak and red eye.
◮ An explanation should be class discriminative!
Visual Explanation
◮ Visual explanations are both image relevant and class relevant.
◮ They must discriminate the class and accurately describe a specific image instance. → Novel loss function.
Proposed Model
◮ Input: image (+ descriptive sentences)
◮ Output: "This is a CLASS, because argument 1 and argument 2 and ..."
◮ Uses a pretrained CNN (compact bilinear fine-grained classification model) and a sentence classifier (single-layer LSTM).
◮ Two contributions:
  1. Use the predicted label as an input to the explanation generator (a minimal sketch of this conditioning is given below).
  2. Propose a novel reinforcement-learning-based loss (discriminative loss) for image relevance and class relevance.
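As a concrete illustration of contribution 1, the sketch below shows one way the explanation generator could take both the image features and the predicted label as input at every time step. It is a minimal PyTorch sketch; the class name, the concatenation scheme, and the dimensions (borrowed from the experiment slide) are assumptions, not the authors' implementation.

```python
# Hypothetical explanation generator: conditions next-word prediction on the
# CNN image features and on the predicted class label (contribution 1).
import torch
import torch.nn as nn

class ExplanationLSTM(nn.Module):
    def __init__(self, vocab_size, num_classes, img_dim=8192, embed_dim=1000, hidden_dim=1000):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.class_embed = nn.Embedding(num_classes, embed_dim)  # embeds the predicted label
        self.img_proj = nn.Linear(img_dim, embed_dim)            # projects CNN features
        self.lstm = nn.LSTM(3 * embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, pred_class, prev_words):
        # img_feat: (B, img_dim), pred_class: (B,), prev_words: (B, T) token ids
        T = prev_words.size(1)
        img = self.img_proj(img_feat).unsqueeze(1).expand(-1, T, -1)
        cls = self.class_embed(pred_class).unsqueeze(1).expand(-1, T, -1)
        wrd = self.word_embed(prev_words)
        h, _ = self.lstm(torch.cat([wrd, img, cls], dim=-1))
        return self.out(h)  # (B, T, vocab) logits over the next word
```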
Architecture
Figure: Architecture
Bilinear Models
◮ $f : \mathcal{L} \times \mathcal{I} \to \mathbb{R}^{c \times D}$, where $\mathcal{L}$ is the set of locations and $\mathcal{I}$ the set of images.
◮ $f_A, f_B$: feature functions from a pretrained VGG.
◮ A pooling operation $P$ is applied over $\{ f_A(l, I)^{\top} f_B(l, I) : l \in \mathcal{L} \}$.
◮ e.g. sum pooling: $\phi(I) = \sum_{l \in \mathcal{L}} f_A(l, I)^{\top} f_B(l, I)$
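For intuition, here is a minimal sketch of plain (non-compact) bilinear sum pooling, i.e. the sum over locations of the per-location outer products above. The actual classifier uses a compact bilinear approximation, which is not shown; shapes and names below are assumptions.

```python
# Plain bilinear pooling of two conv feature maps (e.g. from two VGG streams).
import torch

def bilinear_pool(fA: torch.Tensor, fB: torch.Tensor) -> torch.Tensor:
    # fA: (cA, H, W), fB: (cB, H, W); each spatial position is one location l
    cA, h, w = fA.shape
    cB = fB.shape[0]
    a = fA.reshape(cA, h * w)          # (cA, L), one column per location
    b = fB.reshape(cB, h * w)          # (cB, L)
    phi = a @ b.T                      # sum over l of the outer products fA(l)^T fB(l)
    return phi.flatten()               # pooled image descriptor

# Example: two 512-channel conv5 maps on a 14x14 grid
phi = bilinear_pool(torch.randn(512, 14, 14), torch.randn(512, 14, 14))
print(phi.shape)  # torch.Size([262144])
```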
Proposed Loss
◮ Proposed loss: $L_R - \lambda\, \mathbb{E}_{\tilde{w} \sim p_L(w)}[R_D(\tilde{w})]$
◮ The relevance loss ($L_R$) captures image relevance.
◮ The discriminative loss ($\mathbb{E}_{\tilde{w} \sim p_L(w)}[R_D(\tilde{w})]$) captures class relevance.
Relevance Loss
◮ Relevance loss ($L_R$):
  $L_R = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} \log p_L(w_{t+1} \mid w_{0:t}, I, C)$
  – $w_t$: ground-truth word at step $t$, $I$: image, $C$: category, $N$: batch size
  – Average hidden state of the LSTM
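A minimal sketch of this term, assuming a generator like the hypothetical ExplanationLSTM above and standard teacher forcing; in practice the negative of $L_R$ is minimized as a next-word cross-entropy.

```python
# Relevance loss: average log-likelihood of the ground-truth next word,
# conditioned on the image and category. Returns -L_R, the quantity to minimize.
import torch
import torch.nn.functional as F

def relevance_loss(model, img_feat, category, words):
    # words: (N, T+1) ground-truth token ids; words[:, 0] is the start token
    logits = model(img_feat, category, words[:, :-1])        # predict w_{t+1} from w_{0:t}
    logp = F.log_softmax(logits, dim=-1)                     # (N, T, vocab)
    target = words[:, 1:]                                    # (N, T)
    ll = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)   # log p(w_{t+1} | w_{0:t}, I, C)
    L_R = ll.sum(dim=1).mean()                               # sum over t, average over batch
    return -L_R
```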
Discriminative Loss
◮ Discriminative loss: $\mathbb{E}_{\tilde{w} \sim p_L(w)}[R_D(\tilde{w})]$
  – Based on a reinforcement learning paradigm.
  – $R_D(\tilde{w}) = p_D(C \mid \tilde{w})$
  – $p_D(C \mid w)$: a pretrained sentence classifier
  – The accuracy of this pretrained classifier is not important (22%).
  – $\tilde{w}$: sentences sampled from the LSTM ($p_L(w)$)
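The sketch below illustrates the two ingredients of this term under the same assumed interfaces as above: sampling a sentence $\tilde{w}$ from the LSTM and scoring it with the frozen sentence classifier to obtain $R_D(\tilde{w}) = p_D(C \mid \tilde{w})$. The fixed-length sampling loop (no end-token handling) and the classifier interface are simplifications, not the authors' code.

```python
# Sample a sentence from the generator and reward it with the probability the
# frozen sentence classifier assigns to the target class.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_sentence(model, img_feat, category, start_id, max_len=20):
    words = torch.full((img_feat.size(0), 1), start_id, dtype=torch.long)
    for _ in range(max_len):
        logits = model(img_feat, category, words)[:, -1]          # next-word logits
        next_w = torch.multinomial(F.softmax(logits, dim=-1), 1)  # stochastic sample
        words = torch.cat([words, next_w], dim=1)
    return words                                                  # (N, max_len + 1)

def discriminative_reward(sentence_classifier, sampled_words, category):
    # R_D(w~) = p_D(C | w~) under the pretrained, frozen sentence classifier
    probs = F.softmax(sentence_classifier(sampled_words), dim=-1)
    return probs.gather(1, category.unsqueeze(1)).squeeze(1)      # (N,)
```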
Novel Loss
◮ Relevance loss:
  $L_R = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} \log p_L(w_{t+1} \mid w_{0:t}, I, C)$
◮ Discriminative loss:
  $R_D(\tilde{w}) = p_D(C \mid \tilde{w})$
  – The accuracy of this pretrained classifier is not important (22%).
◮ Proposed loss:
  $L_R - \lambda\, \mathbb{E}_{\tilde{w} \sim p_L(w)}[R_D(\tilde{w})]$
Minimizing Loss
◮ Since the expectation over descriptions is intractable, use Monte Carlo sampling from the LSTM.
◮ $\nabla_{W_L} \mathbb{E}_{\tilde{w} \sim p_L(w)}[R_D(\tilde{w})] = \mathbb{E}_{\tilde{w} \sim p_L(w)}[R_D(\tilde{w}) \nabla_{W_L} \log p_L(\tilde{w})]$
◮ The final gradient used to update the weights $W_L$:
  $\nabla_{W_L} L_R - \lambda\, R_D(\tilde{w}) \nabla_{W_L} \log p_L(\tilde{w})$
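Putting the pieces together, the following sketch approximates the expectation with a single Monte Carlo sample and implements the REINFORCE identity as a surrogate term whose gradient matches $\nabla_{W_L} L_R - \lambda\, R_D(\tilde{w}) \nabla_{W_L} \log p_L(\tilde{w})$. It reuses the hypothetical helpers from the earlier sketches; the optimizer, $\lambda$, and batch handling are assumptions.

```python
# One training step combining the relevance and discriminative terms.
import torch
import torch.nn.functional as F

def train_step(model, sentence_classifier, optimizer, img_feat, category, words, lam=1.0):
    # (1) relevance term: next-word negative log-likelihood (see relevance_loss)
    nll = relevance_loss(model, img_feat, category, words)

    # (2) discriminative term: sample w~ and compute the reward R_D(w~) = p_D(C | w~)
    sampled = sample_sentence(model, img_feat, category, start_id=words[0, 0].item())
    reward = discriminative_reward(sentence_classifier, sampled, category)

    # re-score the sampled words with gradients enabled to obtain log p_L(w~)
    logits = model(img_feat, category, sampled[:, :-1])
    logp = F.log_softmax(logits, dim=-1)
    logp_sampled = logp.gather(-1, sampled[:, 1:].unsqueeze(-1)).squeeze(-1).sum(dim=1)

    # surrogate loss: its gradient is  grad L_R  -  lambda * R_D(w~) * grad log p_L(w~)
    loss = nll - lam * (reward.detach() * logp_sampled).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```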
Experiment
◮ Dataset: Caltech-UCSD Birds 200-2011 (CUB)
  – Contains 200 classes of North American bird species.
  – 11,788 images
  – 5 sentences per image giving a detailed description of the bird (these were not collected for the task of visual explanation).
◮ 8,192-dimensional features from the classifier
  – Features from the penultimate layer of the compact bilinear fine-grained classification model
  – Pretrained on the CUB dataset
  – Accuracy: 84%
◮ LSTM
  – 1000-dimensional embedding, 1000-dimensional LSTM
Experiment
◮ Baseline models: description model & definition model
  – Description model: trained by conditioning only on the image features as input.
  – Definition model: trained to generate explaining sentences using only the image label as input.
◮ Ablation models: explanation-label model & explanation-discriminative model
Measures
◮ METEOR (image relevance)
  – METEOR is computed by matching words (including synonyms) in generated and reference sentences.
◮ CIDEr (image relevance)
  – CIDEr measures the similarity of a generated sentence to reference sentences by counting common n-grams, weighted by TF-IDF.
◮ Similarity (class relevance)
  – Compute CIDEr scores using all reference sentences that correspond to a particular class, instead of the ground-truth references.
◮ Rank (class relevance)
  – The rank of the true class when all classes are ordered by this similarity (see the sketch below).
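As an illustration of the class-relevance measures, the sketch below ranks the true class by the average similarity of a generated explanation to each class's reference sentences. The `similarity` callable stands in for CIDEr, and averaging pairwise scores is a simplification of CIDEr's multi-reference scoring; all names are illustrative, not the authors' evaluation code.

```python
# Class-relevance "Rank": 1 means the generated sentence is most similar to the
# reference sentences of its own class.
import numpy as np

def class_rank(generated, class_references, true_class, similarity):
    # class_references: {class_id: [reference sentences for that class]}
    scores = {c: np.mean([similarity(generated, r) for r in refs])
              for c, refs in class_references.items()}
    order = sorted(scores, key=scores.get, reverse=True)   # best-scoring class first
    return order.index(true_class) + 1
```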
Experiment: Results
Figure: Results
Experiment: Results
◮ Comparison of explanations, baselines, and ablations.
  – Green: correct, Yellow: mostly correct, Red: incorrect
  – 'Red eye' is a class-relevant attribute.
Experiment: Results
◮ Comparison of explanations and definitions
  – The definition model can produce sentences which are not image relevant.
Experiment: Results
◮ Role of the discriminative loss
  – Both models generate visually correct sentences.
  – 'Black head' is one of the most prominent distinguishing properties of this vireo type.