
Self-Critical Reasoning for Robust Visual Question Answering, Jialin Wu and Raymond J. Mooney (PowerPoint presentation)

  1. Self-Critical Reasoning for Robust Visual Question Answering, by Jialin Wu and Raymond J. Mooney

  2. Visual Question Answering (VQA) • A common VQA system takes the question “What utensil is pictured?” and the visual feature set 𝒱 extracted from the original image, and predicts answer scores such as Knife (0.72) and Fork (0.66).

  3. VQA systems capture superficial statistical correlations between question-answer (QA) pairs. The VQA system: “I won’t bother to look at the image; I can answer your question by just looking at the question.” Asked “What utensil is pictured?” about the original image, it answers “Knife”. [Figure: bar chart of the training answer distribution for “knife” vs. “fork”.]

  4. Force VQA to focus on what humans focus on • Extract a proposal set of objects that humans focus on, from either a human textual explanation (e.g., “There is a fork near the cake.”) or a human visual explanation.

  5. Force VQA to focus on what humans focus on • Require the largest gradient for the correct answer, ∇_v p(fork | Q, 𝒱), to fall on at least one of the extracted objects, where Q is the question and 𝒱 the visual feature set. [Figure: an influence-strengthening loss applied to ∇_v p(fork | Q, 𝒱) over the proposal object set.]

  6. Results • Comparison with the baseline model on the VQA-CP dataset • VQA-CP deliberately constructs its train and test sets with very different answer distributions. [Figure: bar chart of VQA scores on all questions (“All”) for Baseline and Ours (infl).]

  7. Oversensitivity to the most common objects. The VQA system: “I can focus on the fork, but I still think it is a knife.” Asked “What utensil is pictured?”, it answers “Knife”. [Figure: focused objects for the answer “fork” vs. focused objects for the answer “knife”.]

  8. Criticizing the false influential object • Find the most influential object for the correct answer using gradients. [Figure: for “What utensil is pictured?”, the visual feature set 𝒱 of the original image yields the answer prediction Knife (0.72), Fork (0.66). The prediction “fork” is explained with ∇_v p(fork | Q, 𝒱), and the most influential object is selected from the proposal object set, built from a human textual explanation (“There is a fork near the cake.”) or a human visual explanation.]

  9. Criticizing the false influential object • Force that object to contribute more to the correct answer than to the wrong prediction. [Figure: slide 8’s diagram, extended with an explanation of the prediction “knife”, ∇_v p(knife | Q, 𝒱), and a self-critical loss.]

  10. Our self-critical approach. Asked “What utensil is pictured?”, the VQA system now says, “Oh, yes, the utensil should be a fork,” and answers “Fork”.

  11. Results • Comparison with the baseline model on the VQA-CP dataset. [Figure: bar chart of VQA scores on all questions (“All”) for Baseline, Ours (infl), and Ours (infl + crit).]

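Slide 3’s failure mode can be made concrete with a question-only baseline that never looks at the image and always returns the most frequent training answer for a question. This is a minimal illustrative sketch: the function name `fit_question_only_prior` and the 8-to-2 knife/fork split are hypothetical, not values read from the slide’s chart.

```python
from collections import Counter, defaultdict


def fit_question_only_prior(train_pairs):
    """Map each question string to its most frequent training answer."""
    counts = defaultdict(Counter)
    for question, answer in train_pairs:
        counts[question][answer] += 1
    return {q: c.most_common(1)[0][0] for q, c in counts.items()}


# Hypothetical toy training data in which "knife" dominates this question.
train_pairs = ([("What utensil is pictured?", "knife")] * 8
               + [("What utensil is pictured?", "fork")] * 2)
prior = fit_question_only_prior(train_pairs)

# The image is never consulted, so every test image gets the same answer.
print(prior["What utensil is pictured?"])  # knife
```

On a split like VQA-CP, where test answer distributions differ from training, such a prior is exactly what stops working.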

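The gradient-based influence on slides 5 and 8 can be sketched on a toy model. The sketch assumes a hypothetical model (question-guided attention over object features, then a softmax over answers) and scores each object’s influence on an answer as the sum of the partial derivatives of p(answer | Q, 𝒱) with respect to that object’s features, estimated by central finite differences so no autodiff library is needed. It then picks the most influential object in the proposal set and computes one plausible cross-entropy form of an influence-strengthening loss. The names `answer_probs`, `influence`, `most_influential`, and `influence_strengthening_loss`, the model, and the exact loss are assumptions for illustration, not the authors’ implementation; training on this loss would additionally require differentiating through the gradient with an autodiff framework.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def answer_probs(objects, question, answer_weights):
    """Hypothetical toy VQA model: question-guided attention over object
    features, then a softmax over candidate answers. Returns {answer: p}."""
    attention = softmax([dot(question, v) for v in objects])
    dim = len(objects[0])
    pooled = [sum(a * v[k] for a, v in zip(attention, objects)) for k in range(dim)]
    answers = list(answer_weights)
    probs = softmax([dot(answer_weights[ans], pooled) for ans in answers])
    return dict(zip(answers, probs))


def influence(objects, question, answer_weights, answer, eps=1e-5):
    """Influence of each object on p(answer): the sum of the partial derivatives
    of p(answer) w.r.t. that object's features, via central finite differences."""
    scores = []
    for i, v in enumerate(objects):
        total = 0.0
        for k in range(len(v)):
            plus = [list(o) for o in objects]
            minus = [list(o) for o in objects]
            plus[i][k] += eps
            minus[i][k] -= eps
            p_plus = answer_probs(plus, question, answer_weights)[answer]
            p_minus = answer_probs(minus, question, answer_weights)[answer]
            total += (p_plus - p_minus) / (2 * eps)
        scores.append(total)
    return scores


def most_influential(scores, proposal):
    """Index of the proposal object with the largest influence."""
    return max(proposal, key=lambda i: scores[i])


def influence_strengthening_loss(scores, proposal):
    """Assumed cross-entropy form: -log softmax(scores)[best proposal object].
    It is small when the best proposal object dominates all objects' influence."""
    best = most_influential(scores, proposal)
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[best]


# Hypothetical 2-D features: 0 = knife, 1 = fork, 2 = cake.
objects = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [1.0, 0.2]
answer_weights = {"knife": [2.0, 0.0], "fork": [0.0, 2.0]}
proposal = [1, 2]  # objects named in "There is a fork near the cake."

scores = influence(objects, question, answer_weights, "fork")
best = most_influential(scores, proposal)
loss = influence_strengthening_loss(scores, proposal)
print(best)  # 1
print(loss)
```

In this toy setup the model still prefers “knife” (about 0.63 to 0.37), even though the fork object carries the most positive influence on “fork”, which mirrors the situation on slide 7.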

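Slide 9’s self-critical step can be sketched as a hinge penalty, assuming each candidate answer’s influence on the most influential proposal object has already been computed (for example, as a summed gradient). Every answer the model currently scores above the correct one is penalized by how much more it relies on that object than the correct answer does. This is an unweighted reading of the slide; the function name `self_critical_loss`, the “spoon” answer, and the influence values are hypothetical, only the Knife (0.72) and Fork (0.66) scores come from the slides, and the authors’ exact loss may differ.

```python
def self_critical_loss(scores, influence_on_object, correct):
    """Hinge penalty on the most influential proposal object.

    scores: model score for each candidate answer.
    influence_on_object: influence of each answer on that object.
    correct: the ground-truth answer.
    """
    correct_score = scores[correct]
    correct_influence = influence_on_object[correct]
    loss = 0.0
    for answer, score in scores.items():
        # Only criticize answers the model currently ranks above the correct one.
        if answer != correct and score > correct_score:
            # Penalize how much more the wrong answer relies on the object.
            loss += max(0.0, influence_on_object[answer] - correct_influence)
    return loss


# Scores from the slides; "spoon" and all influence values are hypothetical.
scores = {"knife": 0.72, "fork": 0.66, "spoon": 0.10}
influence_on_object = {"knife": 0.30, "fork": 0.10, "spoon": 0.05}
loss = self_critical_loss(scores, influence_on_object, correct="fork")
print(round(loss, 2))  # 0.2
```

Restricting the penalty to higher-ranked wrong answers focuses the criticism on the answers that actually displace the correct one, such as “knife” here.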