
Knowledge Guided Attention and Inference for Describing Images - PowerPoint PPT Presentation


  1. Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects. Aditya Mogadala, Umanga Bista, Lexing Xie, Achim Rettinger (rettinger@kit.edu, http://www.aifb.kit.edu/web/Achim_Rettinger/en, http://www.aifb.kit.edu/web/Inproceedings3603). Adaptive Data Analytics Group, Institute of Applied Informatics and Formal Description Methods (AIFB), KIT – The Research University in the Helmholtz Association (www.kit.edu)

  2. Images, Multi-Lingual Text, Knowledge Graphs. (Slide footer: PD Dr. Achim Rettinger, Adaptive Data Analytics Group, Institute AIFB, Knowledge Guided Attention and Inference for Describing Images Containing Unseen Objects)

  3. Can we aggregate complementing information across modalities? Yes. Cross-modal embeddings do better on several benchmarks. Reference: Steffen Thoma, Achim Rettinger, Fabian Both. Towards Holistic Concept Representations: Embedding Relational Knowledge, Visual Attributes, and Distributional Word Semantics. The Semantic Web – ISWC 2017, Springer, October 2017.

  4. Can we extrapolate cross-modal information to entities unseen in some of the other modalities? Yes. Specifically, hyponyms profit more than hypernyms. [Figure labels: 3M, 1.5K] Reference: Fabian Both, Steffen Thoma, Achim Rettinger. Cross-modal Knowledge Transfer: Improving the Word Embedding of Apple by Looking at Oranges. K-CAP 2017, The 9th International Conference on Knowledge Capture, ACM, December 2017.

  5. Can we extrapolate knowledge about translating entities across modalities without having seen them during training? (Left open on the slide: "?") Reference: Aditya Mogadala, Umanga Bista, Lexing Xie and Achim Rettinger. Knowledge Guided Attention and Inference for Describing Images Which Contain Unseen Objects, ESWC 2018.

  6. IMAGE CAPTION GENERATION

  7. Visual Object Detection. Images on the Web depict a huge variety of visual objects, e.g. Truffle, Mammoth, Blackbird, Papaya. 642 Visual Object Categories by ImageNet.

  8. Description Generation for Images. Training data for image captioning (i.e. image-caption pairs) covers only a fraction of the objects that can be detected by image classifiers. 80 MSCOCO Visual Object Categories.

  9. Challenge - Missing Captions for Images. Parallel caption training examples are missing for images containing the visual object category "pizza".
     Caption Generation with Standard Model: A man is making a sandwich in a restaurant.
     Expected from Model: A man is holding a pizza in his hands.

  10. Related Work. Approaches that can handle unseen objects. [Figure not preserved in this transcript.]

  11. Missing in Related Work.
      Attention: Our attention mechanism learns to focus on the salient aspects in the image for caption generation.
      Inference: Related approaches transfer either before or during inference. We do both.

  12. KNOWLEDGE GUIDED ATTENTION AND INFERENCE

  13. Our Contributions.
      ESA: Introduce an External Semantic Attention (ESA) mechanism into the caption generation model, driven by external semantic knowledge provided by a knowledge graph (KG).
      CI: Constraints before and during Inference (CI) for transferring information between seen words and unseen visual object categories, again exploiting external semantic knowledge provided by a knowledge graph (KG).

  14. Knowledge-Guided Assistance Caption Generation (KGA-CGM). [Architecture diagram. Recoverable components: Visual Features; Multi Word-Label Classifier; Multi Entity-Label Classifier with predicted labels {pizza, restaurant, hat, chef, camera}; Entity Vectors; Partial Scene Graph; Grounding (Image->KB); LSTM Language Model over words w_BOS ... w_{L-1}; TSV Layer and Softmax at each step, producing p_0 ~ y_0, ..., p_t ~ y_t, p_{t+1} ~ y_{t+1}, ..., p_EOS ~ y_EOS.]

  15. External Semantic Attention. [Architecture diagram, first time step only: Visual Features, Multi Word-Label Classifier, Multi Entity-Label Classifier with predicted labels {pizza, restaurant, hat, chef, camera}, Entity Vectors, Partial Scene Graph, Grounding (Image->KB), LSTM, TSV Layer, Softmax, p_0 ~ y_0.]

  16. TSV Layer. [Same architecture diagram as the previous slide, first time step only, with the TSV Layer between the LSTM and the Softmax in focus.]

  17. [UnseenObj17] Inference – Generating unseen objects
      Input: M = {W_he, W_h2t, W_ct, W_It}
      Output: M_new
      Initialize List(closest) = cosine_distance(List(unseen), vocabulary);
      Initialize W_ct[v_unseen, :], W_h2t[v_unseen, :], W_It[v_unseen, :] = 0;
      Function BeforeInference
          forall items T in closest and Z in unseen do
              if T and Z in vocabulary then
                  W_ct[v_Z, :] = W_ct[v_T, :];
                  W_h2t[v_Z, :] = W_h2t[v_T, :];
                  W_It[v_Z, :] = W_It[v_T, :];
              end
              if i_T and i_Z in visual features then
                  W_It[i_Z, i_T] = 0;
                  W_It[i_T, i_Z] = 0;
              end
          end
          M_new = M;
          return M_new;
      end
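The transfer step on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `closest_seen_words` and `transfer_before_inference`, the toy embeddings, and the choice to give the vocabulary and the visual labels one shared index space (so that `W_It` is square and `i_Z` equals `v_Z`) are all hypothetical. Only the matrix names `W_ct`, `W_h2t` and `W_It` come from the slide.

```python
import numpy as np


def closest_seen_words(unseen, seen, embeddings):
    """Map each unseen word to the seen word with the highest cosine similarity."""
    closest = {}
    for z in unseen:
        ez = embeddings[z]
        best_word, best_sim = None, -np.inf
        for t in seen:
            et = embeddings[t]
            sim = float(ez @ et / (np.linalg.norm(ez) * np.linalg.norm(et)))
            if sim > best_sim:
                best_word, best_sim = t, sim
        closest[z] = best_word
    return closest


def transfer_before_inference(weights, index, closest):
    """Copy the weight rows of each closest seen word T to its unseen word Z.

    Assumption: `index` maps a word to its row in every matrix and also to
    its column in W_It (shared index space for vocabulary and visual labels).
    """
    new = {name: w.copy() for name, w in weights.items()}
    for z, t in closest.items():
        if z not in index or t not in index:
            continue  # both words must be known to the model
        vz, vt = index[z], index[t]
        for name in ("W_ct", "W_h2t", "W_It"):
            new[name][vz, :] = new[name][vt, :]
        # Remove the cross dependence between the two visual labels.
        new["W_It"][vz, vt] = 0.0
        new["W_It"][vt, vz] = 0.0
    return new


# Toy setup: "pizza" is unseen, "sandwich" and "bus" are seen.
index = {"sandwich": 0, "bus": 1, "pizza": 2}
embeddings = {
    "sandwich": np.array([1.0, 0.0]),
    "bus": np.array([0.0, 1.0]),
    "pizza": np.array([1.0, 0.1]),
}
rng = np.random.default_rng(0)
weights = {
    "W_ct": rng.normal(size=(3, 4)),
    "W_h2t": rng.normal(size=(3, 4)),
    "W_It": rng.normal(size=(3, 3)),
}
closest = closest_seen_words(["pizza"], ["sandwich", "bus"], embeddings)
new_weights = transfer_before_inference(weights, index, closest)
```

The sketch copies the input matrices instead of modifying them in place, so the original model stays available for comparison with the transferred one.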

  18. EVALUATION

  19. Evaluation Setup
      • 8 held-out objects from MSCOCO: Microwave, Racket, Bottle, Zebra, Pizza, Couch, Bus, Suitcase
      • Image-Caption Pairs: 70K Training, 20K Validation, 20K Testing
      • CNN Architectures: VGG16 [Simonyan et al. 2014]
      • Unpaired Textual Corpus: British National Corpus, Wikipedia, SBU1M
      • Entity Vectors: RDF2Vec [Ristoski et al. 2014]
      • Evaluation Metrics: METEOR, SPICE, F1
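The slide does not spell out how F1 is computed. A common reading in work on held-out objects is that a caption counts as a positive prediction when it mentions the held-out object's name, scored against whether the image actually contains that object. The sketch below follows that reading as an assumption; the function name `unseen_object_f1` and the toy captions are hypothetical, and the exact-token match deliberately ignores plurals and synonyms.

```python
import re


def unseen_object_f1(captions, has_object, word):
    """F1 for mentioning `word` in captions of images that contain the object.

    captions:   generated captions, one per image
    has_object: booleans, True if the image contains the held-out object
    word:       name of the held-out object (matched as an exact token)
    """
    tp = fp = fn = 0
    for caption, present in zip(captions, has_object):
        tokens = re.findall(r"[a-z]+", caption.lower())
        mentioned = word in tokens
        if mentioned and present:
            tp += 1
        elif mentioned and not present:
            fp += 1
        elif present and not mentioned:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


captions = [
    "A man is holding a pizza in his hands.",
    "A man is making a sandwich in a restaurant.",
    "A pizza on a table.",
    "A bus on the street.",
]
has_object = [True, True, False, False]
score = unseen_object_f1(captions, has_object, "pizza")
```

In this toy example the model mentions "pizza" once correctly, misses it once, and mentions it once for an image without pizza, so precision and recall are both one half.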

  20. Qualitative Results
      Unseen Object: Zebra
      Predicted Entity-Labels (Top-3): Zebra, Enclosure, Zoo
      Base: A couple of animals that are standing in a field
      NOC: Zebras standing together in a field with zebras
      KGA-CGM: A group of zebras standing in a line
      Unseen Object: Pizza
      Predicted Entity-Labels (Top-3): Pizza, Restaurant, Hat
      Base: A man is making a sandwich in a restaurant
      NOC: A man standing next to a table with a pizza in front of it.
      KGA-CGM: A man is holding a pizza in his hands

  21. Quantitative Results: F1-Score. [Results table not preserved in this transcript.] Legend: KGA-CGM is our proposed model; underline marks the second-best result.

  22. Quantitative Results: METEOR. [Results table not preserved in this transcript.] Legend: KGA-CGM is our proposed model; underline marks the second-best result.

  23. Scaling it by an order of magnitude
      Unseen Object: Truffle
      Guidance Before Inference: food → truffle
      Base: A person holding a piece of paper.
      KGA-CGM: A close up of a person holding truffle
      Unseen Object: Papaya
      Guidance Before Inference: banana → papaya
      Base: A woman standing in a garden.
      KGA-CGM: These are ripe papaya hanging on a tree
