Learning to Compose Neural Networks for Question Answering



  1. Learning to Compose Neural Networks for Question Answering (Andreas, Rohrbach, Darrell, and Klein). Garrett Bingham

  2. Outline: Approach ● Module Inventory ● Network Layout ● Experiments ● Conclusions ● Critiques

  3. Approach: Dynamically assembled neural networks answer questions about images and structured knowledge bases.

  4. Module Inventory

  5. Module Base Types: ● Attention: a distribution over pixels or entities ● Labels: a distribution over answers

  6. Lookup ( → Attention) (1 of 6): Lookup produces an attention focused entirely at index f(i), that is, a one-hot distribution over positions.
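A minimal sketch of the lookup idea, assuming attentions are plain Python lists and that the word-to-position map `f` is a small hand-written dictionary (both are illustrative assumptions, not taken from the slides):

```python
def lookup(index, size):
    """Return a one-hot attention over `size` positions, focused at `index`."""
    if not 0 <= index < size:
        raise IndexError(f"index {index} is outside an input of size {size}")
    return [1.0 if k == index else 0.0 for k in range(size)]


# Hypothetical word-to-position map f for a tiny knowledge base.
f = {"Georgia": 0, "Texas": 1, "Austin": 2}
attention = lookup(f["Texas"], len(f))
print(attention)  # [0.0, 1.0, 0.0]
```

Nothing is learned here: the module only marks the position that the map already knows.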

  7. Find ( → Attention) (2 of 6): Find computes a distribution over indices with an MLP.
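One way to picture find is a small MLP that scores every position and a softmax that turns the scores into an attention. The sketch below is a simplified stand-in rather than the exact parameterization used by the authors; the weight names `B`, `C`, `d`, `a` and all toy values are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Normalize raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def find(word_vec, features, B, C, d, a):
    """Score each position with a one-hidden-layer MLP, then apply softmax."""
    word_part = matvec(B, word_vec)  # depends only on the module's word
    scores = []
    for feat in features:  # one feature vector per pixel or entity
        feat_part = matvec(C, feat)
        hidden = [1.0 / (1.0 + math.exp(-(w + p + b)))
                  for w, p, b in zip(word_part, feat_part, d)]
        scores.append(sum(ai * hi for ai, hi in zip(a, hidden)))
    return softmax(scores)


# Toy, hand-picked weights (hypothetical): hidden unit 0 fires on feature 0.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
attention = find(
    word_vec=[1.0],
    features=features,
    B=[[0.0], [0.0]],
    C=[[4.0, 0.0], [0.0, 4.0]],
    d=[0.0, 0.0],
    a=[1.0, 0.0],
)
```

With these weights the first position receives the most attention, because it is the only one whose features activate the hidden unit that the output weights read.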

  8. Relate (Attention → Attention) (3 of 6): Relate directs focus from one region of the input to another. It is similar to the find module, but it conditions its behavior on the current attention.
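The conditioning on the current attention can be sketched by summarizing the attended region as a weighted average of position features and feeding that summary into the scoring MLP. This is an illustrative simplification: the word embedding is omitted, and the weights `C`, `D`, `a` are hypothetical toy values.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attend(attention, features):
    """Attention-weighted average of the per-position feature vectors."""
    dim = len(features[0])
    return [sum(h * feat[j] for h, feat in zip(attention, features))
            for j in range(dim)]


def relate(attention, features, C, D, a):
    """Like find, but every score also sees a summary of the current focus."""
    context = matvec(D, attend(attention, features))
    scores = []
    for feat in features:
        local = matvec(C, feat)
        hidden = [sigmoid(l + c) for l, c in zip(local, context)]
        scores.append(sum(ai * hi for ai, hi in zip(a, hidden)))
    return softmax(scores)


# Toy setup (hypothetical weights): two positions with one-hot features.
features = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
from_first = relate([1.0, 0.0], features, identity, identity, [1.0, 1.0])
from_second = relate([0.0, 1.0], features, identity, identity, [1.0, 1.0])
```

In this toy setup the output shifts toward the position that was not attended on input, which shows that the same weights give different outputs for different incoming attentions.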

  9. And (Attention → Attention) (4 of 6): And is analogous to set intersection for attentions. The and module computes an elementwise multiplication of its inputs.
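The intersection behaviour is easy to see in a short sketch. It assumes attentions are equal-length lists and leaves the product unnormalized; the name `and_` is used only because `and` is a Python keyword.

```python
def and_(*attentions):
    """Elementwise product of any number of equal-length attentions."""
    if not attentions:
        raise ValueError("and_ needs at least one attention")
    result = [1.0] * len(attentions[0])
    for attention in attentions:
        if len(attention) != len(result):
            raise ValueError("all attentions must have the same length")
        result = [r * x for r, x in zip(result, attention)]
    return result


left = [0.5, 0.5, 0.0]
right = [0.0, 0.5, 0.5]
both = and_(left, right)
print(both)  # [0.0, 0.25, 0.0]
```

Only the position that both inputs attend to keeps any mass, which is the sense in which the product acts like an intersection.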

  10. Describe (Attention → Labels) (5 of 6): Describe uses the input attention to predict an output label.
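A rough sketch of describe, assuming it first averages the position features under the attention and then classifies that average with a small network. The weights `A`, `B`, `v` and the two-label setup are hypothetical toy values, not parameters from the paper.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def describe(attention, features, A, B, v):
    """Classify the attention-weighted average of the position features."""
    dim = len(features[0])
    summary = [sum(h * feat[j] for h, feat in zip(attention, features))
               for j in range(dim)]
    hidden = [1.0 / (1.0 + math.exp(-(x + vi)))
              for x, vi in zip(matvec(B, summary), v)]
    return softmax(matvec(A, hidden))


# Toy setup (hypothetical): position 0 looks like label 0, position 1 like label 1.
features = [[1.0, 0.0], [0.0, 1.0]]
A = [[5.0, 0.0], [0.0, 5.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
v = [0.0, 0.0]
labels_first = describe([1.0, 0.0], features, A, B, v)
labels_second = describe([0.0, 1.0], features, A, B, v)
```

Moving the attention from one position to the other flips which label is most probable, since the intermediate feature vector changes with the focus.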

  11. Exists (Attention → Labels) (6 of 6): Exists produces a label directly from the input attention. Unlike describe, it does not use an intermediate feature vector.
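One simple way to produce a label directly from an attention is to look only at its largest value. The sketch below makes that assumption explicit; the label order and the weights `a` and `b` are hypothetical.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def exists(attention, a, b):
    """Map the peak attention value to label scores: softmax(peak * a + b)."""
    peak = max(attention)
    return softmax([peak * ai + bi for ai, bi in zip(a, b)])


# Hypothetical two-label setup, ordered as ["no", "yes"].
a = [-4.0, 4.0]
b = [2.0, -2.0]
focused = exists([0.0, 1.0, 0.0], a, b)  # something is strongly attended
empty = exists([0.0, 0.0, 0.0], a, b)    # nothing is attended
```

No feature vectors are read at any point, so the answer depends only on how concentrated the incoming attention is.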

  12. Network Layout

  13. Question → Dependency Parse (1 of 4)

  14. Parse → Layout Fragments (2 of 4): ● Nouns, verbs → find ● Proper nouns → lookup ● Prepositional phrases → relate[preposition] applied to find[noun] or lookup[proper noun]
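The mapping rules above can be sketched as a small function. The input format (pre-tagged words plus a separate list of prepositional phrases), the tag names, and the tuple encoding of fragments are all assumptions made for illustration; a real system would read them from a dependency parser.

```python
def word_fragment(word, tag):
    """Map one tagged word to a layout fragment, or None if no rule applies."""
    if tag == "PROPN":
        return ("lookup", word)
    if tag in ("NOUN", "VERB"):
        return ("find", word)
    return None


def layout_fragments(words, prep_phrases):
    """words: [(word, tag)]; prep_phrases: [(preposition, (word, tag))]."""
    fragments = []
    for word, tag in words:
        fragment = word_fragment(word, tag)
        if fragment is not None:
            fragments.append(fragment)
    for preposition, (word, tag) in prep_phrases:
        inner = word_fragment(word, tag)
        if inner is not None:
            fragments.append(("relate", preposition, inner))
    return fragments


# Hand-written parse of "What cities are in Texas?" (hypothetical input format).
words = [("cities", "NOUN")]
prep_phrases = [("in", ("Texas", "PROPN"))]
fragments = layout_fragments(words, prep_phrases)
print(fragments)
# [('find', 'cities'), ('relate', 'in', ('lookup', 'Texas'))]
```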

  15. Fragments → Layout Candidates (3 of 4): For each subset of fragments: → join all fragments with and → insert exists or describe at the top.
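The enumeration described on this slide can be sketched with `itertools.combinations`. Two choices here are assumptions of the sketch: a single fragment is used as is instead of being wrapped in and, and both exists and describe are emitted for every subset.

```python
from itertools import combinations


def layout_candidates(fragments):
    """Enumerate candidate layouts from every non-empty subset of fragments."""
    candidates = []
    for size in range(1, len(fragments) + 1):
        for subset in combinations(fragments, size):
            # Assumption: a lone fragment needs no `and` wrapper.
            body = subset[0] if len(subset) == 1 else ("and",) + subset
            for head in ("exists", "describe"):
                candidates.append((head, body))
    return candidates


fragments = [("find", "cities"), ("relate", "in", ("lookup", "Texas"))]
candidates = layout_candidates(fragments)
print(len(candidates))  # 6
```

The number of candidates grows as 2 × (2^n − 1) for n fragments, so this exhaustive enumeration is only practical for short questions.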

  16. Candidates → Final Network (4 of 4): The network is selected with a policy gradient method. Once a network is chosen, it is trained with standard backpropagation. Each module type has global weights that are shared across all instances of that module, but not with other module types.
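The selection step can be illustrated with a REINFORCE-style update on a softmax policy over candidate layouts. This is a sketch under strong simplifications: the scorer is linear, the candidate features are hand-written, and the reward (1 when the first candidate is picked, 0 otherwise) is a hypothetical stand-in for whatever signal the answering network provides.

```python
import math
import random


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layout_probs(theta, feats):
    """Softmax policy over candidates, scored by a linear function."""
    return softmax([sum(t * x for t, x in zip(theta, f)) for f in feats])


def reinforce_step(theta, feats, chosen, reward, lr=0.1):
    """One update: theta += lr * reward * grad log p(chosen)."""
    probs = layout_probs(theta, feats)
    expected = [sum(p * f[j] for p, f in zip(probs, feats))
                for j in range(len(theta))]
    grad = [feats[chosen][j] - expected[j] for j in range(len(theta))]
    return [t + lr * reward * g for t, g in zip(theta, grad)]


rng = random.Random(0)
feats = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical features of two candidates
theta = [0.0, 0.0]
for _ in range(200):
    probs = layout_probs(theta, feats)
    chosen = rng.choices(range(len(feats)), weights=probs)[0]
    reward = 1.0 if chosen == 0 else 0.0  # hypothetical reward signal
    theta = reinforce_step(theta, feats, chosen, reward)
final_probs = layout_probs(theta, feats)
```

Because only the rewarded candidate ever triggers an update, the policy gradually shifts probability toward it. The weights of the chosen network itself would be trained separately with ordinary backpropagation, as the slide notes.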

  17. Experiments

  18. Visual Question Answering (1 of 2): 200,000+ images with human-annotated questions and answers. Only the describe, and, and find modules are used.

  19. VQA - Results (1 of 2): State-of-the-art results, outperforming: ● a visual bag-of-words model ● a dynamic parameter prediction model (fixed architecture) ● a conventional attention model ● previous neural module networks without structure prediction

  20. GeoQA (2 of 2): 263 examples. Entities (states, cities, parks) and relations (north-of, capital-of). GeoQA+Q distinguishes between "What cities are in Texas?" → Austin and "Are there any cities in Texas?" → Yes, while GeoQA does not.

  21. GeoQA - Results (2 of 2): The dynamic model (D-NMN) outperforms: ● the logical baseline (LSP-F) ● the perceptual baseline (LSP-W) ● the fixed-structure neural module network (NMN). This demonstrates that D-NMN can outperform logical baselines and perform well on diverse datasets.

  22. Conclusions and Critiques

  23. Conclusions: Given (question, world, answer) triples, the model learns to assemble neural networks on the fly. The model answers queries about both structured and unstructured information. It achieves state-of-the-art results on the VQA and GeoQA+Q datasets.

  24. Critique 1 - Discarding modules: Only describe, and, and find were used for VQA: (describe (and (find [all nouns in sentence]))). Why introduce modules and then not use them? We have to conclude that lookup, relate, and exists hurt performance. Are static networks better than dynamic ones? Is the RL network-constructor agent not effective?

  25. Critique 2 - No RL Agent Baselines: How do we know the RL agent is effective at constructing networks? What about a random network? Why not construct a network by and-ing all of the layout candidates together?

  26. Critique 3 - Nitpicks: What is the motivation for these modules specifically? What is a measure module? Why is there an and module, but no or?

  27. Questions?
