Ask Your Neurons: A Neural-based Approach to Answering Questions - PowerPoint PPT Presentation

Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
Mateusz Malinowski [1], Marcus Rohrbach [2], Mario Fritz [1]
[1] Max Planck Institute for Informatics
[2] University of California, Berkeley, and ICSI

Human-like Comprehension
[Figure: an image shown as binary strings next to the example question "Is the water boiling?"]
• How far are machines from human-quality understanding?
• How can we monitor progress and evaluate architectures?

Visual Turing Test (NIPS'14)
• Holistic, open-ended task
  ‣ Visual scene understanding
  ‣ Natural language understanding
  ‣ Deduction
• No internal representation is evaluated
  ‣ Challenge is open to diverse approaches
• Scalable annotation and evaluation effort
  ‣ Only question-answer pairs
Example question-answer pairs:
  What is behind the table? sofa
  What is on the refrigerator? magnet, paper
  How many lamps are there? 2
  What color are the cabinets? brown

Related Work
• Symbolic-based Approaches
  ‣ M. Malinowski et al. Multiworld. NIPS'14
  [Figure: scene facts such as chair(1, brown, position X, Y, Z) and window(1, blue, position X, Y, Z); the question "What …?" is parsed to λx. Behind(x, Table), giving the answer "window"]
• Large Scale Datasets
  ‣ S. Antol et al. Visual QA. ICCV'15
  ‣ L. Yu et al. Visual Madlibs. ICCV'15
  ‣ D. Geman et al. Visual Turing Test. PNAS'15
  ‣ M. Ren et al. Image QA. NIPS'15
  ‣ H. Gao et al. Are You Talking to a Machine? NIPS'15
  ‣ Y. Zhu et al. Visual7W. arXiv'15
  ‣ L. Zhu et al. Uncovering Temporal Context. arXiv'15
  ‣ ...
• Neural-based Approaches
  ‣ M. Ren et al. Image QA. NIPS'15
  ‣ H. Gao et al. Are You Talking to a Machine? NIPS'15
  ‣ L. Ma et al. Learning to Answer Questions From Images. arXiv'15
• Attention-based Approaches
  ‣ Z. Yang et al. Stacked Attention Networks. arXiv'15
  ‣ Y. Zhu et al. Visual7W. arXiv'15
  ‣ J. Andreas et al. Deep Compositional QA. arXiv'15
  ‣ H. Xu et al. Ask, Attend and Answer. arXiv'15
  ‣ K. Chen et al. ABC-CNN. arXiv'15
  ‣ K. J. Shih et al. Where To Look. arXiv'15
• Hybrid Approaches
  ‣ H. Noh et al. Dynamic Parameter Prediction. arXiv'15
  ‣ J. Andreas et al. Deep Compositional QA. arXiv'15
[Figures: architecture diagrams and example question-answer pairs from the cited works, including "What is the mustache made of?", "What is the cat doing? Sitting on the umbrella", "What are sitting in the basket on a bicycle? dogs", "What kind of animal is in the photo? A cat.", "Why is the person holding a knife? To cut the cake with.", "Where is the dog? couch", "Where are the carrots? At the top.", "How many people are there? Three."]

Outline
• Neural approach to answer questions about images
  [Figure: CNN image features and the question "What is behind the table?" are fed to a chain of LSTM units, which outputs "chair window <end>"]
• Performance metrics based on additional annotations
  Example: What is the object on the floor in front of the wall?
    Human 1: bed
    Human 2: shelf
    Human 3: bed
    Human 4: bookshelf
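The four differing human answers above suggest why a prediction might be scored against several annotators rather than one. The sketch below is only an illustration of that idea using exact string matching; the function names `mean_agreement` and `best_agreement` are hypothetical, and the slides do not state that the authors' metrics are computed this way.

```python
from typing import List


def mean_agreement(predicted: str, human_answers: List[str]) -> float:
    """Fraction of human annotators whose answer equals the prediction."""
    if not human_answers:
        raise ValueError("need at least one human answer")
    matches = sum(1 for h in human_answers if h == predicted)
    return matches / len(human_answers)


def best_agreement(predicted: str, human_answers: List[str]) -> float:
    """1.0 if any single annotator gave the predicted answer, else 0.0."""
    if not human_answers:
        raise ValueError("need at least one human answer")
    return 1.0 if predicted in human_answers else 0.0


# Human answers taken from the example on the slide.
humans = ["bed", "shelf", "bed", "bookshelf"]

bed_mean = mean_agreement("bed", humans)      # two of four annotators agree
shelf_mean = mean_agreement("shelf", humans)  # one of four annotators agrees
sofa_best = best_agreement("sofa", humans)    # no annotator said "sofa"
```

Under the first score, "bed" is rewarded more than "shelf" because more annotators gave it; under the second, both count as fully correct because at least one annotator agrees.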

Method: Ask Your Neurons
[Figure: CNN image features and the question "What is behind the table?" are fed word by word to a chain of LSTM units, which outputs the answer "chairs window <end>"]

Method: Ask Your Neurons
[Figure: the CNN image representation x, the question words ..., q_{n-1}, q_n, and the previous answer word a_{t-1} enter a chain of LSTM units, which outputs the answer words a_1, ..., a_t]
• Predicting answer sequence
  ‣ Recursive formulation
    \hat{a}_t = \arg\max_{a \in V} p(a \mid x, q, \hat{A}_{t-1}; \theta)
  where
    x is the image representation,
    q = [q_1, \ldots, q_{n-1}, \llbracket ? \rrbracket] is the question, with q_j the index of the j-th question word and \llbracket ? \rrbracket encoding the question mark,
    V is the vocabulary,
    \hat{A}_{t-1} = \{\hat{a}_1, \ldots, \hat{a}_{t-1}\} is the set of previous answer words.
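The recursive formulation can be read as a greedy decoding loop: at each step, pick the most probable word given the image, the question, and the answer words predicted so far, and stop at <end>. The sketch below illustrates only that loop. `toy_model` is a hypothetical stand-in for the CNN + LSTM network with hand-written probabilities, and `greedy_answer` and `max_len` are names introduced here, not taken from the slides.

```python
from typing import Callable, Dict, List, Sequence

END = "<end>"

# Signature of p(. | x, q, A_{t-1}): returns a probability for each vocabulary word.
NextWordProbs = Callable[[Sequence[float], List[str], List[str]], Dict[str, float]]


def greedy_answer(
    next_word_probs: NextWordProbs,
    image_features: Sequence[float],
    question: List[str],
    max_len: int = 10,
) -> List[str]:
    """Apply a_t = argmax_a p(a | x, q, A_{t-1}) until <end> or max_len words."""
    answer: List[str] = []
    for _ in range(max_len):
        probs = next_word_probs(image_features, question, answer)
        best = max(probs, key=probs.get)  # arg max over the vocabulary
        if best == END:
            break
        answer.append(best)  # becomes part of A_t for the next step
    return answer


def toy_model(x: Sequence[float], q: List[str], prev: List[str]) -> Dict[str, float]:
    """Hypothetical model: ignores x and q and follows a fixed script."""
    script = ["chairs", "window", END]
    target = script[min(len(prev), len(script) - 1)]
    vocabulary = ["chairs", "window", "table", END]
    return {w: (0.7 if w == target else 0.1) for w in vocabulary}


question = ["What", "is", "behind", "the", "table", "?"]
predicted = greedy_answer(toy_model, [0.0, 1.0], question)
```

With this toy model the loop emits "chairs", then "window", then stops at <end>, mirroring the multi-word answer in the slide's figure.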

Symbolic vs Neural-based Approaches
Symbolic approach (NIPS'14)
• Explicit representation
  ‣ Independent components
    - Detectors, Semantic Parser, Database
  ‣ Components trained separately
  ‣ Many 'hard' design decisions
[Figure: the question "What is behind the table?" is parsed into the logical representation λx. Behind(x, Table) and evaluated against a knowledge base, giving "chairs, window"]
M. Malinowski et al. "A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input". NIPS'14

Symbolic vs Neural-based Approaches
Symbolic approach (NIPS'14)
• Explicit representation
  ‣ Independent components
    - Detectors, Semantic Parser, Database
  ‣ Components trained separately
  ‣ Many 'hard' design decisions
[Figure: the question "What is behind the table?" is parsed into the logical representation λx. Behind(x, Table) and evaluated against a knowledge base, giving "chairs, window"]
Ask Your Neurons (Ours)
• Implicit representation
  ‣ End-to-end formulation
    - From images and questions to answers
  ‣ Joint training
  ‣ Fewer design decisions
[Figure: end-to-end, jointly trained architecture; CNN features and the question "What is … ?" feed a chain of LSTM units that outputs "chairs window <end>"]
M. Malinowski et al. "A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input". NIPS'14

Neural Visual QA vs Neural Image Description
Neural Image Description
• Conditions on an image
  ‣ Generates a description
    - Sequence of words
  ‣ Loss at every step
[Figure: CNN features feed a chain of LSTM units that outputs "Large building with a clock <end>", with a loss at each output]
J. Donahue et al. "Long-term Recurrent Convolutional Networks for Visual Recognition and Description". CVPR'15

Neural Visual QA vs Neural Image Description
Neural Image Description
• Conditions on an image
  ‣ Generates a description
    - Sequence of words
  ‣ Loss at every step
[Figure: CNN features feed a chain of LSTM units that outputs "Large building with a clock <end>", with a loss at each output]
Ask Your Neurons (Ours)
• Conditions on an image and a question
  ‣ Generates an answer
    - Sequence of answer words
  ‣ Loss only at answer words
[Figure: CNN features and the question "What is … ?" feed a chain of LSTM units that outputs "chairs window <end>", with a loss only at the answer outputs]
J. Donahue et al. "Long-term Recurrent Convolutional Networks for Visual Recognition and Description". CVPR'15
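The difference between "loss at every step" and "loss only at answer words" can be expressed as a mask over the time steps of a sequence. The sketch below assumes a per-step negative log-likelihood averaged over the unmasked steps; the averaging, the function name `masked_nll`, and the probability values are illustrative assumptions, not details given on the slides.

```python
import math
from typing import Sequence


def masked_nll(step_probs: Sequence[float], use_step: Sequence[bool]) -> float:
    """Mean negative log-likelihood over the steps where use_step is True.

    step_probs[t] is the probability the model assigned to the correct
    word at step t (made-up numbers in the examples below).
    """
    if len(step_probs) != len(use_step):
        raise ValueError("step_probs and use_step must have the same length")
    losses = [-math.log(p) for p, keep in zip(step_probs, use_step) if keep]
    if not losses:
        raise ValueError("mask selects no steps")
    return sum(losses) / len(losses)


# Steps: What is behind the table ? | chairs window <end>
probs = [0.2, 0.3, 0.1, 0.4, 0.2, 0.9, 0.5, 0.25, 0.8]

# Image description: every step contributes to the loss.
caption_style_mask = [True] * len(probs)

# Visual QA: the six question steps are masked out, the three answer steps count.
answer_only_mask = [False] * 6 + [True] * 3

caption_style_loss = masked_nll(probs, caption_style_mask)
answer_only_loss = masked_nll(probs, answer_only_mask)
```

With the answer-only mask, how well the model would have predicted the question words has no effect on the loss; only the three answer positions are penalized.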
