
Embodied Question Answering. NVIDIA GTC, March 26, 2018. Abhishek Das.

  1. Embodied Question Answering NVIDIA GTC March 26, 2018 Abhishek Das PhD student, Georgia Tech

  2. Embodied Question Answering. Samyak Datta (Georgia Tech), Georgia Gkioxari (FAIR), Stefan Lee (Georgia Tech), Devi Parikh (FAIR/Georgia Tech), Dhruv Batra (FAIR/Georgia Tech). embodiedqa.org/paper.pdf. To appear in CVPR 2018 (Oral).

  3. Forward

  4. Forward

  5. Turn Left

  6. Q. What is to the left of the shower? A. Cabinet. Slide credit: Devi Parikh

  7. EmbodiedQA: AI Challenges • Language understanding • Visual understanding • Active perception • Common sense reasoning • Grounding into actions • Selective memory • Credit assignment Slide credit: Devi Parikh

  8. EmbodiedQA: Context [diagram: a vision axis (single frame to video) against a language axis (single-shot QA to dialog)] Slide credit: Devi Parikh

  9. EmbodiedQA: Context [diagram: adds a third axis, action, ranging from passive to active] Slide credit: Devi Parikh

  10. EmbodiedQA: Context. VQA: single frame, single-shot QA, passive. Q. What is the mustache made of? [Antol and Agrawal et al., ICCV 2015; Malinowski et al., ICCV 2015; …] Slide credit: Devi Parikh

  11. EmbodiedQA: Context. VideoQA: video, single-shot QA, passive. Q. How many times does the cat touch the dog? A. 4 times [Jang et al., CVPR 2017]. Attributes: "dog", "egg", "bowl", "woman", "plate"; Q. What is a woman boiling in a pot of water? A. Eggs [Ye et al., SIGIR 2017]. [Tapaswi et al., CVPR 2016; …] Slide credit: Devi Parikh

  12. EmbodiedQA: Context. Visual Dialog: single frame, dialog, passive. [Das et al., CVPR 2017; Das and Kottur et al., ICCV 2017; …] Slide credit: Devi Parikh

  13. EmbodiedQA: Context. Active, goal-driven agents in prior work:
      • Goal specified via reward, e.g., [Gupta et al., CVPR17, Zhu et al., ICCV17]
      • Goal specified via visual target, e.g., [Zhu et al., ICRA17]
      • Fully observable environment, e.g., [Wang et al., ACL16]
      • Recent: more complex environments [Hermann et al., 2017, Chaplot et al., 2017]; higher-level tasks [Anderson et al., CVPR18]; interactive downstream tasks (Embodied QA)
      Slide credit: Devi Parikh

  14. EQA Dataset • Questions in environments Slide credit: Devi Parikh

  15. EQA Dataset • Questions in environments ("environments" highlighted) Slide credit: Devi Parikh

  16. EQA Dataset: Environments House3D: A Rich and Realistic 3D environment https://github.com/facebookresearch/House3D Georgia Gkioxari Yuandong Tian Yuxin Wu Yi Wu UC Berkeley Facebook AI Research Slide credit: Georgia Gkioxari

  17. SUNCG dataset [Song et al., CVPR 2017] Manually designed using an online interior design interface (Planner5D) Slide credit: Georgia Gkioxari

  18. SUNCG dataset [Song et al., CVPR 2017]: 45,622 indoor scenes, 404,058 rooms, 5,697,217 object instances, 2,644 unique objects, 80 object categories. Manually designed using an online interior design interface (Planner5D). Slide credit: Georgia Gkioxari

  19. House3D
      • Collision and free-space prediction
      • OpenGL rendering: 600 fps single-process, 1800 fps multi-process (Tesla M40 GPU, 120x90 resolution)
      • Linux/MacOS compatible
      Slide credit: Georgia Gkioxari
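
For orientation, here is a minimal rendering loop patterned on the House3D repo README. The class and method names (objrender.RenderAPI, load_config, Environment, debug_render) are assumptions based on the public repo and may not match a given release.

```python
# Minimal House3D rendering loop; API names are assumptions from the
# facebookresearch/House3D README and may differ across versions.
from House3D import objrender, Environment, load_config

api = objrender.RenderAPI(w=120, h=90, device=0)  # low resolution, as benchmarked on the slide
cfg = load_config('config.json')                  # points at SUNCG data on disk
house_id = '...'                                  # any SUNCG house id string

env = Environment(api, house_id, cfg)
env.reset()                      # place the agent at a valid, collision-free pose
for _ in range(100):
    frame = env.debug_render()   # RGB frame; depth / segmentation modes also exist
```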

  20. House3D renders RGB images, depth maps, semantic segmentation masks, and top-down 2D views. Slide credit: Georgia Gkioxari

  21. EQA Dataset: Environments • Subset of House3D: typical home environments • Realistic layout according to all three SUNCG annotators • Not too large or too small (300-800 m²; objects cover 1/3rd of the ground area) • Have at least one kitchen, living room, dining room, and bedroom • Ignore obscure rooms (e.g., loggia) and tiny objects (e.g., light switches). Slide credit: Devi Parikh
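
A sketch of how this environment filter might look in code; the metadata field names (area_m2, room_types, object_ground_coverage, annotators_agree) are hypothetical stand-ins for whatever the SUNCG/House3D loaders actually expose.

```python
# Hypothetical house-filtering sketch; field names are illustrative,
# not the real SUNCG/House3D schema.
REQUIRED_ROOMS = {'kitchen', 'living room', 'dining room', 'bedroom'}

def keep_house(house: dict) -> bool:
    return (
        300 <= house['area_m2'] <= 800                    # not too large or small
        and house['object_ground_coverage'] >= 1 / 3      # densely furnished
        and REQUIRED_ROOMS <= set(house['room_types'])    # core rooms present
        and house['annotators_agree']                     # realistic layout per all 3 SUNCG annotators
    )

# envs = [h for h in all_houses if keep_house(h)]   # all_houses: assumed loader output
```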

  22. EQA Dataset: Environments. Test for generalization to novel environments!
      Homes (767): train: 643, val: 67, test: 57
      Rooms (12): gym, patio, office, lobby, garage, kitchen, dining room, living room, bathroom, bedroom, elevator, balcony
      Objects (50): rug, piano, dryer, computer, fireplace, whiteboard, bookshelf, wardrobe cabinet, pan, toilet, plates, ottoman, fish tank, dishwasher, microwave, water dispenser, bed, table, mirror, tv stand, stereo set, chessboard, playstation, vacuum cleaner, cup, xbox, heater, bathtub, shoe rack, range oven, refrigerator, coffee machine, sink, sofa, kettle, dresser, knife rack, towel rack, loudspeaker, utensil holder, desk, vase, shower, washer, fruit bowl, television, dressing table, cutting board, ironing board, food processor
      Slide credit: Devi Parikh

  23. EQA Dataset: Environments [example renderings of a bedroom, kitchen, and living room, with objects such as a fish tank, piano, pedestal fan, candle, and air conditioner] Slide credit: Devi Parikh

  24. EQA Dataset • Questions in environments ("environments" highlighted) Slide credit: Devi Parikh

  25. EQA Dataset • Questions in environments ("Questions" highlighted) Slide credit: Devi Parikh

  26. EQA Dataset: Questions • Programmatically generate questions and answers from templates (a generation sketch follows below):
      location: What room is the <OBJ> located in?
      color: What color is the <OBJ>?
      color_room: What color is the <OBJ> in the <ROOM>?
      preposition: What is <on/above/below/next-to> the <OBJ> in the <ROOM>?
      existence: Is there a(n) <OBJ> in the <ROOM>?
      logical: Is there a(n) <OBJ1> and a(n) <OBJ2> in the <ROOM>?
      count: How many <OBJs> in the <ROOM>?
      room_count: How many <ROOMs> in the house?
      distance: Is the <OBJ1> closer to the <OBJ2> than to the <OBJ3> in the <ROOM>?
      …
      The templates require varying amounts of navigation and memory and test combinations of skills. Slide credit: Devi Parikh
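
The generation sketch referenced above, assuming access to each house's ground-truth annotations. The `house` accessors (objects, is_unique, room_of, contains) and ALL_OBJECT_TYPES are hypothetical, not the authors' actual generation engine.

```python
# Hypothetical template-instantiation sketch; accessors are illustrative.
TEMPLATES = {
    'location':  'What room is the {obj} located in?',
    'color':     'What color is the {obj}?',
    'existence': 'Is there a(n) {obj} in the {room}?',
}

ALL_OBJECT_TYPES = ['piano', 'fish tank', 'microwave']  # subset of the 50-object vocabulary

def generate_qa(house):
    qa = []
    for obj in house.objects:
        if not house.is_unique(obj):   # skip ambiguous referents ("which table?")
            continue
        qa.append((TEMPLATES['location'].format(obj=obj.name), house.room_of(obj)))
        qa.append((TEMPLATES['color'].format(obj=obj.name), obj.color))
    for room in house.rooms:
        for obj_name in ALL_OBJECT_TYPES:
            ans = 'yes' if house.contains(room, obj_name) else 'no'
            qa.append((TEMPLATES['existence'].format(obj=obj_name, room=room.name), ans))
    return qa
```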

  27. EQA Dataset: Questions • Programmatically generate questions and answers Slide credit: Devi Parikh

  28. EQA Dataset: Questions • Programmatically generate questions and answers. The nine templates above (location through distance) constitute EQA v1. Slide credit: Devi Parikh

  29. EQA Dataset: Questions • Programmatically generate questions and answers, then remove questions with peaky answer distributions (one way to implement such a filter is sketched below). Resulting EQA v1 split: Questions (5281): train: 4246, val: 506, test: 529. Slide credit: Devi Parikh
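
One way to operationalize "peaky" is to drop questions whose empirical answer distribution has low normalized entropy, so that a prior-only (no-navigation) baseline cannot do well. The threshold below is illustrative, not the paper's.

```python
from collections import Counter
from math import log

def normalized_entropy(answers):
    """Entropy of the empirical answer distribution, normalized to [0, 1]."""
    counts = Counter(answers)
    n = sum(counts.values())
    if len(counts) < 2:
        return 0.0                       # a single answer is maximally peaky
    h = -sum(c / n * log(c / n) for c in counts.values())
    return h / log(len(counts))

def keep_question(answers_across_envs, threshold=0.5):
    # Keep only if answers, aggregated over environments, are spread out enough.
    return normalized_entropy(answers_across_envs) > threshold
```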

  30. EQA Dataset: Expert Demonstrations • Connected House3D to Amazon Mechanical Turk Slide credit: Devi Parikh

  31. EQA Dataset: Expert Demonstrations Slide credit: Devi Parikh

  32. Slide credit: Devi Parikh

  33. Slide credit: Devi Parikh

  34. EQA Dataset: Expert Demonstrations • Connected House3D to Amazon Mechanical Turk • Currently: demonstrations for 1162 questions across 70 environments • Can be used for training • Learn how to explore • Capture human common sense • Can serve as a performance reference Slide credit: Devi Parikh

  35. EQA Dataset: Expert Demonstrations • Connected House3D to Amazon Mechanical Turk • Currently: demonstrations for 1162 questions across 70 environments • Can be used for training • Learn how to explore • Capture human common sense • Can serve as a performance reference (see paper). Slide credit: Devi Parikh
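
The slides do not specify how the demonstrations are consumed during training; behavior cloning on the human trajectories is one standard option. A PyTorch-style sketch under that assumption (the demo loader is a dummy stand-in):

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Toy policy: frame features + question embedding -> action logits."""
    def __init__(self, feat_dim=128, q_dim=128, n_actions=4):
        super().__init__()
        self.head = nn.Linear(feat_dim + q_dim, n_actions)

    def forward(self, frame_feat, q_emb):
        return self.head(torch.cat([frame_feat, q_emb], dim=-1))

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy stand-in for (frame_feat, q_emb, expert_action) batches extracted
# from the Mechanical Turk trajectories; the real loader is assumed.
demos = [(torch.randn(8, 128), torch.randn(8, 128), torch.randint(0, 4, (8,)))
         for _ in range(10)]

for frame_feat, q_emb, expert_action in demos:
    loss = loss_fn(policy(frame_feat, q_emb), expert_action)
    opt.zero_grad()
    loss.backward()
    opt.step()
```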

  36. Model: Vision, Language, Navigation, Answering Slide credit: Devi Parikh

  37. Model: Vision, Language, Navigation, Answering ("Vision" highlighted).
      [architecture diagram: a four-layer CNN encoder (Conv_1 to Conv_4 with 8, 16, 32, 32 channels; feature maps 224 -> 110 -> 53 -> 24 -> 10) over 224x224 RGB input, trained with three decoders: RGB autoencoding, semantic segmentation, and depth]
      Slide credit: Devi Parikh
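
A compact PyTorch sketch of this multi-task encoder, reconstructed from the diagram's layer sizes. The kernel sizes, strides, and decoder heads are assumptions: 6x6 convolutions with stride 2 merely reproduce the feature-map sizes shown, and the heads here are toy upsamplers, not the paper's decoders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def block(cin, cout):
    # 6x6 conv, stride 2 reproduces the diagram: 224 -> 110 -> 53 -> 24 -> 10.
    return nn.Sequential(nn.Conv2d(cin, cout, kernel_size=6, stride=2), nn.ReLU())

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            block(3, 8),     # Conv_1
            block(8, 16),    # Conv_2
            block(16, 32),   # Conv_3
            block(32, 32),   # Conv_4 -> 32 x 10 x 10 features
        )
    def forward(self, rgb):
        return self.convs(rgb)

class Head(nn.Module):
    """Toy decoder head: 1x1 projection + bilinear upsample back to 224x224."""
    def __init__(self, cout):
        super().__init__()
        self.proj = nn.Conv2d(32, cout, kernel_size=1)
    def forward(self, feat):
        return F.interpolate(self.proj(feat), size=(224, 224),
                             mode='bilinear', align_corners=False)

num_seg_classes = 40                      # placeholder; real count comes from SUNCG labels
enc = Encoder()
rgb_head, depth_head, seg_head = Head(3), Head(1), Head(num_seg_classes)

x = torch.randn(2, 3, 224, 224)           # batch of RGB frames
depth_target = torch.randn(2, 1, 224, 224)                      # dummy targets
seg_target = torch.randint(0, num_seg_classes, (2, 224, 224))   # for illustration

feat = enc(x)
loss = (F.mse_loss(rgb_head(feat), x)                   # autoencoding
        + F.mse_loss(depth_head(feat), depth_target)    # depth prediction
        + F.cross_entropy(seg_head(feat), seg_target))  # segmentation
```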

  38. Model: Vision, Language, Navigation, Answering ("Language" highlighted). Slide credit: Devi Parikh

  39. Model: Vision, Language, Navigation, Answering ("Navigation" highlighted) • Planner: direction or intention • Controller: velocity or primitive actions; at each step the controller outputs Repeat (execute the planner's action again) or Stop (return control to the planner). Slide credit: Devi Parikh
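
A sketch of that planner-controller decomposition as a control loop; the interfaces are illustrative and simplified relative to the paper's actual navigation module.

```python
# Illustrative planner-controller loop; module interfaces are hypothetical.
FORWARD, TURN_LEFT, TURN_RIGHT, STOP = range(4)

def navigate(planner, controller, env, q_emb, max_steps=100):
    frame = env.reset()
    for _ in range(max_steps):
        action = planner.act(frame, q_emb)   # planner picks a direction/intention
        if action == STOP:
            break                            # done: hand off to the answering module
        frame = env.step(action)
        # The controller keeps repeating the planner's primitive action until
        # it emits "stop", at which point control returns to the planner.
        while controller.repeat(frame, action):
            frame = env.step(action)
    return frame                             # final observation(s) for the answerer
```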
