

  1. Aishwarya Agrawal, Ph.D. Student, Machine Learning and Perception Lab

  2. (title slide)

  3. Identify objects in the scene: sky, stop light, building, bus, car, person, sidewalk

  4. Identify attributes of objects: blue, green, tall, sky, stop light, building, many red cars, bus, one bicycle

  5. Identify activities in the scene: man walking on sidewalk; person wearing a helmet, riding a bicycle

  6. Identify the scene: street scene

  7. Describe the scene: a person on a bike going through a green light with a bus nearby

  8. A giraffe standing in the grass next to a tree.

  9. • Answer questions about the scene – Q: How many buses are there? – Q: What is the name of the street? – Q: Is the man on the bicycle wearing a helmet?

  10. (image slide)

  11. Visual Question Answering (VQA) Task: Given an image and a natural language open-ended question, generate a natural language answer.

  12. VQA Task

  13. VQA CloudCV Demo: cloudcv.org/vqa/?useVoice=1&listenAnswer=1

  14. Applications of VQA • An aid to the visually impaired: Is it safe to cross the street now?

  15. Applications of VQA • Surveillance: What kind of car did the man in the red shirt leave in?

  16. Applications of VQA • Interacting with a robot: Is my laptop in my bedroom upstairs?

  17. VQA Dataset

  18. Real images (from MSCOCO). Tsung-Yi Lin et al. “Microsoft COCO: Common Objects in Context.” ECCV 2014. http://mscoco.org/

  19. Questions: Stump a smart robot! Ask a question that a human can answer, but a smart robot probably can’t!

  20. Two modalities of answering • Open-Ended • Multiple-Choice

  21. Open-Ended Task: What is the girl holding in her hand? How many mirrors? Why is the girl holding an umbrella?

  22. Multiple-Choice Task: What is the bus number? a) 3 b) 1 c) green d) 4 e) window trim f) blue g) m5 h) corn, carrots, onions, rice i) red j) 125 k) san antonio l) sign pen m) 478 n) no o) 25 p) 2 q) yes r) white

  23. Dataset Stats • >250K images (MSCOCO + 50K abstract scenes) • >750K questions (3 per image) • ~10M answers (10 with the image + 3 without the image)

  24. Please visit www.visualqa.org for more details.

  25. Browse the Dataset: http://visualqa.org/browser/

  26. Questions

  27. Dataset Visualization: http://visualqa.org/visualize/

  28. Answers • 38.4% of questions are binary (yes/no) • 98.97% of questions have answers of 3 words or fewer – 23K unique one-word answers
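Statistics like those on the slide above (fraction of yes/no answers, fraction of short answers, count of unique one-word answers) are straightforward to compute; the sketch below uses a small hypothetical list of answers in place of the dataset's real ~10M.

```python
# Toy reproduction of the answer-distribution stats; the `answers`
# list is illustrative, not drawn from the actual VQA dataset.
answers = ["yes", "no", "2", "red", "playing tennis",
           "yes", "on the table", "no",
           "green and white striped umbrella", "4"]

binary = sum(a in ("yes", "no") for a in answers) / len(answers)
short = sum(len(a.split()) <= 3 for a in answers) / len(answers)
unique_one_word = len({a for a in answers if len(a.split()) == 1})

print(f"{binary:.0%} yes/no, {short:.0%} have <= 3 words, "
      f"{unique_one_word} unique 1-word answers")
```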

  29. Answers

  30. 2-Channel VQA Model: an image embedding (CNN with convolution, pooling, and fully-connected layers + non-linearities; 4096-dim) and a question embedding (“How many horses are in this image?”; 1024-dim) are combined by a neural network (MLP) with a softmax over the top K answers.
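A minimal pure-Python sketch of the 2-channel idea: embed the image and the question separately, fuse the two embeddings pointwise, and apply a softmax over the top K answers. Dimensions are toy-sized (the talk uses 4096-dim image and 1024-dim question embeddings), the weights are random stand-ins for trained parameters, and the pointwise-product fusion is one common choice, not necessarily the exact trained model.

```python
import math
import random

random.seed(0)

IMG_DIM, Q_DIM, FUSE_DIM, K = 8, 4, 6, 5  # toy sizes (talk: 4096 / 1024 / top-K answers)

def linear(x, w):
    """Dense layer: w has one row of len(x) weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Random toy weights standing in for trained parameters.
W_img = [[random.gauss(0, 0.1) for _ in range(IMG_DIM)] for _ in range(FUSE_DIM)]
W_q   = [[random.gauss(0, 0.1) for _ in range(Q_DIM)] for _ in range(FUSE_DIM)]
W_out = [[random.gauss(0, 0.1) for _ in range(FUSE_DIM)] for _ in range(K)]

def vqa_forward(img_feat, q_feat):
    i = [math.tanh(v) for v in linear(img_feat, W_img)]  # image channel
    q = [math.tanh(v) for v in linear(q_feat, W_q)]      # question channel
    fused = [a * b for a, b in zip(i, q)]                # pointwise fusion
    return softmax(linear(fused, W_out))                 # dist. over top-K answers

probs = vqa_forward([0.5] * IMG_DIM, [0.2] * Q_DIM)
print(len(probs), sum(probs))
```

The language-alone and vision-alone ablations on the next two slides correspond to zeroing out one channel before the fusion step.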

  31. Ablation #1: Language-alone – only the question embedding (1024-dim) feeds the MLP and softmax over the top K answers (1k output units); the image channel is removed.

  32. Ablation #2: Vision-alone – only the image embedding (4096-dim) feeds the MLP and softmax over the top K answers; the question channel is removed.

  33. Accuracy Metric
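The VQA dataset scores an open-ended answer by consensus with the 10 human answers: an answer is fully correct if at least 3 humans gave it, i.e. min(#matching humans / 3, 1). The sketch below uses this simplified form; the official evaluation additionally averages over leave-one-out subsets of annotators and normalizes answer strings.

```python
def vqa_accuracy(pred, human_answers):
    """Consensus accuracy: min(#humans who gave this answer / 3, 1)."""
    matches = sum(1 for a in human_answers if a == pred)
    return min(matches / 3.0, 1.0)

humans = ["yes"] * 7 + ["no"] * 2 + ["maybe"]
print(vqa_accuracy("yes", humans))  # 1.0 (at least 3 of 10 humans agree)
print(vqa_accuracy("no", humans))   # 2 of 10 humans agree -> 2/3
```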

  34. Open-Ended Task Accuracies: human vs. machine performance – room for improvement (25.14)

  35. Results • Multiple-Choice > Open-Ended • Question alone does quite well • Image helps • Code available!

  36. Commonsense • Does this person have 20/20 vision?

  37. Does this question need commonsense? Q: How many calories are in this pizza?

  38. How old does a person need to be? Q: How many calories are in this pizza?

  39. Most “commonsense” questions

  40. Least “commonsense” questions

  41. Spectrum of age required to answer:
      – 3-4 (15.3%): Is that a bird in the sky? What color is the shoe? How many zebras are there? Is there food on the table? Is this man wearing shoes?
      – 5-8 (39.7%): How many pizzas are shown? What are the sheep eating? What color is his hair? What sport is being played? Name one ingredient in the skillet.
      – 9-12 (28.4%): Where was this picture taken? Is this a vegetarian meal? Are these boats too tall to fit under the bridge? What is the name of the white shape under the batter? Is this at the stadium?
      – 13-17 (11.2%): Is he likely to get mugged if he walked down a dark alleyway like this? What ceremony does the cake commemorate? What type of beverage is in the glass? Can you name the performer in the purple costume? Besides these humans, what other animals eat here?
      – 18+ (5.5%): What type of architecture is this? Is this a Flemish bricklaying pattern? How many calories are in this pizza? What government document is needed to partake in this activity? What is the make and model of this vehicle?

  42. Average age required, by question type:
      what brand 12.50 • why 11.18 • what type 11.04 • what kind 10.55 • is this 10.13 • what does 10.06 • what time 9.81 • who 9.58 • where 9.54 • which 9.32 • does 9.29 • do 9.23 • what is 9.11 • what are 9.04 • are 8.65 • is the 8.52 • is there 8.24 • what sport 8.06 • how many 7.67 • what animal 6.74 • what color 6.60

  43. VQA Age • Average “age of questions” = 8.98 years • Our model = 4.74 years old!* (*age as estimated by untrained crowd-sourced workers)

  44. VQA Common Sense • Average common sense required = 31% • Our best algorithm has 17% common sense!* (*as estimated by untrained crowd-sourced workers)

  45. VQA Challenges on www.codalab.org

  46. VQA Challenge @ CVPR16

  47. VQA Challenge @ CVPR16 (code available!)

  48. VQA Workshop @ CVPR16

  49. Papers using VQA … and many more

  50. Dataset: >1K downloads • Code: >1.5K views • Users in academia, industry, and start-ups

  51. Conclusions • VQA: Visual Question Answering – the next “grand challenge” in vision, language, and AI • Spectrum: easy to difficult – “What room is this?” → scene recognition – “How many …?” → object recognition – … – “Does this person have 20/20 vision?” → common sense • Exciting times ahead!

  52. VQA Team: Jiasen Lu (Virginia Tech), Akrit Mohapatra (Virginia Tech, webmaster), Aishwarya Agrawal (Virginia Tech), Stanislaw Antol (Virginia Tech), Meg Mitchell (Microsoft Research), Larry Zitnick (Facebook AI Research), Dhruv Batra (Virginia Tech), Devi Parikh (Virginia Tech)

  53. Closing Remarks • CloudCV VQA Exhibition: Booth 101 • Contact email: aish@vt.edu • Please complete the Presenter Evaluation sent to you by email or through the GTC Mobile App. Your feedback is important!

  54. Thanks! Questions?

  55. Visual Question Answering (VQA)
