Natural Language for Visual Reasoning Alane Suhr, Mike Lewis, James - PowerPoint PPT Presentation

Natural Language for Visual Reasoning Alane Suhr, Mike Lewis, James Yeh, Yoav Artzi lic.nlp.cornell.edu/nlvr/

Language and Vision • Captioning: "A small herd of cows in a large grassy field." (Chen et al 2015) • Question answering: "What is the dog carrying?" (Agrawal et al 2015) Our goal: natural language with a diverse set of semantic and syntactic phenomena

Natural Language for Visual Reasoning There is a box with 3 items of all 3 different colors. TRUE Task: determine whether the statement is true or false for the image.
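
As a rough illustration of the task, the sketch below hand-codes the meaning of the example statement as a true/false check. It assumes a hypothetical structured representation (an image is a list of boxes, each box a list of items with a `color` field); these names are illustrative and are not the corpus format. It also assumes "3 items" means exactly three.

```python
from typing import Dict, List

Item = Dict[str, str]   # hypothetical item record, e.g. {"shape": "circle", "color": "blue"}
Box = List[Item]
Image = List[Box]


def box_with_three_items_of_three_colors(image: Image) -> bool:
    """True if some box holds exactly 3 items whose colors are all different."""
    return any(
        len(box) == 3 and len({item["color"] for item in box}) == 3
        for box in image
    )


# Made-up image with three boxes; only the first box satisfies the statement.
example_image: Image = [
    [{"shape": "circle", "color": "blue"},
     {"shape": "square", "color": "yellow"},
     {"shape": "triangle", "color": "black"}],
    [{"shape": "circle", "color": "blue"},
     {"shape": "circle", "color": "blue"}],
    [{"shape": "square", "color": "black"}],
]

label = box_with_three_items_of_three_colors(example_image)
```

A system for the task has to produce this label from the sentence and the image alone, without a hand-written check per sentence.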

Outline • Task and environments • Data collection • Analysis • Baselines

Task and Environments • Scatter: "There is a box with 3 items of all 3 different colors." TRUE • Tower: "There are only two towers which has the same base color." FALSE

Data collection • Goal: collect natural language descriptions of images and true/false judgments • Generate images • Collect natural language sentences • Validate image/sentence pairs

Image Generation • Randomly choose number of items per box and item shapes, colors, sizes, and positions (without overlap) • Construct second image with the same type • Construct third image by shuffling items in the first image • Construct fourth image by shuffling items in the second image Generate two unique images and permute their items to create two other images
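
The four steps above can be sketched roughly as follows. This is only one possible reading, not the authors' generator: the shape, color, and size inventories, the box size, and all function names are assumptions, and "shuffling items" is interpreted here as permuting item shapes and colors across the existing positions.

```python
import random

SHAPES = ["circle", "square", "triangle"]   # assumed shape inventory
COLORS = ["black", "blue", "yellow"]        # assumed color inventory
SIZES = [10, 15, 20]                        # assumed item side lengths
BOX_SIZE = 100                              # assumed box width and height


def overlaps(a, b):
    """Axis-aligned overlap test between two square items."""
    return (a["x"] < b["x"] + b["size"] and b["x"] < a["x"] + a["size"]
            and a["y"] < b["y"] + b["size"] and b["y"] < a["y"] + a["size"])


def random_box(rng, max_items=6):
    """Sample items one at a time, rejecting candidates that overlap placed ones."""
    n_items = rng.randint(1, max_items)
    items = []
    while len(items) < n_items:
        size = rng.choice(SIZES)
        candidate = {
            "shape": rng.choice(SHAPES),
            "color": rng.choice(COLORS),
            "size": size,
            "x": rng.randint(0, BOX_SIZE - size),
            "y": rng.randint(0, BOX_SIZE - size),
        }
        if not any(overlaps(candidate, placed) for placed in items):
            items.append(candidate)
    return items


def random_image(rng, n_boxes=3):
    return [random_box(rng) for _ in range(n_boxes)]


def shuffled_copy(image, rng):
    """Permute (shape, color) pairs across all items; positions and sizes stay fixed."""
    attrs = [(item["shape"], item["color"]) for box in image for item in box]
    rng.shuffle(attrs)
    remaining = iter(attrs)
    new_image = []
    for box in image:
        new_box = []
        for item in box:
            shape, color = next(remaining)
            new_box.append({**item, "shape": shape, "color": color})
        new_image.append(new_box)
    return new_image


def generate_four_images(seed=0):
    rng = random.Random(seed)
    first = random_image(rng)
    second = random_image(rng)
    third = shuffled_copy(first, rng)    # same items as the first image, rearranged
    fourth = shuffled_copy(second, rng)  # same items as the second image, rearranged
    return first, second, third, fourth
```

Because the shuffled images reuse exactly the items of their source image, a sentence that separates the image pairs cannot rely on which items are present overall.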

Sentence Writing Write a sentence that is true about the top two images and false about the bottom two. • Don’t refer to the order of the images. • Don’t refer to the order of the boxes. Example: "There is a box with 3 items of all 3 different colors." Setup encourages set reasoning, counting, and comparisons

Sentence Writing "There is a box with 3 items of all 3 different colors." paired with each of the four images: TRUE, TRUE, FALSE, FALSE

Validation There is a box with 3 items of all 3 different colors. • Higher-quality data • Measure agreement • Make sure sentences follow the guidelines • Fleiss’ κ: 0.709 ➡ 0.808
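
For reference, Fleiss' κ can be computed from a table of per-example label counts. The sketch below uses the standard formula; the function name and the toy counts are illustrative and are not taken from the corpus.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned category j to example i. Every example must have the same
    number of raters (at least 2)."""
    n_examples = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every example needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_examples * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per example: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_examples      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    # Undefined (division by zero) when all ratings fall in a single category.
    return (p_bar - p_e) / (1 - p_e)


# Made-up TRUE/FALSE counts from 3 raters on 4 examples (columns: TRUE, FALSE).
toy_counts = [[3, 0], [0, 3], [3, 0], [0, 3]]
kappa = fleiss_kappa(toy_counts)
```

In this toy table all raters agree on every example, so the statistic reaches its maximum of 1.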

Validation There is a box with 3 items of all 3 different colors. ☐ TRUE ☑︎ FALSE

Permutation There is a box with 3 items of all 3 different colors. ☐ TRUE ☑︎ FALSE

Corpus Statistics
• 92,244 examples
• 3,962 unique sentences
• Krippendorff’s α: 0.831
• Fleiss’ κ: 0.808 (Landis and Koch, 1977)
• 262 words in the vocabulary
• Average sentence length of 11.2
• Four data splits: 80.7% training, 6.4% development, 6.4% public test, 6.4% unreleased test
lic.nlp.cornell.edu/nlvr

Related Corpora (corpus, task, example)
• MSCOCO (Chen et al 2015), caption generation: "A small herd of cows in a large grassy field."
• CLEVR (Johnson et al 2016), question answering: "How many objects are either small cylinders or red things?"
• VQA (real) (Agrawal et al 2015), question answering: "What is the dog carrying?"
• VQA (abstract) (Agrawal et al 2015), question answering: "Is this a forest?"
• NLVR (Suhr et al 2017), binary classification: "there are exactly three blue objects not touching any edge"

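Related Corpora (corpus, task, real images?, natural language?)
• MSCOCO (Chen et al 2015), caption generation: real images ✔, natural language ✔
• CLEVR (Johnson et al 2016), question answering: real images ✗, natural language ✗
• VQA (real) (Agrawal et al 2015), question answering: real images ✔, natural language ✔
• VQA (abstract) (Agrawal et al 2015), question answering: real images ✗, natural language ✔
• NLVR (Suhr et al 2017), binary classification: real images ✗, natural language ✔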

Lengths [Figure: length distributions for VQA real images, VQA abstract images, NLVR (ours), MSCOCO, and CLEVR] • Longer than VQA • Similar to MS COCO

Linguistic Analysis Analyzed 200 random development sentences. [Figure: frequency of each phenomenon in VQA (abstract), VQA (real), and NLVR] Categories: soft cardinality, hard cardinality, coordination, negation, existential quantifiers, universal quantifiers, coreference, presupposition, prepositional ambiguity, coordination ambiguity, comparisons, spatial relations

Numerical Expressions
• Hard cardinality: 66% in NLVR vs. 12% and 12% in the two VQA corpora. Example: "There is a tower with exactly three blocks, and it has a yellow block and two blue blocks." TRUE
• Soft cardinality: 16% in NLVR vs. 1% and 0% in the two VQA corpora. Example: "there are at least two yellow squares not touching any edge" TRUE

Negation and Coordination
• Negation: 10% in NLVR vs. 1% and 0% in the two VQA corpora. Example: "There is a box with a black item between 2 items of the same color and no item on top of that." TRUE
• Coordination: 17% in NLVR vs. 5% and 3% in the two VQA corpora. Example: "There is a box with a yellow item and three black items." TRUE

Baselines [Bar chart: accuracy on unreleased test set for Majority class, Text only (RNN), Image only (CNN), CNN+RNN, and NMN (Andreas et al 2015); bar values from highest to lowest: 62.0, 56.3, 56.2, 55.4, 55.3]
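
As a point of reference for the chart, a majority-class baseline ignores both sentence and image and always predicts the most frequent training label. A minimal sketch, with made-up label lists and a hypothetical function name:

```python
from collections import Counter


def majority_class_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test example."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for gold in test_labels if gold == majority_label)
    return majority_label, correct / len(test_labels)


# Toy labels, not corpus data.
toy_train = ["true"] * 6 + ["false"] * 4
toy_test = ["true"] * 5 + ["false"] * 5
majority_label, accuracy = majority_class_accuracy(toy_train, toy_test)
```

Any model that conditions on the sentence and image should be judged against this floor, since labels in a binary task are rarely perfectly balanced.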

Feature-based Analysis • Features computed from the text and the structured representation • Use maximum entropy model [Bar chart: accuracy; bars labeled Unreleased test, Dev, and No count features; values shown: 68.04, 67.82, 57.7]

http://lic.nlp.cornell.edu/nlvr/ Thank you!
