ReferItGame: Referring to Objects in Photographs of Natural Scenes

Motivation
● First large-scale referring expression dataset
● Referring expressions are a natural way people talk about objects
  – Of psychological interest since the ’70s (Grice, Rosch, Winograd)
● Applications to human-computer interaction and robotics
● Contributions
  – A large-scale dataset of referring expressions
  – A benchmark model for generating referring expressions

Motivation
● Natural referring expressions are free-form
  – ‘smiling boy’: subject only
  – ‘man on left’: subject and prepositional phrase
● Other work requires expressions of the form (subject, preposition, object)
  – ‘cat on the chair’

Dataset
● Builds on the SAIAPR TC-12 dataset, with 238 object categories
● Each object comes with a segmentation and visual features, including
  – absolute properties: area, boundary, width, height…
  – relative properties: adjacent, disjoint, beside, X-aligned, above…
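The absolute and relative properties above can be sketched from bounding boxes. This is a minimal sketch, assuming axis-aligned boxes in place of the pixel masks that SAIAPR TC-12 actually provides; the names `Box`, `absolute_props` and `relative_props`, the tolerance `tol`, and the exact definition of each relation (in particular reading ‘X-aligned’ as overlapping horizontal extents) are hypothetical, not the dataset's official definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in pixel coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def absolute_props(b: Box, img_w: float, img_h: float) -> dict:
    """Absolute properties of one region (a box standing in for a mask)."""
    area = b.width * b.height
    return {
        "area": area,
        "area_fraction": area / (img_w * img_h),
        "width": b.width,
        "height": b.height,
    }


def relative_props(a: Box, b: Box, tol: float = 1.0) -> dict:
    """Assumed relations of region a with respect to region b."""
    # Positive overlap means the extents intersect; negative is the gap size.
    x_overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    y_overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    disjoint = x_overlap <= 0 or y_overlap <= 0
    gap = max(max(0.0, -x_overlap), max(0.0, -y_overlap))
    return {
        "disjoint": disjoint,
        "adjacent": disjoint and gap <= tol,          # touching or nearly so
        "beside": y_overlap > 0 and x_overlap <= 0,   # side by side horizontally
        "x_aligned": x_overlap > 0,                   # assumption: x extents overlap
        "above": a.y1 <= b.y0,                        # a lies entirely above b
    }
```

For example, `relative_props(Box(12, 10, 30, 20), Box(0, 0, 10, 20))` marks the first box as disjoint from and beside the second, but not above it. Boxes keep the sketch short; a mask-based version would compute the same relations from pixel sets.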

Dataset
● Player 1 writes an expression referring to the segmented object
● Player 2 clicks where that object should be in the image
  – A correct click verifies that the expression is reasonable
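Player 2's click acts as a check on Player 1's expression. A minimal sketch of that check, assuming the object's segmentation is available as a binary mask indexed `mask[row][col]`; `click_verifies` and `verified_expressions` are hypothetical names, not the game's actual code.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[bool]]


def click_verifies(mask: Mask, x: int, y: int) -> bool:
    """True if a click at column x, row y lands inside the segmented object."""
    if not mask or not (0 <= y < len(mask) and 0 <= x < len(mask[0])):
        return False  # clicks outside the image never verify
    return bool(mask[y][x])


def verified_expressions(rounds: List[Tuple[str, Mask, int, int]]) -> List[str]:
    """Keep only the expressions whose round ended with a correct click."""
    return [expr for expr, mask, x, y in rounds if click_verifies(mask, x, y)]
```

With `mask = [[False, True], [False, True]]` (object in the right column), a click at `(1, 1)` verifies the expression and a click at `(0, 0)` does not.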

Dataset
● Collected from Amazon Mechanical Turk workers and volunteers
  – ~130,000 expressions
  – ~100,000 distinct objects
  – ~20,000 photographs
● Unfortunately, www.referitgame.com is down

Dataset
● Each expression is parsed into a 7-tuple of attributes, R
  – entry-level category: ‘bird’
  – color: ‘blue’
  – size: ‘tiny’
  – absolute location: ‘top of the image’
  – relative location relation: ‘to the left of’ in ‘the car to the left of the tree’
  – relative location object: ‘tree’ in ‘the car to the left of the tree’
  – generic: ‘wooden’, ‘round’
● Example: ‘The big old white cabin beside the tree’
  – R = {cabin, white, big, Ø, beside, tree, old}
● Parsing uses the Stanford CoreNLP parser and attribute templates
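To make the 7-tuple concrete, here is a minimal Python sketch of R and a toy keyword parser. The field names, the tiny word lists, and `toy_parse` are hypothetical illustrations; the paper uses the Stanford CoreNLP parser with attribute templates, which this keyword matcher does not reproduce. It only handles single-word relations such as ‘beside’, so a phrase like ‘to the left of’ would not parse correctly.

```python
from dataclasses import astuple, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class R:
    """The 7 attribute slots in the slide's order; None means unused (Ø)."""
    category: Optional[str] = None
    color: Optional[str] = None
    size: Optional[str] = None
    abs_location: Optional[str] = None
    rel_relation: Optional[str] = None
    rel_object: Optional[str] = None
    generic: Optional[str] = None

    def as_slide_tuple(self) -> Tuple[str, ...]:
        return tuple("Ø" if v is None else v for v in astuple(self))


# Hypothetical toy lexicons standing in for real attribute templates.
COLORS = {"white", "blue", "red", "green", "black"}
SIZES = {"big", "small", "tiny", "large"}
GENERIC = {"old", "wooden", "round"}
ABS_LOCATIONS = {"left", "right", "top", "bottom", "middle"}
RELATIONS = {"beside", "above", "below", "behind"}  # single-word only
STOPWORDS = {"the", "a", "an", "on", "of", "in", "at", "to"}


def toy_parse(expression: str) -> R:
    tokens = expression.lower().split()
    slots = {}
    # Split at the first relation word: head phrase before, landmark after.
    rel = next((i for i, t in enumerate(tokens) if t in RELATIONS), None)
    head = tokens if rel is None else tokens[:rel]
    if rel is not None:
        slots["rel_relation"] = tokens[rel]
        landmark = [t for t in tokens[rel + 1:] if t not in STOPWORDS]
        if landmark:
            slots["rel_object"] = landmark[-1]
    nouns = []
    for t in head:
        if t in COLORS:
            slots["color"] = t
        elif t in SIZES:
            slots["size"] = t
        elif t in GENERIC:
            slots["generic"] = t
        elif t in ABS_LOCATIONS:
            slots["abs_location"] = t
        elif t not in STOPWORDS:
            nouns.append(t)
    if nouns:
        slots["category"] = nouns[-1]  # assume the last leftover word is the head noun
    return R(**slots)
```

`toy_parse("The big old white cabin beside the tree").as_slide_tuple()` yields `('cabin', 'white', 'big', 'Ø', 'beside', 'tree', 'old')`, matching the slide's R.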

Dataset
● Psychological analysis
  – ‘woman’ is often replaced with ‘person’

Dataset
● Attribute use
  – Roughly half of the parsed descriptions contain only the category
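The ‘roughly half’ figure is a simple count over the parsed tuples. A minimal sketch, assuming each parse is a dict keyed by hypothetical slot names with None (or a missing key) for unused slots; the toy parses in the example are made up for illustration, not drawn from the dataset.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

SLOTS = ("category", "color", "size", "abs_location",
         "rel_relation", "rel_object", "generic")


def attribute_usage(
    parses: List[Dict[str, Optional[str]]],
) -> Tuple[Dict[str, float], float]:
    """Fraction of parses using each slot, and the category-only fraction."""
    if not parses:
        return {s: 0.0 for s in SLOTS}, 0.0
    counts: Counter = Counter()
    category_only = 0
    for parse in parses:
        used = {s for s in SLOTS if parse.get(s) is not None}
        counts.update(used)
        if used == {"category"}:
            category_only += 1
    n = len(parses)
    return {s: counts[s] / n for s in SLOTS}, category_only / n
```

For the toy input `[{"category": "bird"}, {"category": "man", "abs_location": "left"}, {"category": "cabin", "color": "white"}, {"category": "sky"}]`, the category-only fraction is 0.5.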

Model
● Choose R given P and S by solving an integer linear program (ILP)
  – R: the 7-tuple of attributes
  – P: visual features of the object being referred to
  – S: visual features of the scene
● Different hand-engineered distributions for different attributes
● Unary priors between each attribute and the object
● Pairwise priors between pairs of attributes
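The shape of the objective can be illustrated on a toy instance. This sketch swaps the ILP solver for exhaustive enumeration, which finds the same optimum of a 0/1 objective but only scales to a handful of candidates. It assumes the unary scores (standing in for the visual evidence from P and S combined with the unary priors) and the pairwise prior scores are already computed; the score values, the ‘at most one value per slot’ constraint, and the requirement that the category slot be filled are illustrative assumptions, not the paper's exact formulation.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Optional, Tuple

Choice = Tuple[str, str]  # (slot, value), e.g. ("color", "white")


def best_attributes(
    candidates: Dict[str, List[str]],
    unary: Dict[Choice, float],
    pairwise: Dict[FrozenSet[Choice], float],
    required: Tuple[str, ...] = ("category",),
) -> Tuple[Dict[str, Optional[str]], float]:
    """Pick at most one value per slot, maximizing unary + pairwise scores."""
    slots = list(candidates)
    best: Dict[str, Optional[str]] = {}
    best_score = float("-inf")
    # Each slot may take one of its candidate values or stay empty (None).
    for values in product(*(candidates[s] + [None] for s in slots)):
        assignment = dict(zip(slots, values))
        if any(assignment.get(s) is None for s in required):
            continue  # e.g. an expression must name the category
        chosen = [(s, v) for s, v in assignment.items() if v is not None]
        score = sum(unary.get(c, 0.0) for c in chosen)
        score += sum(pairwise.get(frozenset(pair), 0.0)
                     for pair in combinations(chosen, 2))
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score
```

With made-up scores `unary = {("category", "cabin"): 2.0, ("category", "house"): 1.5, ("color", "white"): 0.5, ("size", "big"): -0.2}` and a pairwise bonus of 1.0 for ‘house’ with ‘white’, the optimum is house + white (score 3.0): the pairwise prior overrides the higher unary score of ‘cabin’. A real ILP would linearize each pairwise term with an auxiliary binary variable instead of enumerating.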

Evaluation
● Three test sets of 500 images each
  – A: interesting objects
  – B: the most frequently occurring interesting objects
  – C: interesting objects in images where several are present
● Baseline model
  – Uses only the priors, so no S or attribute features
● Humans: ~72% accuracy
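The deck's critique questions precision and recall as the metric. One plausible way to score a generated attribute tuple against a human one is precision and recall over the filled (slot, value) pairs; this is an assumption about the setup, sketched with a hypothetical function name, not the paper's exact evaluation protocol.

```python
from typing import Dict, Optional, Tuple


def attribute_precision_recall(
    predicted: Dict[str, Optional[str]],
    reference: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Precision/recall of filled (slot, value) pairs against a human parse."""
    pred = {(s, v) for s, v in predicted.items() if v is not None}
    ref = {(s, v) for s, v in reference.items() if v is not None}
    hits = len(pred & ref)  # slots filled with the same value in both
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall
```

Predicting {cabin, white, small} against the human parse of ‘The big old white cabin beside the tree’ gives precision 2/3 (the size is wrong) and recall 1/3 (four human attributes are missed).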

Critique
● How important is the scene for the attributes?
  – S is used only for the relative location attributes {relation, object}
  – Absolute location is the most commonly used attribute
  – Over half of the parsed descriptions include only the object category
● Why don’t the authors give more detail on the visual features?
  – Which visual features matter most?
● Is there a better metric than precision and recall?
  – For example, ask AMT workers whether a description is reasonable

Critique
● Why don’t the authors analyze the training referring expressions more closely?
  – Turk workers were paid per 10 images
  – Some human expressions consist only of the object name

Future Work
● Scale up the dataset and train end-to-end with state-of-the-art neural networks
● Identify the referred object instead of generating an expression
  – Done in the upcoming MAttNet paper
● Make the images and expressions more challenging