Reasoning about Fine-grained Attribute Phrases using Reference Games

Jong-Chyi Su*, Chenyun Wu*, Huaizu Jiang, Subhransu Maji
University of Massachusetts, Amherst
ICCV 2017

Expert-designed Attributes
Is military plane? No
Is propeller plane? No
✔ Modular - an instance can be described by a set of attributes
✘ A fixed set of attributes designed by experts before collecting the dataset (49 attributes from OID-Aircraft [1])
[1] Vedaldi et al., Understanding Objects in Detail with Fine-grained Attributes, CVPR, 2014.

Image Captions
Example captions:
A large airplane on a runway.
A large Air France jet sitting on top of a runway.
• Usually a longer sentence describing many aspects
✔ Compositional, language-based
✘ Not designed to describe differences between a pair of images

New Dataset - “Attribute Phrases”
• Short phrases describing visual differences within a pair of images sampled from different categories
• 9400 image pairs in total
Example phrase pairs:
Facing right vs. Facing left
In the air vs. On the ground
Closed cockpit vs. Open cockpit
White and green vs. White and blue color
Propeller spinning vs. Propeller stopped
Jet engine vs. Propeller
Two-tone gray body vs. Red and white body
Pointed nose vs. Flat nose
Grounded vs. In flight
No pilot visible vs. Pilot visible
✔ Modular like attributes
✔ Compositional and free-form like image captions
✔ More expressive and discriminative at fine-grained level

Attribute Phrases
• How to generate? “Blue plane vs. Red plane”
• How to evaluate? “Red plane”
• Use reference game

Reference Game
• ReferItGame [1]
• RefCOCO [2]
• Two tasks: generation and comprehension
• Refer to a specific object in an image
• Usually focus on the category, spatial relationship, etc.
• Our task focuses on attributes that enable fine-grained discrimination between instances of a category
[1] Kazemzadeh et al., “ReferItGame: Referring to Objects in Photographs of Natural Scenes”, EMNLP, 2014.
[2] Yu et al., “Modeling Context in Referring Expressions”, ECCV, 2016.

Overview of Our Model
• Generation task - speaker model
• Comprehension task - listener model
[Diagram: Speaker → “Red plane” → Listener]
1. Train the speaker and listener models separately
2. Use the listener model to evaluate the speaker model
3. Re-rank phrases with the listener, then evaluate with human listeners

Use Listener Model for Comprehension Task
[Diagram: “Red plane” → Listener]
• Task: Given an attribute phrase and two images, find which image the phrase refers to
• Method: Measure the similarity between the attribute phrase and the images in a common embedding space
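
The comprehension step can be illustrated with a minimal sketch. The slide does not specify the encoders or the similarity function, so the cosine similarity, the two-way softmax, and the toy 2-D embeddings below are all assumptions made for illustration; `listener_choose` is a hypothetical helper, not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def listener_choose(phrase_vec, image_vec_1, image_vec_2):
    """Pick the image (1 or 2) whose embedding is closer to the phrase.

    Also returns the two-way softmax probability assigned to image 1.
    """
    s1 = cosine(phrase_vec, image_vec_1)
    s2 = cosine(phrase_vec, image_vec_2)
    p1 = 1.0 / (1.0 + math.exp(s2 - s1))  # softmax over the two scores
    return (1 if s1 >= s2 else 2), p1

# Toy embeddings (made up): the phrase points along the "red" axis.
phrase = [1.0, 0.0]
img_red = [0.9, 0.1]
img_blue = [0.1, 0.9]
choice, p1 = listener_choose(phrase, img_red, img_blue)
print(choice, round(p1, 3))
```

In a trained model the vectors would come from learned phrase and image encoders; only the comparison step is shown here.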

Use Speaker Model for Generation Task
[Diagram: Speaker → “Red plane”]
• Task: Given two images, generate discriminative attributes
• Method: Use the image captioning model [1] as the speaker model
[1] Vinyals et al., Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, TPAMI, 2016.

Variants of the Speaker Model
[Diagram: SS produces “Red plane”; DS produces “Red vs. Blue”; the phrases are passed to the Listener]
- Simple Speaker (SS): Given one image, generate one phrase
- Discerning Speaker (DS): Given two images, generate a pair of phrases
• Use the listener model to evaluate the quality of the generated phrases
• DS generates better attribute phrases than SS (~10% higher accuracy)

Speaker  Top-k  Accuracy (%)
SS       1      81.7
SS       5      80.6
SS       10     80.0
DS       1      92.8
DS       5      91.4
DS       10     90.5

Discerning Speaker Generates Better Phrases
Ground Truth (human generated):
1) small size VS large size
2) single seat VS more seated
3) facing left VS facing right
4) private VS commercial
5) wings at the top VS wings at the bottom
DS:
1) private plane VS commercial plane
2) private VS commercial
3) small plane VS large plane
4) facing left VS facing right
5) short VS long
6) white VS red
7) high wing VS low wing
8) small VS large
9) glider VS jetliner
10) white and blue color VS white red and blue color
SS:
1) no engine
2) small
3) private plane
4) on the ground
5) propellor engine
6) on ground
7) glider
8) white color
9) small plane
10) no propeller
Some SS phrases are correct but not discriminative

Pragmatic Speaker Helps
[Diagram: the speaker's phrase list (Red plane, Glider, Facing left, Propellor engine, …) is re-ranked by the listener (Red plane, Propellor engine, Facing left, Glider, …)]
1. Use speaker to generate attribute phrases
2. Re-rank the phrases by the scores from the listener model
→ More discriminative phrases on the top
[Figure: ranked phrase lists for one image pair under SS, SS + Re-ranking, DS, and DS + Re-ranking, each phrase marked ✔, ? or ✘. Phrases shown include commercial plane, passenger plane, jet engine, turbofan engine, twin engine, t tail, multi seater, large, large size, white, white and red, white colour with red stripes, on runway, on concrete, on the ground, facing right, and _UNK.]
[1] Andreas et al., “Reasoning About Pragmatics with Neural Listeners and Speakers”, EMNLP, 2016.
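
The two re-ranking steps can be sketched as follows. This is only an illustration under stated assumptions: `rerank` is a hypothetical helper, the listener is represented by any function that maps a phrase to a score (for example, the probability that the phrase picks out the target image), and the scores in `toy_scores` are invented for the example rather than taken from the slides.

```python
def rerank(phrases, listener_score):
    """Sort speaker phrases by listener score, highest first.

    Python's sort is stable, so tied phrases keep the speaker's order.
    """
    return sorted(phrases, key=listener_score, reverse=True)

# Phrases as a speaker might emit them, in speaker order.
speaker_phrases = ["red plane", "glider", "facing left", "propellor engine"]

# Invented listener scores (assumption): higher = more discriminative.
toy_scores = {
    "red plane": 0.9,
    "glider": 0.2,
    "facing left": 0.5,
    "propellor engine": 0.7,
}

reranked = rerank(speaker_phrases, toy_scores.get)
print(reranked)
```

Because the speaker and listener are trained separately, the listener acts as an independent check on whether each generated phrase actually distinguishes the two images.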

Pragmatic Speaker Helps
• Use human listeners for evaluation:
• Given an attribute phrase, let users choose between the two images

Speaker             Top-k  Original Acc. (%)  After Re-ranking Acc. (%)
Discerning Speaker  1      82.0               95.0
Discerning Speaker  5      80.2               90.0
Discerning Speaker  7      79.1               86.7

Re-ranking improves top-5 accuracy by ~10%

Are Attribute Phrases Better than Expert-designed Attributes?
• Use attributes as features for the fine-grained classification task
• Use our listener model to get the scores between the image and the top-k most frequent attribute phrases
• Use expert-designed 46 attributes from OID dataset
• Test on FGVC-Aircraft dataset [1] (100 classes)
• ~20% improvement
[Chart annotations: Attribute phrases ~24%, ~32%; OID attributes ~12%]
[1] Maji et al., Fine-grained Visual Classification of Aircraft, arXiv:1306.5151, 2013.

Generate Attributes for Sets
• Select two categories (A, B), generate attribute phrases for randomly selected image pairs (Im1 ∈ A, Im2 ∈ B)
• Sort them by frequency

747-400                 ATR-42
large plane             private plane
more windows            less windows
commercial plane        medium plane
more windows on body    propellor engine
big plane               fewer windows on body
commercial              small plane
jet engine              private
turbofan engine         propeller engine
engines under wings     stabilizer on top of tail
on ground               british airways
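
The frequency sort for one category can be sketched as below. `set_level_phrases` is a hypothetical helper and the input lists are toy data; the assumption is that the speaker has already produced a list of phrases for the category-A image of each sampled pair.

```python
from collections import Counter

def set_level_phrases(phrases_per_pair, top_n=10):
    """Pool phrases generated over many image pairs and sort by frequency.

    phrases_per_pair: one list of phrases per sampled image pair.
    Returns the top_n most frequent phrases, most frequent first.
    """
    counts = Counter(p for phrases in phrases_per_pair for p in phrases)
    return [phrase for phrase, _ in counts.most_common(top_n)]

# Toy speaker output for three sampled pairs (made up for illustration).
generated = [
    ["large plane", "jet engine"],
    ["large plane", "more windows"],
    ["large plane", "jet engine"],
]
top_phrases = set_level_phrases(generated, top_n=3)
print(top_phrases)
```

Running the same pooling on the phrases generated for the category-B images gives the second column of a comparison like the one above.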

Use the Listener Model for Image Retrieval
• Query: attribute phrase(s)
• Get scores of the query phrase and test images by the listener model
• We show top 18 images ranked by the scores
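
A minimal sketch of this retrieval loop, under assumptions the slide does not confirm: the listener score is taken to be a dot product between phrase and image embeddings, scores for multiple query phrases are summed, and `retrieve` plus the toy vectors are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(query_vecs, image_vecs, top_k=18):
    """Rank images by their summed score against all query phrases.

    query_vecs: list of phrase embeddings (one per query phrase).
    image_vecs: dict mapping image name -> image embedding.
    Returns the names of the top_k highest-scoring images.
    """
    def score(name):
        return sum(dot(q, image_vecs[name]) for q in query_vecs)
    return sorted(image_vecs, key=score, reverse=True)[:top_k]

# Toy embeddings (made up): one query phrase, three test images.
images = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
query = [[1.0, 0.0]]
top_two = retrieve(query, images, top_k=2)
print(top_two)
```

Summing is only one way to combine several query phrases; the slides do not say how multi-phrase queries are scored.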

t-SNE Embeddings of Attribute Phrases from the Listener Model
[Figure: t-SNE plot with labeled clusters, including “Large commercial planes” and “Military planes”]

Thank you!
Dataset and code are available at:
