Scene Graph Parsing as Dependency Parsing
Authors: Yu-Siang Wang, Chenxi Liu, Xiaohui Zeng, Alan Yuille
Conference: North American Chapter of the Association for Computational Linguistics (NAACL), 2018
Outline
● Introduction
● Method
● Experiments
● Conclusion
Introduction
● Many multimodal tasks fit into this picture
[Figure: the sentence "A young boy wearing black shirt is in front of a goal" linked to an Intermediate Representation]
Image Generation from Text
[Figure: the same sentence and Intermediate Representation]
Image Captioning
[Figure: the same sentence and Intermediate Representation]
Image Retrieval
[Figure: the same sentence and Intermediate Representation]
Neural Network Embedding
● Neural network embeddings are often used as the intermediate representation
● Pro: easy training; similarity with cosine distance
● Con: no explicit structure; no easy interpretability
[Figure: the sentence "A young boy wearing black shirt is in front of a goal" with an embedding as its intermediate representation: (1.2, -1.3, 4.6, …, -3.7), (2.3, -2.2, -2.6, …, 5.3), (3.8, -7.4, -5.9, …, -3.2)]
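As a rough illustration of the "similarity with cosine distance" point, the sketch below compares embedding vectors with cosine similarity. The three short vectors are made-up toy values (not the elided vectors on the slide), and `cosine_similarity` is a hypothetical helper name.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (assumed values, for illustration only).
sentence_vec = [1.0, 2.0, 2.0]
matching_image_vec = [2.0, 4.0, 4.0]   # same direction as sentence_vec
unrelated_image_vec = [2.0, -1.0, 0.0] # orthogonal to sentence_vec

sim_match = cosine_similarity(sentence_vec, matching_image_vec)
sim_unrelated = cosine_similarity(sentence_vec, unrelated_image_vec)
```

The comparison is a single number per pair, which is what makes embeddings convenient; nothing in the vectors says which object, attribute, or relation drove the score, which is the interpretability drawback listed above.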
Scene Graph
● More recently, people have started exploring a more explainable representation
● Has 3 types of nodes: object, attribute, relation
[Figure: scene graph for "A young boy wearing black shirt is in front of a goal"]
Ref: Johnson et al., Image Retrieval Using Scene Graphs, CVPR 2015
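One way to picture the three node types is as a small data structure. The sketch below is an assumption about how such a graph might be stored, not the representation used by the authors; the class and field names are hypothetical, and the node contents are read off the example sentence.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    # Object nodes, e.g. "boy".
    objects: list = field(default_factory=list)
    # Attribute nodes, stored as (object, attribute) pairs.
    attributes: list = field(default_factory=list)
    # Relation nodes, stored as (subject, relation, object) triples.
    relations: list = field(default_factory=list)

    def attributes_of(self, obj):
        """Return every attribute attached to the given object."""
        return [attr for o, attr in self.attributes if o == obj]

# "A young boy wearing black shirt is in front of a goal"
graph = SceneGraph(
    objects=["boy", "shirt", "goal"],
    attributes=[("boy", "young"), ("shirt", "black")],
    relations=[("boy", "wearing", "shirt"), ("boy", "in front of", "goal")],
)
```

Unlike an embedding, each entry can be inspected directly, for example `graph.attributes_of("boy")`.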
Our Goal
● Parsing from sentence to scene graph (i.e., scene graph parsing)
[Figure: "A young boy wearing black shirt is in front of a goal" and its scene graph]
Previous Work: Separated Two-stage
[Figure: "a young boy wearing black shirt is in front of a man" → Standard Dependency Parsing → Heuristic rules; Simple classifier]
Ref: Anderson et al., SPICE: Semantic Propositional Image Caption Evaluation, ECCV 2016
Our Work: End-to-end One-stage
[Figure: "a young boy wearing black shirt is in front of a man" → Customized Dependency Parsing; figure label: Equivalent]
Ref: Anderson et al., SPICE: Semantic Propositional Image Caption Evaluation, ECCV 2016
Method
Scene Graph
[Figure: Node-centric View]
Pushing Labels from Node to Arc
[Figure: Node-centric View and the Equivalent Edge-centric View, with three arc types: Object node to attribute node; Object node to relation node; Relation node to object node]
● Different colors are different arc labels
● Under the edge-centric view, scene graphs begin to look like dependency parses
Review of Dependency Parsing
1. Get a Corpus!
2. Define a Label Space! NSUBJ, NMOD, CASE, DET, ...
3. Pick a System (e.g. Arc-Hybrid) and its Actions! LEFT, RIGHT, SHIFT, ...
How do we do Scene Graph Parsing?
1. Get a Corpus! (?)
2. Define a Label Space! (?)
3. Pick a System (e.g. Arc-Hybrid) and its Actions! (?)
Visual Genome
● In Visual Genome, every image is annotated with 30 regions on average
● Every region is annotated with a (region) description and a (region) scene graph
[Figure: two region descriptions, "A young boy wearing black shirt is in front of a goal" and "A kid is sitting on the ground"; the latter is shown with the scene graph "kid sit on ground"]
Ref: Krishna et al., Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, IJCV 2017
Alignment Strategy
● To mimic a dependency parsing training corpus, we need alignment between nodes in the scene graph and words in the sentence
● We propose a two-round alignment strategy:
  ○ Within each round, object, attribute, relation nodes are aligned in this order
  ○ First round is more “conservative” (word-by-word match)
  ○ Second round is more “aggressive” (synonym match)
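The two-round strategy can be sketched roughly as follows. This is a simplified illustration under several assumptions: every node is a single word, node names are unique, and the synonym table is a hypothetical hand-written dictionary (the slides do not say where synonyms come from). The function name `align` and the example sentence are also made up.

```python
def align(tokens, nodes_by_type, synonyms):
    """Map each scene-graph node to a token index; unmatched nodes are omitted."""
    alignment = {}
    used = set()  # token positions that are already taken
    order = ("object", "attribute", "relation")

    def try_match(node, candidates):
        # Align the node to the first free token that is in `candidates`.
        for i, tok in enumerate(tokens):
            if i not in used and tok in candidates:
                alignment[node] = i
                used.add(i)
                return

    # Round 1 (conservative): exact word match.
    for kind in order:
        for node in nodes_by_type.get(kind, []):
            try_match(node, {node})

    # Round 2 (aggressive): synonym match for nodes still unaligned.
    for kind in order:
        for node in nodes_by_type.get(kind, []):
            if node not in alignment:
                try_match(node, synonyms.get(node, set()))
    return alignment

tokens = "a young lad wearing black shirt".split()
nodes = {
    "object": ["boy", "shirt"],
    "attribute": ["young", "black"],
    "relation": ["wearing"],
}
synonyms = {"boy": {"lad"}}  # hypothetical synonym table
alignment = align(tokens, nodes, synonyms)
```

Here every node except "boy" is aligned in the first round; "boy" only finds its word ("lad") in the second round through the synonym table.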
Alignments made in Round 1
[Figure: alignments over "a young boy wearing black shirt is in front of a goal ROOT"]
Alignments made in Round 2
[Figure: alignments over "a young boy wearing black shirt is in front of a goal ROOT"]
Alignment Result
[Figure: final alignments over "a young boy wearing black shirt is in front of a goal ROOT"]
How do we do Scene Graph Parsing?
1. Get a Corpus!
2. Define a Label Space! (?)
3. Pick a System (e.g. Arc-Hybrid) and its Actions! (?)
Regular Labels
1. ATTR: Object to Attribute
2. SUBJ: Object to Relation
3. OBJT: Relation to Object
[Figure: arcs labeled ATTR, SUBJ, OBJT over "a young boy wearing black shirt is in front of a goal ROOT"]
Auxiliary Labels
1. ATTR: Object to Attribute
2. SUBJ: Object to Relation
3. OBJT: Relation to Object
4. CONT: Phrase
5. BEGN: ROOT to Obj without Head
[Figure: arcs labeled BEGN, SUBJ, OBJT, ATTR, CONT over "a young boy wearing black shirt is in front of a goal ROOT"]
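The label definitions above suggest how a node-centric scene graph could be turned into labeled arcs. The sketch below is one possible reading, with a hypothetical function name and tuple layout. CONT arcs are left out on purpose, because the slides do not state which word inside a phrase such as "in front of" acts as the head.

```python
def scene_graph_to_arcs(objects, attributes, relations):
    """Return (head, dependent, label) arcs; CONT arcs are not produced."""
    arcs = []
    for obj, attr in attributes:
        arcs.append((obj, attr, "ATTR"))   # object -> attribute
    for subj, rel, obj in relations:
        arcs.append((subj, rel, "SUBJ"))   # object -> relation
        arcs.append((rel, obj, "OBJT"))    # relation -> object
    # Any object that is nobody's dependent is attached to ROOT.
    has_head = {dep for _, dep, _ in arcs}
    for obj in objects:
        if obj not in has_head:
            arcs.append(("ROOT", obj, "BEGN"))
    return arcs

arcs = scene_graph_to_arcs(
    objects=["boy", "shirt", "goal"],
    attributes=[("boy", "young"), ("shirt", "black")],
    relations=[("boy", "wearing", "shirt"), ("boy", "in front of", "goal")],
)
```

For the running example this yields two ATTR arcs, two SUBJ arcs, two OBJT arcs, and a single BEGN arc from ROOT to "boy", the only object without a head.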
How do we do Scene Graph Parsing?
1. Get a Corpus!
2. Define a Label Space! BEGN, SUBJ, OBJT, CONT, ATTR
3. Pick a System (e.g. Arc-Hybrid) and its Actions! (?)
Transition-Based Arc-Hybrid System
Ref: Kuhlmann et al., Dynamic programming algorithms for transition-based dependency parsers, ACL 2011
Augmented Arc-Hybrid
● We augment Arc-Hybrid with one more action: REDUCE
● This is because we don’t require every word to have a head (e.g. “is”)
How do we do Scene Graph Parsing?
1. Get a Corpus!
2. Define a Label Space! BEGN, SUBJ, OBJT, CONT, ATTR
3. Define Actions in a System (e.g. Arc-Hybrid)! LEFT, RIGHT, SHIFT, REDUCE
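The four actions can be sketched as operations on a (stack, buffer, arcs) configuration. The code below follows the usual arc-hybrid definitions (LEFT attaches the stack top to the first buffer word, RIGHT attaches it to the word beneath it on the stack) and adds REDUCE as a pop that creates no arc. It is an illustrative sketch, not the authors' implementation; `apply_action` is a hypothetical name, and preconditions such as a non-empty stack are not checked.

```python
def apply_action(stack, buffer, arcs, action, label=None):
    """Apply one transition in place; arcs are (head, dependent, label)."""
    if action == "SHIFT":
        stack.append(buffer.pop(0))            # move first buffer word to stack
    elif action == "LEFT":
        dep = stack.pop()
        arcs.append((buffer[0], dep, label))   # head is the first buffer word
    elif action == "RIGHT":
        dep = stack.pop()
        arcs.append((stack[-1], dep, label))   # head is the new stack top
    elif action == "REDUCE":
        stack.pop()                            # drop a word that gets no head
    else:
        raise ValueError(f"unknown action: {action}")

stack, arcs = [], []
buffer = "a young boy wearing black shirt is in front of a goal ROOT".split()
# First seven actions of the example trace (steps 0 to 6).
trace = [("SHIFT", None), ("REDUCE", None), ("SHIFT", None),
         ("LEFT", "ATTR"), ("SHIFT", None), ("SHIFT", None), ("SHIFT", None)]
for action, label in trace:
    apply_action(stack, buffer, arcs, action, label)
```

In this replay, REDUCE discards "a" without giving it a head, and LEFT(ATTR) attaches "young" to "boy", which is the kind of case the extra action is meant to cover.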
Detailed Architecture
1. Initialization
2. Predict the next action to take
Step | Stack | Buffer | Action
0 | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT
Ref: Kiperwasser and Goldberg, Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, TACL 2016
Detailed Architecture
[Figure: the words "a young boy wearing black shirt is in front of a goal ROOT" feed a BiLSTM, followed by 2 fully connected layers; shown next to the Step / Stack / Buffer / Action table for steps 0 to 4]
Ref: Kiperwasser and Goldberg, Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations, TACL 2016
Step | Stack | Buffer | Action
0 | (empty) | a young boy wearing black shirt is in front of a goal ROOT | SHIFT
1 | a | young boy wearing black shirt is in front of a goal ROOT | REDUCE
2 | (empty) | young boy wearing black shirt is in front of a goal ROOT | SHIFT
3 | young | boy wearing black shirt is in front of a goal ROOT | LEFT(ATTR)
4 | (empty) | boy wearing black shirt is in front of a goal ROOT | SHIFT
5 | boy | wearing black shirt is in front of a goal ROOT | SHIFT
6 | boy wearing | black shirt is in front of a goal ROOT | SHIFT
[Figure: from step 3 onward, an ATTR arc is drawn over "a young boy wearing black shirt is in front of a goal ROOT"]