Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling
Diego Marcheggiani and Ivan Titov
University of Amsterdam · University of Edinburgh
EMNLP 2017, Copenhagen
Contributions
} Syntactic Graph Convolutional Networks
} State-of-the-art semantic role labeling model
} English and Chinese
[Figure: example sentence "Sequa makes and repairs jet engines."]
Semantic Role Labeling
} Predicting the predicate-argument structure of a sentence
} Discover and disambiguate predicates
} Identify arguments and label them with their semantic roles
[Figure: "Sequa makes and repairs jet engines." annotated with the predicate senses make.01, repair.01, engine.01 and the argument roles A0 and A1.]
Semantic Role Labeling
} Only the head of an argument is labeled
} Sequence labeling task for each predicate
} Focus on argument identification and labeling
[Figure: the same annotated example, "Sequa makes and repairs jet engines."]
Related work
} SRL systems that use syntax with simple NN architectures
  } [FitzGerald et al., 2015]
  } [Roth and Lapata, 2016]
} Recent models ignore linguistic bias
  } [Zhou and Xu, 2015]
  } [He et al., 2017]
  } [Marcheggiani et al., 2017]
Motivations
} Some semantic dependencies are mirrored in the syntactic graph
} Not all of them – the syntax-semantic interface is not trivial
[Figure: "Sequa makes and repairs jet engines." with its dependency tree (ROOT, SBJ, OBJ, NMOD, COORD, CONJ) aligned with semantic relations such as creator/creation and repairer/entity repaired.]
Encoding Sentences with Graph Convolutional Networks } Graph Convolutional Networks (GCNs) [Kipf and Welling, 2017] } Syntactic GCNs } Semantic Role Labeling Model } Experiments } Conclusions
Graph Convolutional Networks (message passing) [Kipf and Welling, 2017]
} Undirected graph; the update of a node (e.g. the blue node in the figure) combines a self loop with its neighborhood:
$$h_i = \mathrm{ReLU}\Big( W_0 h_i + \sum_{j \in \mathcal{N}(i)} W_1 h_j \Big)$$
} The $W_0 h_i$ term is the self loop; the sum ranges over the neighborhood $\mathcal{N}(i)$
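As a rough illustration (not the authors' code), the update above can be written in a few lines of Python/NumPy; `W_self` and `W_neigh` play the roles of $W_0$ and $W_1$:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def gcn_layer(H, adj, W_self, W_neigh):
    """One GCN update: each node combines its own state (self loop)
    with the summed states of its neighbors in the undirected graph.

    H       -- node states, shape (num_nodes, d_in)
    adj     -- symmetric 0/1 adjacency matrix, shape (num_nodes, num_nodes)
    W_self  -- self-loop weight matrix W_0, shape (d_in, d_out)
    W_neigh -- neighbor weight matrix W_1, shared by all edges, shape (d_in, d_out)
    """
    neigh_sum = adj @ H                      # sum_{j in N(i)} h_j for every node i
    return relu(H @ W_self + neigh_sum @ W_neigh)
```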
GCNs Pipeline [Kipf and Welling, 2017]
[Figure: the input $X = H^{(0)}$ (initial feature representation of the nodes) passes through hidden layers $H^{(1)}, H^{(2)}, \dots$ to the output $Z = H^{(n)}$, a representation of the nodes informed by their neighborhood.]
} Extend GCNs for syntactic dependency trees
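A minimal sketch of this pipeline, reusing the `gcn_layer` helper from the sketch above (graph, sizes, and random weights are placeholders): node features enter as $X = H^{(0)}$ and the output $Z = H^{(n)}$ is informed by each node's neighborhood.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d = 4, 8
X = rng.normal(size=(n_nodes, d))            # H^(0): initial node features
adj = np.array([[0, 1, 0, 0],                # a small undirected toy graph
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)

# one (W_self, W_neigh) pair per layer; here n = 2 layers
layers = [(rng.normal(size=(d, d)), rng.normal(size=(d, d))) for _ in range(2)]

H = X
for W_self, W_neigh in layers:               # H^(k) -> H^(k+1)
    H = gcn_layer(H, adj, W_self, W_neigh)
Z = H                                        # H^(n): neighborhood-informed states
```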
Encoding Sentences with Graph Convolutional Networks } Graph Convolutional Networks (GCNs) } Syntactic GCNs } Semantic Role Labeling Model } Experiments } Conclusions
Example
[Figure: the sentence "Lane disputed those estimates" with dependency arcs labeled SBJ, OBJ, and NMOD. In the first GCN layer every word is updated with $\mathrm{ReLU}(\Sigma\,\cdot)$ over its incoming messages: a self-loop message (weight $W^{(1)}_{\mathrm{self}}$) plus messages transformed by label- and direction-specific matrices such as $W^{(1)}_{\mathrm{subj}}$, $W^{(1)}_{\mathrm{subj}'}$, $W^{(1)}_{\mathrm{obj}}$, $W^{(1)}_{\mathrm{obj}'}$, $W^{(1)}_{\mathrm{nmod}}$, $W^{(1)}_{\mathrm{nmod}'}$. A second GCN layer repeats this with matrices $W^{(2)}$.]
} Stacking GCNs widens the syntactic neighborhood
Syntactic GCNs
$$h_v^{(k+1)} = \mathrm{ReLU}\Big( \sum_{u \in \mathcal{N}(v)} W^{(k)}_{L(u,v)}\, h_u^{(k)} + b^{(k)}_{L(u,v)} \Big)$$
} $W^{(k)}_{L(u,v)} h_u^{(k)} + b^{(k)}_{L(u,v)}$ is the message from $u$ to $v$
} $\mathcal{N}(v)$ is the syntactic neighborhood of $v$; the self-loop is included in $\mathcal{N}(v)$
} Messages are direction and label specific
} One matrix per label-direction pair is overparametrized, so weight matrices depend only on the direction: $W^{(k)}_{L(u,v)} = V^{(k)}_{\mathrm{dir}(u,v)}$
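A sketch of one syntactic GCN layer under this parametrization (direction-specific matrices $V_{\mathrm{dir}}$, label-specific biases as in the formula above); the edge encoding and names below are illustrative, not the released implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def syntactic_gcn_layer(H, edges, V_dir, b_label):
    """One syntactic GCN layer (sketch).

    H       -- states from the previous layer, shape (n_words, d)
    edges   -- list of (u, v, direction, label) tuples; direction is one of
               'along', 'opposite', 'self', and the self loop (v, v, 'self', 'self')
               is assumed to be included for every word v
    V_dir   -- dict: direction -> (d, d) weight matrix (shared across labels)
    b_label -- dict: (direction, label) -> (d,) bias (label specific)
    """
    out = np.zeros_like(H)
    for u, v, direction, label in edges:
        # message from u to v: direction-specific matrix, label-specific bias
        out[v] += H[u] @ V_dir[direction] + b_label[(direction, label)]
    return relu(out)
```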
Edge-wise Gates
} Not all edges are equally important
} We should not blindly rely on predicted syntax
} Gates decide the “importance” of each message
} Gates depend on nodes and edges
[Figure: the "Lane disputed those estimates" example (arcs NMOD, SBJ, OBJ) with a gate $g$ scaling every message, including the self-loops, before the $\mathrm{ReLU}(\Sigma\,\cdot)$ update.]
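A sketch of how an edge-wise gate could scale each message: a scalar gate computed from the source node with direction- and label-specific gate parameters. This is a minimal reading of the slide; parameter names are illustrative:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_syntactic_gcn_layer(H, edges, V_dir, b_label, w_gate, b_gate):
    """Syntactic GCN layer with edge-wise gates (sketch).

    w_gate -- dict: direction -> (d,) gate vector
    b_gate -- dict: (direction, label) -> scalar gate bias
    The gate is a scalar in (0, 1) scaling each message, so messages along
    unimportant or unreliably predicted edges can be down-weighted.
    """
    out = np.zeros_like(H)
    for u, v, direction, label in edges:
        g = sigmoid(H[u] @ w_gate[direction] + b_gate[(direction, label)])
        out[v] += g * (H[u] @ V_dir[direction] + b_label[(direction, label)])
    return relu(out)
```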
Encoding Sentences with Graph Convolutional Networks } Graph Convolutional Networks (GCNs) } Syntactic GCNs } Semantic Role Labeling Model } Experiments } Conclusions
Our Model } Word representation } Bidirectional LSTM encoder } GCN Encoder } Local role classifier
Word Representation
} Pretrained word embeddings
} Word embeddings
} POS tag embeddings
} Predicate lemma embeddings
} Predicate flag
[Figure: per-word representations built for "Lane disputed those estimates".]
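One simple way to realize this slide is to concatenate the per-word features; whether the predicate flag is a raw bit or a learned embedding is not specified here, so the flag below is an assumption:

```python
import numpy as np

def word_representation(pre_emb, word_emb, pos_emb, lemma_emb, is_predicate):
    """Concatenate pretrained embedding, word embedding, POS embedding,
    predicate-lemma embedding, and a binary predicate flag (illustrative)."""
    flag = np.array([1.0 if is_predicate else 0.0])
    return np.concatenate([pre_emb, word_emb, pos_emb, lemma_emb, flag])
```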
BiLSTM Encoder
} Encode each word with its left and right context
} Stacked BiLSTM (J layers)
[Figure: J BiLSTM layers on top of the word representations of "Lane disputed those estimates".]
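An illustrative stacked BiLSTM encoder using PyTorch's built-in bidirectional LSTM; the paper's exact stacking scheme and hyperparameters may differ, and all sizes below are placeholders:

```python
import torch
import torch.nn as nn

in_dim, hid_dim, J = 100, 128, 3                    # J stacked BiLSTM layers
bilstm = nn.LSTM(input_size=in_dim, hidden_size=hid_dim,
                 num_layers=J, bidirectional=True, batch_first=True)

word_reprs = torch.randn(1, 4, in_dim)              # e.g. "Lane disputed those estimates"
states, _ = bilstm(word_reprs)                      # shape: (1, 4, 2 * hid_dim)
```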
GCN Encoder
} Syntactic GCNs (K layers) after the BiLSTM encoder
} Add syntactic information
} Skip connections
} Longer dependencies are captured
[Figure: K GCN layers, using dependency arcs such as nsubj, dobj, and nmod, stacked on top of the J BiLSTM layers over the word representations of "Lane disputed those estimates".]
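A sketch of the combined encoder, assuming the gated syntactic GCN layer from the earlier sketch and square weight matrices; the exact form and placement of the skip connections in the released model may differ from the residual additions shown here:

```python
def encoder(bilstm_states, edges, gcn_params):
    """Stack K (gated) syntactic GCN layers on top of the BiLSTM states.

    bilstm_states -- per-word BiLSTM outputs, shape (n_words, d)
    gcn_params    -- list of (V_dir, b_label, w_gate, b_gate) tuples, one per layer
    """
    H = bilstm_states
    for V_dir, b_label, w_gate, b_gate in gcn_params:
        # skip connection: add each layer's input to its output
        H = H + gated_syntactic_gcn_layer(H, edges, V_dir, b_label, w_gate, b_gate)
    return H                                        # fed to the local role classifier
```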