

  1. Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling. Diego Marcheggiani and Ivan Titov, University of Amsterdam / University of Edinburgh. EMNLP 2017, Copenhagen.

  2. Contributions
     - Syntactic Graph Convolutional Networks
     - A state-of-the-art semantic role labeling model
     - Results on English and Chinese
     Running example: "Sequa makes and repairs jet engines."

  3.–8. Semantic Role Labeling (built up incrementally over six slides)
     - Predicting the predicate-argument structure of a sentence
     - Discover and disambiguate predicates (make.01, repair.01, engine.01)
     - Identify arguments and label them with their semantic roles (the A0 and A1 labels in the figure)
     Running example: "Sequa makes and repairs jet engines."

  9. Semantic Role Labeling (dependency-based)
     - Only the head word of an argument is labeled
     - Treated as a sequence labeling task, run once per predicate (illustrated below)
     - This work focuses on argument identification and labeling
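To make the formulation concrete, here is a small illustration of the expected per-predicate output. The role assignments shown are one plausible reading of the slide's figure, not ground truth from the talk:

```python
sentence = ["Sequa", "makes", "and", "repairs", "jet", "engines", "."]

# One labeling pass per predicate; only the head word of each argument
# receives a role, and every other word is implicitly NULL.
expected_roles = {
    "make.01":   {"Sequa": "A0", "engines": "A1"},
    "repair.01": {"Sequa": "A0", "engines": "A1"},
    "engine.01": {"jet": "A1"},  # nominal predicate
}
```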

  10. Related work
     - SRL systems that use syntax with simple NN architectures: [FitzGerald et al., 2015]; [Roth and Lapata, 2016]
     - Recent models ignore this linguistic bias: [Zhou and Xu, 2015]; [He et al., 2017]; [Marcheggiani et al., 2017]

  11.–12. Motivations
     [Figure: "Sequa makes and repairs jet engines." with its dependency tree (ROOT, SBJ, OBJ, NMOD, COORD, CONJ arcs) overlaid with semantic relations such as creator/creation and repairer/entity repaired]
     - Some semantic dependencies are mirrored in the syntactic graph
     - But not all of them: the syntax-semantics interface is not trivial

  13. Encoding Sentences with Graph Convolutional Networks (outline)
     - Graph Convolutional Networks (GCNs) [Kipf and Welling, 2017]
     - Syntactic GCNs
     - Semantic Role Labeling Model
     - Experiments
     - Conclusions

  14.–16. Graph Convolutional Networks (message passing) [Kipf and Welling, 2017]
     [Figure: an undirected graph; the blue node is updated from its immediate neighbors]
     Update of node $i$ (a minimal code sketch follows):
     $h_i = \mathrm{ReLU}\Big(\underbrace{W_0 h_i}_{\text{self loop}} \;+\; \underbrace{\sum_{j \in \mathcal{N}(i)} W_1 h_j}_{\text{neighborhood}}\Big)$
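As a minimal sketch of this update (the graph, dimensions, and initialization are illustrative assumptions, and weights are shared across layers purely for brevity):

```python
import numpy as np

def gcn_layer(H, adj, W0, W1):
    """One message-passing step: h_i' = ReLU(W0 h_i + sum_{j in N(i)} W1 h_j).

    H   : (n, d) node representations
    adj : (n, n) symmetric 0/1 adjacency matrix, no self loops
    """
    self_msg = H @ W0.T            # self-loop term W0 h_i
    neigh_msg = adj @ (H @ W1.T)   # sum of W1 h_j over neighbors j
    return np.maximum(0.0, self_msg + neigh_msg)

# Toy graph: 4 nodes on a path, 8-dimensional features.
rng = np.random.default_rng(0)
n, d = 4, 8
H = rng.normal(size=(n, d))
adj = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
W0, W1 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# The "pipeline" of the next slide: stacking layers, X = H^(0) through to
# Z = H^(n); each extra layer lets information travel one hop further.
Z = H
for _ in range(2):
    Z = gcn_layer(Z, adj, W0, W1)
```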

  17.–18. GCNs Pipeline [Kipf and Welling, 2017]
     [Figure: input $X = H^{(0)}$ (initial feature representation of the nodes) flows through hidden layers $H^{(1)}, H^{(2)}, \dots$ to the output $Z = H^{(n)}$, a representation of each node informed by its neighborhood]
     - This work extends GCNs to syntactic dependency trees

  19. Outline recap: next, Syntactic GCNs.

  20.–26. Example (built up incrementally over seven slides)
     [Figure: "Lane disputed those estimates" with dependency arcs SBJ (disputed → Lane), OBJ (disputed → estimates), and NMOD (estimates → those); each word's representation is updated as ReLU of the sum of its incoming messages]
     - Every node receives a self-loop message through $W^{(1)}_{\text{self}}$
     - Messages along an arc use label-specific matrices ($W^{(1)}_{\text{sbj}}, W^{(1)}_{\text{obj}}, W^{(1)}_{\text{nmod}}$); messages against the arc direction use primed matrices ($W^{(1)}_{\text{sbj}'}, W^{(1)}_{\text{obj}'}, W^{(1)}_{\text{nmod}'}$)
     - A second layer with matrices $W^{(2)}$ can be stacked on top: stacking GCNs widens the syntactic neighborhood (see the code sketch below)
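The figure translates almost line by line into code. Below is a minimal NumPy sketch of one syntactic GCN layer over this sentence; the arc encoding, dimensions, omission of bias terms, and reuse of the same weights for both layers are simplifications of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
words = ["Lane", "disputed", "those", "estimates"]
# Dependency arcs as (head, dependent, label).
arcs = [(1, 0, "sbj"), (1, 3, "obj"), (3, 2, "nmod")]

d = 8
H = rng.normal(size=(len(words), d))

# One matrix per label-direction pair plus a self-loop matrix;
# the primed key (e.g. "sbj'") carries messages against the arc direction.
keys = {f"{lab}{s}" for _, _, lab in arcs for s in ("", "'")} | {"self"}
W = {k: 0.1 * rng.normal(size=(d, d)) for k in keys}

def syntactic_gcn_layer(H, arcs, W):
    """h_v' = ReLU(sum over the syntactic neighborhood of W_{L(u,v)} h_u)."""
    out = H @ W["self"].T                     # self-loop messages
    for head, dep, lab in arcs:
        out[dep] += H[head] @ W[lab].T        # message along the arc
        out[head] += H[dep] @ W[lab + "'"].T  # message against the arc
    return np.maximum(0.0, out)

H1 = syntactic_gcn_layer(H, arcs, W)   # first layer, W^(1)
H2 = syntactic_gcn_layer(H1, arcs, W)  # stacking widens the neighborhood
```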

  27.–31. Syntactic GCNs (built up incrementally over five slides)
     $h_v^{(k+1)} = \mathrm{ReLU}\Big(\sum_{u \in \mathcal{N}(v)} W^{(k)}_{L(u,v)} h_u^{(k)} + b^{(k)}_{L(u,v)}\Big)$
     - $\mathcal{N}(v)$ is the syntactic neighborhood of $v$; the self-loop is included in $\mathcal{N}(v)$
     - Each summand $W^{(k)}_{L(u,v)} h_u^{(k)} + b^{(k)}_{L(u,v)}$ is a message; messages are direction- and label-specific
     - Overparametrized: one matrix for each label-direction pair. Remedy: let the matrix depend only on direction, $W^{(k)}_{L(u,v)} = V^{(k)}_{\mathrm{dir}(u,v)}$
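A quick parameter count shows why the full scheme is heavy; the label inventory size below is an assumed round number, not a figure from the talk:

```latex
% Per layer, hidden size d, |L| dependency labels (assume |L| \approx 50):
\text{full: one matrix per label-direction pair}
  \;\Rightarrow\; (2|L|+1)\,d^2 \approx 101\,d^2
\text{factorized: } W^{(k)}_{L(u,v)} = V^{(k)}_{\mathrm{dir}(u,v)},\quad
  \mathrm{dir}(u,v) \in \{\text{along},\ \text{opposite},\ \text{self}\}
  \;\Rightarrow\; 3\,d^2
% Label information survives in the biases b^{(k)}_{L(u,v)},
% which cost only O(|L|\,d) and are cheap by comparison.
```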

  32.–35. Edge-wise Gates (built up incrementally over four slides)
     - Not all edges are equally important
     - We should not blindly rely on predicted syntax
     - Gates decide the "importance" of each message (a sketch follows below)
     - Gates depend on nodes and edges
     [Figure: the "Lane disputed those estimates" example, with a scalar gate g scaling every message, self-loops included]
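One plausible way to realize such a gate (my reading of the slide: a scalar gate computed from the sending node, with edge-specific parameters; names and shapes here are illustrative) reuses the layer from the earlier example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_message(h_u, W_edge, w_gate, b_gate):
    """Scale the message W_edge h_u by a scalar gate in (0, 1).

    The gate looks at the sending node and the edge, so messages along
    unreliable predicted arcs can be down-weighted rather than trusted
    blindly."""
    g = sigmoid(h_u @ w_gate + b_gate)   # scalar "importance" of this edge
    return g * (h_u @ W_edge.T)

# Usage: inside syntactic_gcn_layer, replace
#     out[dep] += H[head] @ W[lab].T
# with
#     out[dep] += gated_message(H[head], W[lab], w_gate[lab], b_gate[lab])
# where w_gate[lab] is a (d,) vector and b_gate[lab] a scalar per edge type.
```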

  36. Outline recap: next, the Semantic Role Labeling Model.

  37. Our Model
     - Word representation
     - Bidirectional LSTM encoder
     - GCN encoder
     - Local role classifier

  38. Word Representation
     - Pretrained word embeddings
     - Word embeddings
     - POS tag embeddings
     - Predicate lemma embeddings
     - Predicate flag
     [Figure: the per-word input vector built for "Lane disputed those estimates"]
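As a sketch, the per-word input could be assembled by simple concatenation; the argument names and embedding sizes here are illustrative assumptions:

```python
import numpy as np

def word_input(pretrained_emb, word_emb, pos_emb, lemma_emb, is_predicate):
    """Concatenate one word's features into a single input vector:
    pretrained word embedding, trainable word embedding, POS-tag embedding,
    predicate-lemma embedding, and a binary predicate flag."""
    flag = np.array([1.0 if is_predicate else 0.0])
    return np.concatenate([pretrained_emb, word_emb, pos_emb, lemma_emb, flag])
```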

  39. BiLSTM Encoder
     - Encode each word together with its left and right context
     - Stacked BiLSTM, J layers
     [Figure: the word representations of "Lane disputed those estimates" feeding the BiLSTM stack]

  40. GCN Encoder
     - Syntactic GCNs (K layers) stacked on top of the BiLSTM encoder (J layers)
     - Adds syntactic information to the contextual representations
     - Skip connections (sketched below)
     - Longer dependencies are captured
     [Figure: the full stack over "Lane disputed those estimates", with arcs nsubj, dobj, nmod feeding the GCN]
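Putting the pieces together, the encoder stack can be sketched as below; `bilstm` and `gcn_layers` are stand-in callables (hypothetical names), and reading "skip connections" as residual additions around each GCN layer is my assumption:

```python
def encode(words, arcs, bilstm, gcn_layers):
    """words -> BiLSTM (J layers) -> syntactic GCN (K layers) -> classifier input."""
    H = bilstm(words)           # contextualized word states
    for gcn in gcn_layers:      # K syntactic GCN layers
        H = H + gcn(H, arcs)    # skip connection keeps the BiLSTM signal
    return H                    # fed to the local role classifier
```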
