
Improved Relation Extraction with Feature-Rich Compositional Embedding Models
Mo Yu*, Matt Gormley*, Mark Dredze. EMNLP, September 21, 2015. (*Co-first authors)


  1. Improved Relation Extraction with Feature-Rich Compositional Embedding Models. Mo Yu*, Matt Gormley*, Mark Dredze. September 21, 2015, EMNLP. (*Co-first authors)

  2. FCM or: How I Learned to Stop Worrying (about Deep Learning) and Love Features. Mo Yu*, Matt Gormley*, Mark Dredze. September 21, 2015, EMNLP. (*Co-first authors)

  3. Handcrafted Features. The standard approach scores a relation with a log-linear model over handcrafted features of the sentence: p(y|x) ∝ exp( Θ_y · f(x) ). [Figure: parse tree for "Egypt-born Proyas directed" (NP, VP, ADJP constituents; NNP, :, VBN, NNP, VBD tags), with entity labels LOC on "Egypt" and PER on "Proyas", and the gold relation born-in between them.]

  4. Where do features come from? One answer: feature engineering. Hand-crafted feature sets for relation extraction (Zhou et al., 2005; Sun et al., 2011) include: first and second word before M1; bag-of-words in M1; head word of M1; other words in between; first and second word after M2; bag-of-words in M2; head word of M2; bigrams in between; words on the dependency path; country name list; personal relative triggers; personal title list; WordNet tags; heads of chunks in between; path of phrase labels; combination of entity types.

  5. Where do features come from? The other answer: feature learning. Word embeddings (Mikolov et al., 2013) are learned without supervision: a look-up table maps each word to an embedding, and a classifier predicts a missing word from its context words (the CBOW model). Similar words get similar embeddings, e.g. cat: [0.11, .23, …, -.45] vs. dog: [0.13, .26, …, -.52].

  6. Where do features come from? String embeddings compose word embeddings into representations of whole word sequences such as "The [movie] showed [wars]": Recursive Auto Encoders (Socher, 2011) and Convolutional Neural Networks with pooling (Collobert & Weston, 2008).

  7. Where do features come from? Tree embeddings (Socher et al., 2013; Hermann & Blunsom, 2013) compose word embeddings along the parse tree, with one composition matrix per syntactic rule (e.g. W_{NP,VP} for S → NP VP, with W_{DT,NN} and W_{V,NN} below it).

  8. Where do features come from? Word embedding features bring the two threads together: learned embeddings plugged into hand-crafted feature sets (Koo et al., 2008; Turian et al., 2010; Hermann et al., 2014).

  9. Where do features come from? Our model (FCM) also sits at this intersection, combining hand-crafted features (feature engineering) with word embeddings (feature learning).

  10. Feature-rich Compositional Embedding Model (FCM). Goals for our model:
    1. Incorporate semantic/syntactic structural information
    2. Incorporate word meaning
    3. Bridge the gap between feature engineering and feature learning, while remaining as simple as possible

  11. Feature-rich Compositional Embedding Model (FCM). Per-word binary features for the sentence "The [movie]_M1 I watched depicted [hope]_M2", one column f_1…f_6 per word (a sketch of the computation follows below):

                        f_1   f_2     f_3     f_4      f_5       f_6
                        The   movie   I       watched  depicted  hope
    on-path(w_i)         0     1       0       0        1         1
    is-between(w_i)      0     0       1       1        1         0
    head-of-M1(w_i)      0     1       0       0        0         0
    head-of-M2(w_i)      0     0       0       0        0         1
    before-M1(w_i)       1     0       0       0        0         0
    before-M2(w_i)       0     0       0       0        1         0
    …                    …     …       …       …        …         …
    word class          nil   noun-   noun-   verb-    verb-     noun-
                              other   person  percep.  comm.     other
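The per-word feature functions above are simple indicator functions of word position and the dependency parse. Below is a minimal Python sketch (not the authors' code) that recomputes the table for this example; the word positions and the dependency path are hard-coded assumptions standing in for a parser's output.

```python
# A toy sketch of the FCM per-word binary features for
# "The [movie] I watched depicted [hope]". Positions are hard-coded
# for the example; a real system would take them from a dependency
# parser and an entity tagger.

words = ["The", "movie", "I", "watched", "depicted", "hope"]
head_m1, head_m2 = 1, 5          # indices of "movie" and "hope"
dep_path = {1, 4, 5}             # assumed path: movie -> depicted -> hope

def word_features(i):
    """Binary feature vector f_i for word position i."""
    return [
        int(i in dep_path),            # on-path(w_i)
        int(head_m1 < i < head_m2),    # is-between(w_i)
        int(i == head_m1),             # head-of-M1(w_i)
        int(i == head_m2),             # head-of-M2(w_i)
        int(i == head_m1 - 1),         # before-M1(w_i)
        int(i == head_m2 - 1),         # before-M2(w_i)
    ]

for i, w in enumerate(words):
    print(w, word_features(i))        # e.g. depicted [1, 1, 0, 0, 0, 1]
```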

  12. Feature-rich Compositional Embedding Model (FCM). Per-word features, zooming in on f_5, the column for w_5 = "depicted":
    on-path(w_i) = 1, is-between(w_i) = 1, head-of-M1(w_i) = 0, head-of-M2(w_i) = 0, before-M1(w_i) = 0, before-M2(w_i) = 1, …

  13. Feature-rich Compositional Embedding Model (FCM). Per-word features with conjunction: each feature of f_5 is conjoined with the word identity (a small illustration follows below):
    on-path(w_i) & w_i = "depicted" → 1
    is-between(w_i) & w_i = "depicted" → 1
    head-of-M1(w_i) & w_i = "depicted" → 0
    head-of-M2(w_i) & w_i = "depicted" → 0
    before-M1(w_i) & w_i = "depicted" → 0
    before-M2(w_i) & w_i = "depicted" → 1
    …
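Hard conjunction turns each active binary feature into a sparse indicator tied to one vocabulary item. An illustrative sketch, reusing the hypothetical word_features helper from above:

```python
# Hard conjunction (illustrative only): each active feature of
# w_5 = "depicted" becomes a sparse string-valued indicator feature.
def conjoined_features(i, w):
    names = ["on-path", "is-between", "head-of-M1",
             "head-of-M2", "before-M1", "before-M2"]
    return [f"{name}&w={w}"
            for name, bit in zip(names, word_features(i)) if bit]

print(conjoined_features(4, "depicted"))
# ['on-path&w=depicted', 'is-between&w=depicted', 'before-M2&w=depicted']
```

Such conjunctions are expressive but sparse: each fires only for the exact word it was conjoined with, which is what the soft conjunction on the next slides relaxes.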

  14. Feature-rich Compositional Embedding Model (FCM). Per-word features with soft conjunction: instead of conjoining with the word identity, take the outer product of the binary feature vector f_5 = [1, 1, 0, 0, 0, 1, …] with the word embedding e_depicted = [-.3, .9, .1, -1].

  15. Feature-rich Compositional Embedding Model (FCM). The outer product f_5 ⊗ e_depicted: each active feature row copies the embedding, each inactive row is zero (see the sketch below):
    on-path(w_i) = 1     →  [-.3, .9, .1, -1]
    is-between(w_i) = 1  →  [-.3, .9, .1, -1]
    head-of-M1(w_i) = 0  →  [ 0,  0,  0,  0]
    head-of-M2(w_i) = 0  →  [ 0,  0,  0,  0]
    before-M1(w_i) = 0   →  [ 0,  0,  0,  0]
    before-M2(w_i) = 1   →  [-.3, .9, .1, -1]
    …
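The soft conjunction is just an outer product. A minimal numpy sketch reproducing the matrix above, using the slide's toy numbers:

```python
import numpy as np

# Soft conjunction (slides 14-15): conjoin each binary feature with
# the word's embedding rather than its identity.
f_5 = np.array([1, 1, 0, 0, 0, 1])             # binary features for "depicted"
e_depicted = np.array([-0.3, 0.9, 0.1, -1.0])  # toy 4-dim word embedding

h_5 = np.outer(f_5, e_depicted)                # shape (6, 4)
print(h_5)
# Rows with an active feature copy the embedding; inactive rows are zero:
# [[-0.3  0.9  0.1 -1. ]    on-path
#  [-0.3  0.9  0.1 -1. ]    is-between
#  [ 0.   0.   0.   0. ]    head-of-M1
#  [ 0.   0.   0.   0. ]    head-of-M2
#  [ 0.   0.   0.   0. ]    before-M1
#  [-0.3  0.9  0.1 -1. ]]   before-M2
```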

  16. Feature-rich Compositional Embedding Model (FCM). The full model:
    p(y|x) ∝ exp( Σ_{i=1}^{n} T_y · (f_i ⊗ e_{w_i}) )
  Our full model sums over each word in the sentence, takes the dot product of each word's outer product with a parameter tensor T_y, and finally exponentiates and renormalizes. (A toy implementation follows below.)
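Putting the pieces together, here is a toy numpy sketch of the scoring rule; the shapes and random inputs are illustrative assumptions, not the released implementation:

```python
import numpy as np

# Toy FCM scoring: p(y|x) ∝ exp( sum_i T_y · (f_i ⊗ e_{w_i}) ).
n_labels, n_feats, emb_dim, n_words = 3, 6, 4, 6
rng = np.random.default_rng(0)

T = rng.normal(size=(n_labels, n_feats, emb_dim))  # parameter tensor T
F = rng.integers(0, 2, size=(n_words, n_feats))    # f_i, one row per word
E = rng.normal(size=(n_words, emb_dim))            # e_{w_i}, one row per word

# Sum of outer products over the sentence, then dot with each T_y:
S = np.einsum('if,id->fd', F, E)                   # sum_i f_i ⊗ e_{w_i}
scores = np.einsum('yfd,fd->y', T, S)              # one score per label y

p = np.exp(scores - scores.max())                  # exponentiate ...
p /= p.sum()                                       # ... and renormalize
print(p)                                           # p(y|x) over the labels
```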

  17. Features for FCM
  • Let M1 and M2 denote the left and right entity mentions
  • Our per-word binary features:
    – head of M1
    – head of M2
    – in-between M1 and M2
    – in the -2, -1, +1, or +2 window of M1
    – in the -2, -1, +1, or +2 window of M2
    – on the dependency path between M1 and M2
  • Optionally: add the entity type of M1, M2, or both

  18. FCM as a Neural Network
  • Embeddings are (optionally) treated as model parameters
  • A log-bilinear model (see the check below)
  • We initialize, then fine-tune the embeddings
  [Figure: network diagram. For each word i, binary features f_i and embedding e_{w_i} combine into h_i = f_i ⊗ e_{w_i}; the h_i are summed, multiplied by the parameter tensor T, and normalized to give p(y|x).]
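"Log-bilinear" means the unnormalized log-probability is linear in the tensor T when the embeddings are held fixed, and linear in the embeddings when T is held fixed. A quick numerical check of both properties, under the same toy shapes as before:

```python
import numpy as np

def score(T, F, E, y):
    """Unnormalized log-score for label y: T_y · sum_i (f_i ⊗ e_{w_i})."""
    return np.einsum('fd,fd->', T[y], np.einsum('if,id->fd', F, E))

rng = np.random.default_rng(1)
T  = rng.normal(size=(3, 6, 4)); T2 = rng.normal(size=(3, 6, 4))
F  = rng.integers(0, 2, size=(6, 6)).astype(float)
E  = rng.normal(size=(6, 4));    E2 = rng.normal(size=(6, 4))

# Linear in T (embeddings fixed) and linear in E (tensor fixed):
assert np.isclose(score(T + T2, F, E, 0),
                  score(T, F, E, 0) + score(T2, F, E, 0))
assert np.isclose(score(T, F, E + E2, 0),
                  score(T, F, E, 0) + score(T, F, E2, 0))
```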

  19. Baseline Model
    p(y|x) ∝ exp( Θ_y · f(x) )
  [Figure: the "Egypt-born Proyas directed" parse tree again, with LOC and PER mentions and the born-in relation.]
  • Multinomial logistic regression (standard approach)
  • Bring in all the usual binary NLP features (Sun et al., 2011):
    – type of the left entity mention
    – dependency path between mentions
    – bag of words in right mention
    – …

  20. Hybrid Model: Baseline + FCM. Combine the two models as a Product of Experts (a sketch follows below):
    p(y|x) = (1 / Z(x)) · p_Baseline(y|x) · p_FCM(y|x)
  [Figure: the baseline's parse-tree feature model and the FCM network side by side, feeding the product.]
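The combination itself is a one-liner: multiply the two experts' distributions pointwise and renormalize. A minimal sketch with made-up probabilities (how the experts are trained is a separate matter; this shows only the combination rule):

```python
import numpy as np

# Product of experts: pointwise product of the two models' outputs,
# renormalized by Z(x). The numbers are stand-ins, not model outputs.
p_base = np.array([0.6, 0.3, 0.1])   # p_Baseline(y|x) over 3 labels
p_fcm  = np.array([0.5, 0.4, 0.1])   # p_FCM(y|x)

joint = p_base * p_fcm
p = joint / joint.sum()              # divide by Z(x)
print(p)                             # hybrid p(y|x)
```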

  21. Experimental Setup
  ACE 2005:
  • Data: 6 domains: Newswire (nw), Broadcast Conversation (bc), Broadcast News (bn), Telephone Speech (cts), Usenet Newsgroups (un), Weblogs (wl)
  • Train: bn+nw (~3600 relations); Dev: ½ of bc; Test: ½ of bc, cts, wl
  • Metric: Micro F1 (given entity mentions)
  SemEval-2010 Task 8:
  • Data: Web text
  • Train/Dev/Test: standard split from the shared task
  • Metric: Macro F1 (given entity boundaries)

  22. ACE 2005 Results. [Figure: grouped bar chart of Micro F1 (y-axis, 45% to 65%) for Baseline, FCM, and Baseline+FCM on each test set: Broadcast Conversation, Conversational Telephone Speech, Weblogs.]

  23. SemEval-2010 Results

    Source                      Classifier                F1
    Socher et al. (2012)        RNN                       74.8
    Socher et al. (2012)        MVRNN                     79.1
    Hashimoto et al. (2015)     RelEmb                    81.8
    Rink and Harabagiu (2010)   SVM (best in the
                                SemEval-2010 shared task) 82.2
    Zeng et al. (2014)          CNN                       82.7
    Santos et al. (2015)        CR-CNN (log-loss)         82.7
    Liu et al. (2015)           DepNN                     82.8
    Hashimoto et al. (2015)     RelEmb (task-spec-emb)    82.8

  24. SemEval-2010 Results, with our models added to the comparison above:

    FCM (log-linear)                                      81.4
    FCM (log-bilinear)                                    83.0

  FCM (log-bilinear) at 83.0 is the best result in the table.

