  1. Moving Down the Long Tail of Word Sense Disambiguation with Gloss Informed Bi-encoders. Terra Blevins and Luke Zettlemoyer

  2. Target Word and Context: The plant sprouted a new leaf. Candidate Senses: (n) (botany) a living organism...; (n) buildings for carrying on industrial labor; (v) to put or set (a seed or plant) into the ground

  3. Data Sparsity in WSD ● Senses have a Zipfian distribution in natural language ● Data imbalance leads to worse performance on uncommon senses: for EWISE there is a 62.3 F1 point gap between the most frequent and less frequent senses ● We propose an approach to improve performance on rare senses with pretrained models and glosses. Kilgarriff (2004), How dominant is the commonest sense of a word? Kumar et al. (2019), Zero-shot Word Sense Disambiguation using Sense Definition Embeddings.

  4. Incorporating Glosses into WSD Models ● Lexical overlap between the context and the gloss is a successful knowledge-based approach (Lesk, 1986; sketched below) ● Neural models integrate glosses by: ○ Adding glosses as additional inputs into the WSD model (Luo et al., 2018a,b) ○ Mapping encoded gloss representations onto graph embeddings to be used as labels for a WSD model (Kumar et al., 2019)
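
The Lesk idea in the first bullet can be made concrete with a short sketch (ours, not the original 1986 algorithm): score each candidate gloss by how many words it shares with the context and pick the highest-scoring one. The names `content_words` and `simplified_lesk` are illustrative.

```python
STOPWORDS = {"a", "an", "the", "of", "to", "for", "into", "or", "and"}

def content_words(text):
    """Lowercase, split on whitespace, strip punctuation, and drop stopwords."""
    return {w.strip(".,()").lower() for w in text.split()} - STOPWORDS - {""}

def simplified_lesk(context, candidate_glosses):
    """Score each candidate gloss by word overlap with the context; return the best."""
    ctx = content_words(context)
    overlaps = {sense: len(ctx & content_words(gloss))
                for sense, gloss in candidate_glosses.items()}
    return max(overlaps, key=overlaps.get), overlaps

candidates = {
    "plant (factory)": "buildings for carrying on industrial labor",
    "plant (flora)": "(botany) a living organism lacking the power of locomotion",
    "plant (verb)": "put or set (seeds, seedlings, or plants) into the ground",
}
best, scores = simplified_lesk("The plant sprouted a new leaf.", candidates)
print(best, scores)
# Every overlap here is 0: the short context shares no content words with any
# gloss, which is exactly the sparsity problem that motivates learned gloss
# representations.
```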

  5. Pretrained Models for WSD ● Simple probing classifiers on frozen pretrained representations have been found to perform better than models without pretraining (see the sketch below) ● GlossBERT finetunes BERT on WSD with glosses by setting it up as a sentence-pair (context, gloss) classification task. Hadiwinoto et al. (2019), Improved word sense disambiguation using pretrained contextualized representations. Huang et al. (2019), GlossBERT: BERT for word sense disambiguation with gloss knowledge.
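
A minimal sketch of such a frozen-representation probe with PyTorch and Hugging Face transformers (our illustration, not the authors' released code): BERT stays fixed and only a linear classifier over the target word's contextual embedding is trained. The sense inventory size and helper names are placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()  # frozen: BERT receives no gradient updates

def target_embedding(sentence, target_index):
    """Return BERT's contextual embedding for the target word (mean over subwords)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]           # (seq_len, 768)
    rows = [i for i, w in enumerate(enc.word_ids(0)) if w == target_index]
    return hidden[rows].mean(dim=0)                          # (768,)

num_senses = 3                             # e.g. candidate senses of "plant" (placeholder)
probe = torch.nn.Linear(768, num_senses)   # the only trained parameters

emb = target_embedding("the plant sprouted a new leaf", target_index=1)  # "plant"
logits = probe(emb)
print(logits)
```

In such a baseline the classifier is typically trained with cross-entropy over the target word's candidate senses while the underlying BERT parameters remain frozen.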

  6. Our Approach: Gloss Informed Bi-encoder ● Two encoders independently encode the context and the gloss, aligning the target word embedding with the embedding of its correct sense ● Encoders are initialized with BERT and trained end-to-end, without external knowledge ● The bi-encoder is more computationally efficient than a cross-encoder, since the independently encoded gloss embeddings can be precomputed and reused (see the sketches below)
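
A minimal sketch of the bi-encoder described above, with PyTorch and Hugging Face transformers. It follows the slide's description (two BERT encoders, the target word embedding scored against each gloss embedding by a dot product, cross-entropy over the candidate senses); the pooling details and names are our simplifications, not the authors' released implementation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
context_encoder = AutoModel.from_pretrained("bert-base-uncased")
gloss_encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_target(sentence, target_index):
    """Embed the target word with the context encoder (mean over its subwords)."""
    enc = tok(sentence, return_tensors="pt")
    hidden = context_encoder(**enc).last_hidden_state[0]
    rows = [i for i, w in enumerate(enc.word_ids(0)) if w == target_index]
    return hidden[rows].mean(dim=0)                          # (768,)

def encode_glosses(glosses):
    """Embed each candidate gloss with the gloss encoder (first-token output)."""
    enc = tok(glosses, return_tensors="pt", padding=True, truncation=True)
    return gloss_encoder(**enc).last_hidden_state[:, 0]      # (num_senses, 768)

# One training example: a target word, its candidate glosses, and the gold sense.
sentence = "the plant sprouted a new leaf"
glosses = [
    "buildings for carrying on industrial labor",
    "(botany) a living organism lacking the power of locomotion",
    "put or set (seeds, seedlings, or plants) into the ground",
]
gold = torch.tensor([1])                                     # index of the correct sense

target = encode_target(sentence, target_index=1)             # the word "plant"
senses = encode_glosses(glosses)
scores = senses @ target                                     # one score per candidate
loss = torch.nn.functional.cross_entropy(scores.unsqueeze(0), gold)
loss.backward()                                              # updates both encoders end-to-end
```

At test time the prediction is simply the argmax of `scores` over the target word's candidate senses.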

  7. Our Approach: Gloss Informed Bi-encoder [architecture figure: the context encoder embeds the target word within its sentence, the gloss encoder embeds each candidate sense's gloss, and the target word embedding is scored against the sense embeddings]
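
To make the efficiency point concrete, here is a sketch of inference with cached sense embeddings (our illustration, reusing `encode_target`, `encode_glosses`, and `glosses` from the sketch above): each gloss is encoded once, so disambiguating a new context costs one context-encoder pass plus a dot product, whereas a cross-encoder must re-encode every (context, gloss) pair.

```python
# Precompute sense embeddings once per lemma and reuse them for every new context.
with torch.no_grad():
    cached_senses = {"plant": encode_glosses(glosses)}        # (num_senses, 768)

def disambiguate(sentence, target_index, lemma):
    """Return the index of the highest-scoring cached sense for the target word."""
    with torch.no_grad():
        target = encode_target(sentence, target_index)        # one context-encoder pass
        scores = cached_senses[lemma] @ target                 # dot product per sense
    return int(scores.argmax())

print(disambiguate("the plant sprouted a new leaf", target_index=1, lemma="plant"))
```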

  8. Baselines and Prior Work
     Model                     Glosses?   Pretraining?   Source
     HCAN                      ✓                          Luo et al., 2018a
     EWISE                     ✓                          Kumar et al., 2019
     BERT Probe                           ✓               Ours
     GLU                                  ✓               Hadiwinoto et al., 2019
     LMMS                      ✓          ✓               Loureiro and Jorge, 2019
     SVC                                  ✓               Vial et al., 2019
     GlossBERT                 ✓          ✓               Huang et al., 2019
     Bi-encoder Model (BEM)    ✓          ✓               Ours

  9. Overall WSD Performance [bar chart of F1 on the combined WSD evaluation: the baselines and prior systems score 71.1, 71.8, 73.7, 74.1, 75.4, 75.6, and 77.0; the BEM scores 79.0; the MFS baseline is 65.5]

  10. Performance by Sense Frequency [bar charts: F1 on the most frequent senses (MFS) is 93.5, 94.1, and 94.9; F1 on the less frequent senses (LFS) is 31.2 and 37.0 for the baselines versus 52.6 for the BEM] ● BEM gains come almost entirely from the LFS

  11. Zero-shot Evaluation ● BEM can represent new, unseen senses with the gloss encoder and can encode unseen words with the context encoder ● The probe baseline relies on a WordNet back-off, predicting the first (most common) sense listed in WordNet for unseen words (see the sketch below)
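
A sketch of that first-sense back-off with NLTK (our illustration): `wordnet.synsets` lists a lemma's synsets in WordNet's sense order, so taking the first one gives the sense WordNet lists as most common.

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def wordnet_backoff(lemma, pos=None):
    """First-sense back-off: return WordNet's first-listed synset for the lemma."""
    synsets = wn.synsets(lemma, pos=pos) if pos else wn.synsets(lemma)
    return synsets[0] if synsets else None

sense = wordnet_backoff("plant", pos=wn.NOUN)
print(sense, sense.definition())
# e.g. Synset('plant.n.01') buildings for carrying on industrial labor
```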

  12. Zero-shot Evaluation [bar charts: F1 on zero-shot words is 84.9, 91.0, and 91.2; F1 on zero-shot senses is 53.6 versus 68.9]

  13. Few-shot Learning of WSD ● Train the BEM (and the frozen probe baseline) on subsets of SemCor with (up to) k examples of each sense in the training data (see the sketch below) ● At k=5, the BEM already reaches performance similar to the baseline trained on the full dataset
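
A sketch of how such a few-shot training subset could be built (our illustration; the slides do not show the authors' sampling code): keep at most k tagged examples per sense from the sense-annotated training data.

```python
import random
from collections import defaultdict

def sample_k_shot(examples, k, seed=0):
    """Keep at most k (context, target, sense) examples per sense label."""
    by_sense = defaultdict(list)
    for ex in examples:
        by_sense[ex["sense"]].append(ex)
    rng = random.Random(seed)
    subset = []
    for exs in by_sense.values():
        rng.shuffle(exs)
        subset.extend(exs[:k])
    return subset

# Toy sense-tagged data standing in for SemCor annotations (illustrative).
train = [
    {"context": "the plant sprouted a new leaf", "target": "plant", "sense": "plant.n.02"},
    {"context": "the chemical plant was shut down", "target": "plant", "sense": "plant.n.01"},
    {"context": "we plant tomatoes every spring", "target": "plant", "sense": "plant.v.01"},
]
print(len(sample_k_shot(train, k=1)))   # one example per sense -> 3
```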

  14. Takeaways ● The BEM improves over the BERT probe baseline and over prior approaches to using (1) sense definitions and (2) pretrained models for WSD ● Gains stem from better performance on less common and unseen senses ● Code: https://github.com/facebookresearch/wsd-biencoders ● Questions? blvns@cs.washington.edu
