Commonsense Knowledge in Pre-trained Language Models

  1. Commonsense Knowledge in Pre-trained Language Models. Vered Shwartz, July 5th, 2020

  2. (Title slide, with a speech bubble:) "If I lean on Ernie my back will hurt less."

  3. (Adds a second speech bubble:) "Elmo will feel appreciated if I give him a flower."

  4. (Adds a third speech bubble:) "om nom nom!"

  5. Do pre-trained LMs already capture commonsense knowledge?

  6. To fine-tune or not to fine-tune, that is the question

  7. To fine-tune or not to fine-tune, that is the question. (Out-of-the-box)

  8. Knowledge-base Completion. Converting KB relations to natural-language templates and using LMs to query / score. (Comparison axes: LMs, Templates, KBs, Conclusion.)

  9. Knowledge-base Completion. ● Petroni et al. (2019): ○ LMs: ELMo / BERT ○ Templates: hand-crafted ○ KBs: ConceptNet and Wikidata ○ Conclusion: BERT performs well, but all models perform poorly on many-to-many relations

  10. Knowledge-base Completion. ● Feldman et al. (2019): ○ LMs: BERT ○ Templates: hand-crafted, scored by GPT2 ○ KBs: ConceptNet, mining from Wikipedia ○ Conclusion: performs worse than supervised methods on ConceptNet but is more likely to generalize to different domains
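
As a concrete illustration of this template-querying setup, a minimal sketch with the HuggingFace fill-mask pipeline (the template, model, and example triple are illustrative assumptions, not the papers' exact configurations):

```python
# Minimal sketch of cloze-style KB completion: render a KB triple as a
# template and let a masked LM rank fillers for the object slot.
# (Illustrative template and model, not the exact setup of either paper.)
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# e.g. a (Paris, capital-of, ?) triple rendered as a cloze query
for candidate in fill("Paris is the capital of [MASK].", top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
# A high rank for the gold object ("france") is read as the LM knowing the fact.
```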

  11. Properties of Concepts (Weir et al., 2020). 1) Do pre-trained LMs correctly distinguish concepts associated with a given set of assumed properties?

  12. Properties of Concepts (Weir et al., 2020). Example query: "A ___ has fur."

  14. Properties of Concepts (Weir et al., 2020). Query: "A ___ has fur, is big, and has claws."

  15. Properties of Concepts (Weir et al., 2020). Query: "A ___ has fur, is big, has claws, has teeth, is an animal, ..."

  16. Properties of Concepts (Weir et al., 2020). Findings: ● Good performance, RoBERTa > BERT ● Perceptual properties (e.g. visual) score lower than non-perceptual ones (e.g. encyclopaedic or functional), since perceptual properties can't be learned from text alone ● Highly-ranked incorrect answers typically apply to a subset of the properties
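
A rough sketch of how such a property query might be posed to a masked LM (prompt wording and model choice are assumptions for illustration):

```python
# Sketch: mask the concept slot in a property description and rank the
# fillers a masked LM proposes (illustrative prompt and model, not the
# paper's exact protocol). RoBERTa's mask token is <mask>.
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

for candidate in fill("A <mask> has fur, is big, and has claws.", top_k=5):
    print(candidate["token_str"].strip(), round(candidate["score"], 3))
# Adding properties should narrow the top-ranked concepts (e.g. toward
# "bear"); wrong answers tend to satisfy only a subset of the properties.
```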

  19. Properties of Concepts (Weir et al., 2020). 2) Can pre-trained LMs be used to list the properties associated with given concepts?

  20. Properties of Concepts (Weir et al., 2020). 2) Can pre-trained LMs be used to list the properties associated with given concepts? Low correlation with human-elicited properties, but coherent and mostly "verifiable by humans".

  21. Can we trust knowledge from LMs?

  22. How well do LMs handle mutual exclusivity?* https://demo.allennlp.org/masked-lm

  23. LMs also generate fictitious facts!

  24. LMs also generate fictitious facts! Distributionally-related:

  25. LMs also generate fictitious facts! Distributionally-related: Syntactically-similar:

  26. Zero-shot LM-based Models for commonsense tasks

  27. Zero-shot setup

  28. Zero-shot setup. With a language model, score each candidate as a full sentence: P_LM(The answer is answer_choice_1), P_LM(The answer is answer_choice_2), ..., P_LM(The answer is answer_choice_k).

  29. Zero-shot setup. With a masked language model, score each candidate in the blank: P_LM(answer_choice_1 | The answer is [MASK]), P_LM(answer_choice_2 | The answer is [MASK]), ..., P_LM(answer_choice_k | The answer is [MASK]). With a (left-to-right) language model: P_LM(The answer is answer_choice_1), ..., P_LM(The answer is answer_choice_k).
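
Both scoring strategies can be sketched as follows; the models, the template, and the single-token restriction on the masked-LM side are simplifying assumptions:

```python
# Sketch of the two zero-shot scoring strategies (illustrative models and
# templates; the masked-LM variant assumes single-token answer choices).
import torch
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

gpt2_tok = AutoTokenizer.from_pretrained("gpt2")
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def causal_lm_score(text):
    """log P_LM(text), summed over tokens: scores 'The answer is <choice>.'"""
    ids = gpt2_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = gpt2(ids).logits.log_softmax(-1)
    # position t predicts token t+1
    return logp[0, :-1].gather(1, ids[0, 1:, None]).sum().item()

def masked_lm_score(choice):
    """log P_LM(choice | 'The answer is [MASK].'); single-token choices only."""
    ids = bert_tok(f"The answer is {bert_tok.mask_token}.", return_tensors="pt").input_ids
    pos = (ids[0] == bert_tok.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = bert(ids).logits
    choice_id = bert_tok(choice, add_special_tokens=False).input_ids[0]
    return logits[0, pos].log_softmax(-1)[choice_id].item()

choices = ["teach", "sleep"]
print("LM picks:", max(choices, key=lambda c: causal_lm_score(f"The answer is {c}.")))
print("masked LM picks:", max(choices, key=masked_lm_score))
```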

  30. Unsupervised Commonsense Question Answering with Self-Talk (Shwartz et al., 2020). Can we use LMs to generate required, missing, or implicit knowledge for multiple-choice commonsense question answering tasks?

  31. Model. Every (answer choice, clarification) pair is scored by the LM, and scores are aggregated per answer choice:
      s₁₁: What do professors primarily do? teach courses. The main function of a professor's teaching career is to teach students how they can improve their knowledge.
      s₁₂: What do professors primarily do? wear wrinkled tweed jackets. The main function of a professor's teaching career is to teach students how they can improve their knowledge.
      ...
      s_k₁: What do professors primarily do? teach courses. The main function of a professor's teaching career is to provide instruction in the subjects they teach.
      s_k₂: What do professors primarily do? wear wrinkled tweed jackets. The main function of a professor's teaching career is to provide instruction in the subjects they teach.
      Per-choice aggregates: min_i(s_i₁), min_i(s_i₂).
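
Reading s_ij as a loss (lower is better), this scoring and aggregation might look like the following sketch; the aggregation and decoding details are assumptions based on the slide, not a verbatim reimplementation of the paper:

```python
# Sketch: s[i][j] = LM loss of (question + choice_j + clarification_i);
# each choice keeps its best (minimum) loss over clarifications, and the
# lowest-loss choice wins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilgpt2")
lm = AutoModelForCausalLM.from_pretrained("distilgpt2")

def lm_loss(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return lm(ids, labels=ids).loss.item()  # mean token cross-entropy

question = "What do professors primarily do?"
choices = ["teach courses.", "wear wrinkled tweed jackets."]
clarifications = [
    "The main function of a professor's teaching career is to teach "
    "students how they can improve their knowledge.",
    "The main function of a professor's teaching career is to provide "
    "instruction in the subjects they teach.",
]

s = [[lm_loss(f"{question} {c} {cl}") for c in choices] for cl in clarifications]
per_choice = [min(row[j] for row in s) for j in range(len(choices))]
print("prediction:", choices[per_choice.index(min(per_choice))])
```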

  32. Generating Clarifications: Question Generation. Context: "What do professors primarily do?" (answer choices include "teach courses").

  33. Generating Clarifications: Question Generation. The context is concatenated with a question prefix p₁ ("What is the main function of") and fed to DistilGPT2, which completes it into a full clarification question.

  34. Generating Clarifications: Clarification Generation. The generated question ("What is the main function of a professor's teaching career?") is mapped to the matching answer prefix p₂ ("The main function of a professor's teaching career is").

  35. Generating Clarifications: Clarification Generation. DistilGPT2 completes p₂, yielding the clarification: "The main function of a professor's teaching career is to teach students how they can improve their knowledge."
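
A condensed sketch of this two-stage prompting; the prefix strings follow the slide, while the decoding settings and string handling are simplifying assumptions:

```python
# Two-stage clarification generation: prefix -> question -> clarification.
from transformers import pipeline

gen = pipeline("text-generation", model="distilgpt2")

context = "What do professors primarily do?"
p1 = "What is the main function of"   # question prefix
p2 = "The main function of {} is"     # matching clarification prefix

# Stage 1: continue the question prefix into a full clarification question.
out = gen(f"{context} {p1}", max_new_tokens=15, do_sample=False)
question = out[0]["generated_text"]

# Stage 2: slot the generated question subject into the answer prefix and
# continue it into a full clarification sentence (string handling simplified).
subject = question.split(p1)[-1].strip().rstrip("?")
out = gen(p2.format(subject), max_new_tokens=25, do_sample=False)
print(out[0]["generated_text"])
```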

  36. Knowledge-informed Model. Generating clarifications from ConceptNet, Google Ngrams and COMET. Taylor was doing her job so she put the money in the drawer. What will Taylor want to do next?

  37. Knowledge-informed Model. Extracted concepts: job, money. ConceptNet edges: job (type of) work; job (motivated by goal) money. Clarifications: "Job is a type of work." "You would work because you want money."

  38. Knowledge-informed Model. Google Ngrams adds: job to earn money. Clarification: "Job to earn money."

  39. Knowledge-informed Model. COMET adds (xWant): to keep the money in the drawer. Clarification: "As a result, Taylor wants to keep the money in the drawer."
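
A small sketch of rendering KB edges into these clarification sentences via relation templates (the template wording is illustrative; the deck draws the edges from ConceptNet, Google Ngrams, and COMET):

```python
# Sketch: render one KB edge as a natural-language clarification sentence.
TEMPLATES = {
    "TypeOf": "{head} is a type of {tail}.",
    "MotivatedByGoal": "You would {head} because you want {tail}.",
    "xWant": "As a result, {subject} wants {tail}.",  # COMET relation
}

def clarify(relation, **slots):
    """Fill the relation's template with the edge's arguments."""
    return TEMPLATES[relation].format(**slots)

print(clarify("TypeOf", head="Job", tail="work"))
print(clarify("MotivatedByGoal", head="work", tail="money"))
print(clarify("xWant", subject="Taylor", tail="to keep the money in the drawer"))
```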

  40. Unsupervised Commonsense Question Answering with Self-Talk ● Generating knowledge with LMs improves upon the baseline and performs similarly to knowledge-informed models.
