

  1. Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
     Yonatan Belinkov*, Adam Poliak*, Stuart Shieber, Benjamin Van Durme, Alexander Rush
     July 29, 2019, ACL, Florence

  2. NLU as Relationship Identification
     Natural language inference (entailment)
     Premise: A woman is running in the park with her dog
     Hypothesis: A woman is sleeping
     Relation: entailment, neutral, contradiction
     [Sources: Hill+ '16, Zhang+ '16]


  4. NLU as Relationship Identification
     Natural language inference (entailment)
     Premise: A woman is running in the park with her dog
     Hypothesis: A woman is sleeping
     Relation: entailment, neutral, contradiction
     Reading comprehension
     Passage: "No," he replied, "except that he seems in a great hurry." "That's just it," Jimmy returned promptly. "Did you ever see him hurry unless he was frightened?" Peter confessed that he never had.
     Q: "Well, he isn't now, yet just look at him go"
     A: Do, case, confessed, frightened, mean, replied, returned, said, see, thought
     [Sources: Hill+ '16, Zhang+ '16]

  5. NLU as Relationship Identification
     Natural language inference (entailment)
     Premise: A woman is running in the park with her dog
     Hypothesis: A woman is sleeping
     Relation: entailment, neutral, contradiction
     Reading comprehension
     Passage: "No," he replied, "except that he seems in a great hurry." "That's just it," Jimmy returned promptly. "Did you ever see him hurry unless he was frightened?" Peter confessed that he never had.
     Q: "Well, he isn't now, yet just look at him go"
     A: Do, case, confessed, frightened, mean, replied, returned, said, see, thought
     Visual question answering
     Q: Is the girl walking the bike?
     A: Yes, No
     [Sources: Hill+ '16, Zhang+ '16]

  6. NLU as Relationship Identification
     Natural language inference (entailment)
     Premise: A woman is running in the park with her dog
     Hypothesis: A woman is sleeping
     Relation: entailment, neutral, contradiction
     Reading comprehension
     Passage: "No," he replied, "except that he seems in a great hurry." "That's just it," Jimmy returned promptly. "Did you ever see him hurry unless he was frightened?" Peter confessed that he never had.
     Q: "Well, he isn't now, yet just look at him go"
     A: Do, case, confessed, frightened, mean, replied, returned, said, see, thought
     Visual question answering
     Q: Is the girl walking the bike?
     A: Yes, No
     Assumption: Identifying the relationship requires deep language understanding
     [Sources: Hill+ '16, Zhang+ '16]

  7. One-Sided Biases
     • Hypothesis-only NLI (Poliak+ '18; Gururangan+ '18; Tsuchiya '18)

  8. One-Sided Biases
     • Hypothesis-only NLI (Poliak+ '18; Gururangan+ '18; Tsuchiya '18)
     Hypothesis: A woman is sleeping

  9. One-Sided Biases
     • Hypothesis-only NLI (Poliak+ '18; Gururangan+ '18; Tsuchiya '18)
     Premise:
     Hypothesis: A woman is sleeping

  10. One-Sided Biases
     • Hypothesis-only NLI (Poliak+ '18; Gururangan+ '18; Tsuchiya '18)
     Premise:
     Hypothesis: A woman is sleeping
     Relation: entailment, neutral, contradiction

  12. One-Sided Biases
     • Hypothesis-only NLI (Poliak+ '18; Gururangan+ '18; Tsuchiya '18)
     [Bar chart: accuracy (%) on SNLI and Multi-NLI for Majority, Hypothesis-Only, and InferSent models]

  14. One-Sided Biases
     • Hypothesis-only NLI (Poliak+ '18; Gururangan+ '18; Tsuchiya '18)
     • Reading comprehension (Kaushik & Lipton '18)
     • Visual question answering (Zhang+ '16; Kafle & Kanan '16; Goyal+ '17; Agarwal+ '17; inter alia)
     • Story cloze completion (Schwartz+ '17; Cai+ '17)
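The hypothesis-only probe on the preceding slides can be sketched in a few lines: train a classifier that never sees the premise, and if it beats the majority baseline, the dataset leaks label information through the hypothesis alone. The examples and word-voting rule below are invented toy stand-ins, not the models or data from the cited papers.

```python
from collections import Counter, defaultdict

# Toy labeled triples (premise, hypothesis, label). The premise column is
# deliberately never read below: that is the point of a hypothesis-only probe.
train = [
    ("A woman is running in the park", "A woman is sleeping", "contradiction"),
    ("A man plays guitar on stage", "Nobody is sleeping", "contradiction"),
    ("Kids play soccer outside", "Children are outdoors", "entailment"),
    ("A dog runs on grass", "An animal is outdoors", "entailment"),
]

# Count how often each hypothesis word co-occurs with each label.
word_label = defaultdict(Counter)
label_counts = Counter()
for _premise, hyp, y in train:
    label_counts[y] += 1
    for w in hyp.lower().split():
        word_label[w][y] += 1

def hypothesis_only_predict(hypothesis):
    """Vote with per-word label counts; fall back to the majority label."""
    votes = Counter()
    for w in hypothesis.lower().split():
        votes.update(word_label.get(w, Counter()))
    return votes.most_common(1)[0][0] if votes else label_counts.most_common(1)[0][0]
```

On this toy data, "sleeping" alone is a strong contradiction cue, exactly the kind of artifact the talk describes.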

  15. Problem: One-sided biases mean that models may not learn the true relationship between premise and hypothesis

  16. Strategies for dealing with dataset bias
     • Construct new datasets (Sharma+ '18)
       ○ $$$
       ○ Other bias

  17. Strategies for dealing with dataset bias
     • Construct new datasets (Sharma+ '18)
       ○ $$$
       ○ Other bias
     • Filter "easy" examples (Gururangan+ '18)
       ○ Hard to scale
       ○ May still have biases (see SWAG → BERT → HellaSWAG)

  18. Strategies for dealing with dataset bias
     • Construct new datasets (Sharma+ '18)
       ○ $$$
       ○ Other bias
     • Filter "easy" examples (Gururangan+ '18)
       ○ Hard to scale
       ○ May still have biases (see SWAG → BERT → HellaSWAG)
     • Forgo datasets with known biases
       ○ Not all bias is bad
       ○ Biased datasets may have other useful information

  19. Our approach: Design models that facilitate learning less biased representations

  20. A Generative Perspective
     ● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)

  21. A Generative Perspective
     ● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)
     [Diagram: P → f_P, H → f_H, both feeding classifier g]
     g – classifier
     f_P, f_H – encoders
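The discriminative setup on this slide, p_θ(y | P, H) = g(f_P(P), f_H(H)), can be sketched as plain data flow. The bag-of-words "encoders" and the overlap rule in the classifier are invented toy stand-ins for the learned neural components (e.g., InferSent-style encoders), shown only to make the f_P / f_H / g decomposition concrete.

```python
# Toy discriminative NLI: two encoders and a classifier over their outputs.
VOCAB = ["woman", "running", "park", "dog", "sleeping"]

def encode(sentence):
    """Stands in for f_P / f_H: a bag-of-words count vector over VOCAB."""
    words = sentence.lower().split()
    return [words.count(w) for w in VOCAB]

def classifier(p_vec, h_vec):
    """Stands in for g: a hand-written rule over the two encodings."""
    overlap = sum(min(a, b) for a, b in zip(p_vec, h_vec))
    diff = sum(abs(a - b) for a, b in zip(p_vec, h_vec))
    if overlap == 0:
        return "neutral"          # no shared content words
    return "contradiction" if diff > 0 else "entailment"

pred = classifier(encode("A woman is running in the park with her dog"),
                  encode("A woman is sleeping"))
```

Nothing in this interface forces g to use f_P(P); a model can reach high accuracy while effectively ignoring the premise, which is the failure mode the next slides address.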

  22. A Generative Perspective
     ● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)
     ● Our key idea: If we generate the premise, it cannot be ignored
     ● We will maximize the likelihood of generating the premise p(P | y, H)

  23. A Generative Perspective
     ● Typical NLI models maximize the discriminative likelihood p_θ(y | P, H)
     ● Our key idea: If we generate the premise, it cannot be ignored
     ● We will maximize the likelihood of generating the premise p(P | y, H)
     Hypothesis: A woman is sleeping
     Relation: contradiction
     Premise: A woman is running in the park with her dog

  24. A Generative Perspective
     ● Unfortunately, text generation is hard!
     Hypothesis: A woman is sleeping
     Relation: contradiction
     Premise: A woman is running in the park with her dog

  25. A Generative Perspective
     ● Unfortunately, text generation is hard!
     Hypothesis: A woman is sleeping
     Relation: contradiction
     Premise: A woman is running in the park with her dog
     Premise: A woman sings a song while playing piano
     Premise: This woman is laughing at her baby
     …
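Once such a generative model exists, classification follows from Bayes' rule: ŷ = argmax_y p(P | y, H) · p(y). The sketch below shows only that decision rule; the log-likelihoods are made-up numbers standing in for scores a label-conditioned premise generator would assign, not outputs of the paper's model.

```python
import math

LABELS = ["entailment", "neutral", "contradiction"]

def predict(log_p_premise_given, log_prior):
    """Bayes-rule classification with a generative model.

    log_p_premise_given[y] = log p(P | y, H)  (assumed given by a generator)
    log_prior[y]           = log p(y)
    """
    return max(LABELS, key=lambda y: log_p_premise_given[y] + log_prior[y])

# Invented scores: the observed premise is most likely under 'contradiction'.
log_lik = {"entailment": -42.0, "neutral": -40.5, "contradiction": -38.2}
uniform_prior = {y: math.log(1 / 3) for y in LABELS}
print(predict(log_lik, uniform_prior))  # -> contradiction
```

Because every candidate label must explain the premise, a model scored this way cannot ignore P, which is the motivation stated on slide 22.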
