
Counterfactual Data Augmentation for Mitigating Gender Stereotypes

Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology. ACL 2019. Ran Zmigrod, Sebastian J. Mielke, Hanna Wallach, Ryan Cotterell. University of Cambridge // Johns Hopkins University // Microsoft


  1. Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology. ACL 2019. Ran Zmigrod (rz279@cam.ac.uk), Sebastian J. Mielke (sjmielke@jhu.edu), Hanna Wallach (wallach@microsoft.com), Ryan Cotterell (rdc42@cam.ac.uk). University of Cambridge // Johns Hopkins University // Microsoft Research. Twitter: @RanZmigrod (paper and thread pinned!) // @sjmielke

  7. Gender bias in NLP systems. Coreference resolution systems are biased: in “Even though the doctor reassured the nurse, she was worried.”, “she” can refer to the doctor or to the nurse. Both are possible... but systems prefer nurse! (Rudinger et al., 2018; Zhao et al., 2018) Word embeddings carry biases as well.

  8. This shouldn’t come as a surprise: our data is biased. [Figure: Google n-grams frequency counts of “he is a doctor” vs. “she is a doctor”, 1900 to 2000.]

  13. Our focus: stereotypes in language modeling (Lu et al., 2018). Training data counts are visible as likelihoods under a language model. Grid of pronoun (rows: m, f) by stereotype (columns: m, f): m pronoun: “He is a good doctor.” / “He is a good nurse.”; f pronoun: “She is a good doctor.” / “She is a good nurse.” The solution: Counterfactual Data Augmentation (Lu et al., 2018). For every sentence with she/he (e.g., “She is a nurse.”), add that sentence with he/she for training (e.g., “He is a nurse.”). Now they should yield a balanced model!
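The augmentation step on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the swap table `SWAPS` and the function names `counterfactual` and `augment` are hypothetical, and only the two pronouns named on the slide are handled.

```python
import re

# Hypothetical minimal swap table covering only the pronouns on the slide.
# A fuller list would need care with ambiguous forms such as "her".
SWAPS = {"he": "she", "she": "he"}

# Match whole words only, so that "the" or "sheet" are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def counterfactual(sentence):
    """Return the sentence with he/she swapped, or None if nothing matched."""
    def swap(match):
        word = match.group(0)
        repl = SWAPS[word.lower()]
        # Preserve sentence-initial capitalization.
        return repl.capitalize() if word[0].isupper() else repl

    swapped, count = _PATTERN.subn(swap, sentence)
    return swapped if count else None


def augment(corpus):
    """Return the original corpus plus one counterfactual per gendered sentence."""
    augmented = list(corpus)
    for sentence in corpus:
        cf = counterfactual(sentence)
        if cf is not None:
            augmented.append(cf)
    return augmented
```

For example, `augment(["She is a nurse.", "It rains."])` keeps both sentences and appends `"He is a nurse."`, so both pronouns are seen with the same context during training.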

  21. “Agreement” or “what if: German”. Grid of pronoun (rows: m, f) by stereotype (columns: m, f): m pronoun: “Er ist ein guter Arzt.” (He is a good doctor.) / “Er ist ein guter Krankenpfleger.” (He is a good nurse.); f pronoun: “Sie ist eine gute Ärztin.” (She is a good doctor.) / “Sie ist eine gute Krankenpflegerin.” (She is a good nurse.) So, uh, can we just... change all words’ grammatical gender? Example: “Der Arzt sitzt auf einem Stuhl” (The male doctor sits on a chair). Swap all: “Die Ärztin sitzt auf einer Stuhl” (The female doctor sits on a... what?). No, what we need is...
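The failure of the "swap all" idea can be reproduced with a toy word-level lookup. The lexicon `M_TO_F` and the function `swap_all` below are hypothetical and cover only the words in the slide's example; the point is that a context-free swap also changes the determiner of "Stuhl", a masculine noun that has nothing to do with the doctor.

```python
# Hypothetical toy lexicon mapping masculine forms to feminine forms.
# It only covers the words needed for the slide's example sentence.
M_TO_F = {
    "Der": "Die",
    "Arzt": "Ärztin",
    "einem": "einer",
}


def swap_all(sentence):
    """Swap every token found in the lexicon, ignoring syntactic context."""
    return " ".join(M_TO_F.get(token, token) for token in sentence.split())


print(swap_all("Der Arzt sitzt auf einem Stuhl"))
# prints: Die Ärztin sitzt auf einer Stuhl
```

The output is ungrammatical because "einem" agrees with "Stuhl", not with "Arzt", and should have been left unchanged.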

  28. Syntax to the rescue: use dependency parses. Example sentence: “Der gute Arzt sitzt auf einem Stuhl”. Only words “connected” in the dependency parse should change! Build an MRF over morphological tags along the dependency parse, with two kinds of factors: neural factors, learned from data (concord/agreement), and manual dampening, not learned, which boosts tags that stay what they were before intervention.
      Tags before intervention: Der [M;SG;NOM] gute [M;SG;NOM] Arzt [M;SG;NOM] sitzt [3P;SG;PRS] auf [-] einem [M;SG;DAT] Stuhl [M;SG;DAT]
      Intervention on the noun: Der [M;SG;NOM] gute [M;SG;NOM] Arzt [F;SG;NOM] sitzt [3P;SG;PRS] auf [-] einem [M;SG;DAT] Stuhl [M;SG;DAT]
      After inference: Der [F;SG;NOM] gute [F;SG;NOM] Arzt [F;SG;NOM] sitzt [3P;SG;PRS] auf [-] einem [M;SG;DAT] Stuhl [M;SG;DAT]
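The tag sequence above can be imitated with a toy model over gender tags only. The sketch below is not the model from the talk: the agreement weight `AGREE` is set by hand here, whereas the slides describe those factors as learned from data; the edge list `EDGES` is an assumed dependency parse; and inference is brute-force enumeration rather than anything tree-specific. All names are hypothetical.

```python
from itertools import product

TOKENS = ["Der", "gute", "Arzt", "sitzt", "auf", "einem", "Stuhl"]
# Original gender tag per token; None marks tokens without a gender feature.
ORIGINAL = ["M", "M", "M", None, None, "M", "M"]
# Assumed (dependent, head) pairs from a dependency parse that must agree:
# Der -> Arzt, gute -> Arzt, einem -> Stuhl.
EDGES = [(0, 2), (1, 2), (5, 6)]

AGREE = 2.0  # assumed hand-set agreement weight (learned in the talk's model)
STAY = 1.0   # assumed dampening weight for keeping the original tag


def score(tags):
    """Sum of dampening (unary) and agreement (pairwise) factor scores."""
    total = sum(STAY for t, o in zip(tags, ORIGINAL) if o is not None and t == o)
    total += sum(AGREE for dep, head in EDGES if tags[dep] == tags[head])
    return total


def intervene(index, new_gender):
    """Clamp one token's gender and return the best-scoring tags for the rest."""
    free = [i for i, o in enumerate(ORIGINAL) if o is not None and i != index]
    best, best_score = None, float("-inf")
    for choice in product("MF", repeat=len(free)):
        tags = list(ORIGINAL)
        tags[index] = new_gender
        for i, gender in zip(free, choice):
            tags[i] = gender
        current = score(tags)
        if current > best_score:
            best, best_score = tags, current
    return best
```

With these weights, `intervene(2, "F")` flips "Der" and "gute" to F because agreeing with the clamped noun outweighs the dampening bonus, while "einem" and "Stuhl" keep M because nothing connects them to "Arzt" through an agreement edge.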

  33. Recap: what is a Markov Random Field (Koller and Friedman, 2009)? Model $p(x, y, z)$ by decomposing into factors $\psi$ over the variables $x$, $y$, $z$! Every factor gives a score to certain assignments: $\psi(x=2, y=1) = 0.42$, $\psi(y=1) = 1.3$, $\psi(z=1) = -1$. Add up all factors to obtain the global score: $\mathrm{score}(x=2, y=1, z=4) = \psi(x=2, y=1) + \psi(y=1) + \psi(z=4)$. Get $p$ by global normalization (easy in trees): $p(x=2, y=1, z=4) \propto \exp \mathrm{score}(x=2, y=1, z=4)$.
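The recap can be made concrete with the three factor values from the slide. In this sketch the value range `DOMAIN` and the convention that every unlisted assignment scores 0 are assumptions, as are all function and variable names. Normalization is done by enumerating all assignments, which only works for tiny models; the slide's remark that it is "easy in trees" refers to more efficient exact inference that is not shown here.

```python
import math
from itertools import product

DOMAIN = [1, 2, 3, 4]  # assumed value range for x, y and z

# Factor scores from the slide; unlisted assignments are assumed to score 0.
PAIR_XY = {(2, 1): 0.42}
UNARY_Y = {1: 1.3}
UNARY_Z = {1: -1.0}


def score(x, y, z):
    """Global score: the sum of all factor scores for one assignment."""
    return PAIR_XY.get((x, y), 0.0) + UNARY_Y.get(y, 0.0) + UNARY_Z.get(z, 0.0)


def prob(x, y, z):
    """p(x, y, z) = exp(score) / sum of exp(score) over all assignments."""
    log_partition = math.log(
        sum(math.exp(score(*a)) for a in product(DOMAIN, repeat=3))
    )
    return math.exp(score(x, y, z) - log_partition)
```

Under these assumptions `score(2, 1, 4)` is 0.42 + 1.3 + 0, and the probabilities returned by `prob` sum to 1 over all 64 assignments.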
