  1. Understanding the Origins of Bias in Word Embeddings. Marc-Etienne Brunet, Colleen Alkalay-Houlihan, Ashton Anderson, Richard Zemel

  2. Introduction. Graduate student at U of T (Vector Institute). Work at the intersection of Bias, Explainability, and Natural Language Processing. Collaborated with Colleen Alkalay-Houlihan. Supervised by Ashton Anderson and Richard Zemel.

  3. Many Forms of Algorithmic Bias. For example: Facial Recognition, Automated Hiring, Criminal Risk Assessment, Word Embeddings.

  5. How can we attribute the bias in word embeddings to the individual documents in their training corpora?

  6. > Background Method Overview Critical Details Experiments

  7. Word Embeddings: Definitions in Vector Space. Definitions encode relationships between words: cleaner, leader, cleaning, leading.

  8. Word Embeddings: Definitions in Vector Space. Definitions encode relationships between words: clean-er, lead-er, clean-ing, lead-ing.

  9. Word Embeddings: Definitions in Vector Space. Definitions encode relationships between words: clean-er / lead-er (role), clean-ing / lead-ing (action).
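
A minimal sketch (not from the talk) of what these slides describe: the offset between a role word and its action word should point in a similar direction across word pairs. Here emb is an assumed word-to-vector mapping (e.g. pre-trained GloVe vectors); the function names are illustrative.

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def role_offsets_align(emb):
        # Compare the cleaner-cleaning offset with the leader-leading offset;
        # a value near 1 means the embedding encodes the shared role/action relation.
        clean_offset = emb["cleaner"] - emb["cleaning"]
        lead_offset = emb["leader"] - emb["leading"]
        return cosine(clean_offset, lead_offset)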

  10. Problematic Definitions in Vector Space. Definitions encode relationships between words: cleaner, leader, woman, man.

  11. Problematic Definitions in Vector Space. Along the male/female direction, man aligns with leader and woman with cleaner. Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai (NeurIPS 2016)

  12. Measuring Bias in Word Embeddings. How can we measure bias in word embeddings? T = cleaner, S = leader, B = woman, A = man

  13. Measuring Bias in Word Embeddings: the Implicit Association Test (IAT). T = cleaner, S = leader, B = woman, A = man

  16. Measuring Bias in Word Embeddings: from the Implicit Association Test (IAT) to the Word Embedding Association Test (WEAT). T = cleaner, S = leader, B = woman, A = man. Association(S, A) ≈ Σ_{s∈S, a∈A} cos(s, a). Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan (Science 2017)
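
A minimal sketch, in the spirit of Caliskan et al. (2017), of the WEAT effect size: the standardized difference in mean cosine association of the two target sets (S, T) with the two attribute sets (A, B). The variable names are illustrative and emb is an assumed word-to-vector mapping.

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def association(w, A, B, emb):
        # Differential association of one word with the two attribute sets.
        return (np.mean([cosine(emb[w], emb[a]) for a in A]) -
                np.mean([cosine(emb[w], emb[b]) for b in B]))

    def weat_effect_size(S, T, A, B, emb):
        # Standardized difference of mean associations over the two target sets.
        s_vals = [association(w, A, B, emb) for w in S]
        t_vals = [association(w, A, B, emb) for w in T]
        return (np.mean(s_vals) - np.mean(t_vals)) / np.std(s_vals + t_vals)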

  17. Measuring Bias: WEAT on popular corpora matches IAT study results.
      Target Words          Attribute Words             IAT (effect size, p-val)    WEAT (effect size, p-val)
      Flowers vs. Insects   Pleasant vs. Unpleasant     1.35, 1.0E-08               1.5, 1.0E-07
      Math vs. Arts         Male vs. Female Terms       0.82, 1.0E-02               1.06, 1.8E-02
      ...

  18. These results are from “Semantics derived automatically from language corpora contain human-like biases”, Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan (Science 2017).

  19. Background > Method Overview Critical Details Experiments

  20. How can we attribute the bias in word embeddings to the individual documents in their training corpora?

  21. From Word2Bias. A corpus X (e.g. Wikipedia; documents Doc 1 ... Doc n) is trained with GloVe to give the word embedding {w_i} = w(X); WEAT (Male/Female vs. Career/Family) then measures the bias B(w(X)).

  22. Differential Bias. Idea: consider the differential contribution of each document, i.e. the change in bias ΔB caused by the removal of Doc k from the corpus.
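
To pin down the definition, a naive leave-one-out sketch of the differential bias is shown below; the talk's point is that retraining once per document is far too slow, which motivates the influence-function approximation later. train_glove and measure_bias are assumed helper functions, not the authors' code, and the sign convention (positive ΔB means the document pushes the bias up) is an assumption.

    def differential_bias_naive(corpus, train_glove, measure_bias):
        # corpus: list of documents; returns {document index: differential bias}.
        base = measure_bias(train_glove(corpus))
        deltas = {}
        for k in range(len(corpus)):
            corpus_minus_k = corpus[:k] + corpus[k + 1:]   # corpus with document k removed
            deltas[k] = base - measure_bias(train_glove(corpus_minus_k))
        return deltas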

  23. Bias Attributed: differential bias per document.
      Document ID    ΔB
      1              -0.0014
      2               0.0127
      ...             ...
      k               0.0374
      ...             ...
      n               0.0089

  24. Analyse Metadata? The same per-document table, extended with metadata columns.
      Document ID    ΔB         Year    Author
      1              -0.0014
      2               0.0127
      ...             ...
      k               0.0374    ?       ?
      ...             ...
      n               0.0089

  25. Bias Gradient. The same pipeline, viewed through its gradient: corpus X (e.g. Wikipedia; Doc 1 ... Doc n), GloVe training giving the word embedding {w_i} = w(X), and WEAT (Male/Female vs. Career/Family) measuring the bias B(w(X)).

  27. Background Method Overview > Critical Details Experiments

  28. Computing the Components. Fast & Easy: the change in bias given a changed embedding, via math, automatic differentiation, or two evaluations of B(w). Slow & Hard: obtaining the changed embedding, i.e. differentiating through an entire training procedure: leave-one-out retraining? (time-bound); backprop through training? (memory-bound); approximate using influence functions, Koh & Liang (ICML 2017).

  31. Influence Functions give us a way to approximate the change in model parameters when the training data is perturbed by ΔX: model parameters θ; new model params θ̃ ≈ infl_func(θ, ΔX).
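
A minimal sketch of the classic influence-function update from Koh & Liang (2017), which the slide writes as θ̃ ≈ infl_func(θ, ΔX): removing one training point shifts the optimum by roughly the inverse Hessian times that point's loss gradient, scaled by 1/n. Names are illustrative; the talk applies this idea to whole documents and to GloVe rather than a generic model.

    import numpy as np

    def params_after_removal(theta, hessian, grad_removed, n):
        # theta: current optimum; hessian: d x d Hessian of the total loss at theta;
        # grad_removed: gradient of the removed point's loss at theta; n: training set size.
        return theta + np.linalg.solve(hessian, grad_removed) / n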

  32. Influence Functions require the inverse Hessian; for GloVe this is a 2VD x 2VD matrix, and 2VD can easily be > 10^9.

  33. Applying Influence Functions to GloVe. The GloVe loss is split into the word vectors, which we differentiate, and the other parameters, which we treat as constant.
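
For reference, a sketch of the standard GloVe objective the slide refers to; as the slide notes, only the word vectors W are differentiated, while the context vectors and biases are treated as constants. Variable names and the weighting constants are the usual GloVe defaults, assumed here for illustration.

    import numpy as np

    def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100.0, alpha=0.75):
        # W, W_ctx: word and context vectors; b, b_ctx: biases;
        # X: word-word co-occurrence count matrix.
        def weight(x):
            return min((x / x_max) ** alpha, 1.0)
        total = 0.0
        rows, cols = np.nonzero(X)
        for i, j in zip(rows, cols):
            inner = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
            total += weight(X[i, j]) * inner ** 2
        return total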

  34. Applying Influence Functions to GloVe. Taking the gradient of the pointwise loss, the Hessian becomes block diagonal (V blocks of D by D)! This allows us to apply the influence-function approximation to one word vector at a time.

  35. Algorithm: Compute Differential Bias (applied to the WEAT words).
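
The algorithm itself did not survive the slide extraction. As a rough sketch of the approach the preceding slides describe (update only the WEAT words' vectors, each via its own D x D Hessian block, then re-evaluate the bias), something like the following; every helper here (cooccurrence_counts, pointwise_grad, pointwise_hessian, weat_bias) is a hypothetical stand-in, not the authors' implementation.

    import numpy as np

    def differential_bias_approx(doc_k, weat_words, W, cooc,
                                 cooccurrence_counts, pointwise_grad,
                                 pointwise_hessian, weat_bias):
        # W: dict word -> vector at the trained optimum; cooc: full co-occurrence counts.
        cooc_minus_k = cooc - cooccurrence_counts(doc_k)    # counts with document k removed
        W_new = dict(W)
        for word in weat_words:
            H_w = pointwise_hessian(W, cooc, word)           # D x D block for this word
            # At the optimum the full gradient is ~0, so the gradient under the
            # perturbed counts is exactly the change caused by removing doc k.
            g_w = pointwise_grad(W, cooc_minus_k, word)
            W_new[word] = W[word] - np.linalg.solve(H_w, g_w)   # one Newton-style step
        # Positive value: document k pushes the bias up (same sign convention as above).
        return weat_bias(W) - weat_bias(W_new)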

  40. Background Method Overview Critical Details > Experiments

  41. Objectives of Experiments: 1. Assess the accuracy of our influence-function approximation. 2. Identify and analyse the most bias-impacting documents.

  42. WEAT tests and corpora. WEAT 1: S = Science, T = Arts, A = Male, B = Female. WEAT 2: S = Instruments, T = Weapons, A = Pleasant, B = Unpleasant.
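
To make the setup concrete, the earlier weat_effect_size sketch could be called as below for the first test. The few words shown are placeholders, not the actual WEAT/IAT word lists, and emb is again an assumed word-to-vector mapping.

    S = ["science", "physics", "chemistry"]   # Science targets (placeholder words)
    T = ["poetry", "art", "dance"]            # Arts targets (placeholder words)
    A = ["male", "man", "boy"]                # Male attributes (placeholder words)
    B = ["female", "woman", "girl"]           # Female attributes (placeholder words)
    # effect = weat_effect_size(S, T, A, B, emb)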

  43. Differential Bias (plot; axis: Differential Bias (%)).

  44. Differential Bias (plot; axis: log Differential Bias (%)).

  45. Differential Bias (plot).

  46. Differential Bias: a single document, roughly 0.00007% of the corpus, can increase the bias by 0.35%! (plot; axis: Differential Bias (%))

  47. Plot: Approximated WEAT vs. Ground Truth WEAT.

  48. Plot: Approximated WEAT vs. Ground Truth WEAT, marking the baseline bias (no removals) and the removal of bias-increasing docs (0.7% of corpus).

  52. Document Impact Generalizes. WEAT 1 (Science vs. Arts Gender Bias):
                    remove bias-increasing docs    baseline (no removals)    remove bias-decreasing docs
      GloVe         -1.27                          1.14                      1.7
      word2vec       0.11                          1.35                      1.6
      Removal of documents also affects word2vec, and other metrics!

  53. Limitations & Future Work: consider multiple biases simultaneously; use metrics that depend on more words; consider bias in downstream tasks where embeddings are used; does this carry over to BERT?

  54. Recap. Bias can be quantified, and correlates with known human biases (cleaner, leader, woman, man). We can identify the documents (Doc k ... Doc n) that most impact bias, and approximate their impact. These documents are qualitatively meaningful, and their impact generalizes.

  55. Thank you! Poster #146. Marc, Colleen, Ashton, Rich. mebrunet@cs.toronto.edu. arXiv: 1810.03611

  56. References:
      T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, and A. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In 30th Conference on Neural Information Processing Systems (NIPS), 2016.
      A. Caliskan, J. J. Bryson, and A. Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017.
      P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1885–1894, 2017.

  57. Measuring Bias “...results raise the possibility that all implicit human biases are reflected in the statistical properties of language.” Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan (Science 2017)

  58. Impact on Word2Vec: removal of documents identified by our method.
                    Decrease (0.7%)    Baseline    Increase (0.7%)
      GloVe         -1.27              1.14        1.7
      word2vec       0.11              1.35        1.6
