  1. How Computer Algorithms Expose Our Hidden Biases And How To Fix Them Victor Zimmermann LXIV. StuTS Computational Linguistics Department Heidelberg University

  2. The Shitstorm cometh.

  3. What happened?

  4. The Netflix Artwork Controversy Why are the tabloids up in arms over Netflix adverts? Welcome to a stereotypical machine learning controversy.

  5. The algorithm “If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual ‘evidence’ for why the title might be good for you.” [Cha+17] Figure 1: Different artworks for romance and comedy viewers.

  6. Twitter outrage

  7. Netflix’ Response “We don’t ask members for their race, gender or ethnicity so we cannot use this information to personalise their individual Netflix experience. The only information we use is a member’s viewing history.” [Iqb18]

  8. Nobody expects the Patriarchy.

  9. Sources of Bias There are some obvious reasons for bias in machine learning:
  • Your training data is bad.
  • Your algorithm is bad.
  • You are bad. And you should feel bad.

  10. Bad Training Data

  11. Human Language Spoiler: All human language is biased. Bias is not necessarily performance-based. [Tan90][GMS98] Instead it can also be encoded in the orthography, lexicography or grammar of a language.
  • Asymmetrically marked gender (generic masculine, e.g. actor vs. actress)
  • Naming conventions (e.g. Chastity vs. Bob) [Swe13]
  • Quantity of gendered insults [Sta77] (Wikipedia lists 22 misogynistic and 5 misandric slurs.)

  12. Word Embeddings What are Word Embeddings? Condensed mathematical representations of collocations. [Mik+13] Figure: a news snippet about a Barack Obama rally in Chicago and northwest Indiana, with each word mapped to a vector, e.g. Obama → (0.2, 0.6, ...), speaks → (0.1, 0.8, ...), Chicago → (0.3, 0.2, ...), press → (0.0, 0.5, ...). Now you can do maths with words!?
  $\overrightarrow{King} - \overrightarrow{Man} + \overrightarrow{Woman} = \overrightarrow{Queen}$
  $\overrightarrow{Berlin} - \overrightarrow{Germany} + \overrightarrow{France} = \overrightarrow{Paris}$
  $\overrightarrow{Programmer} - \overrightarrow{Man} + \overrightarrow{Woman} = \overrightarrow{Homemaker}$
  $\overrightarrow{Surgeon} - \overrightarrow{Man} + \overrightarrow{Woman} = \overrightarrow{Nurse}$
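  To make the arithmetic above concrete, here is a minimal sketch using gensim and a pretrained word2vec model; the file name vectors.bin is a placeholder, not something from the talk:

  from gensim.models import KeyedVectors

  # Load pretrained word2vec embeddings (e.g. the Google News vectors of [Mik+13]).
  model = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

  # King - Man + Woman ≈ ?
  print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

  # Programmer - Man + Woman ≈ ?  (this is where the embedded bias shows up)
  print(model.most_similar(positive=["programmer", "woman"], negative=["man"], topn=1))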

  13. Word Embeddings What are Word Embeddings used for?
  • Similarity Measures [Kus+15]
  • Machine Translation [Zou+13]
  • Sentence Classification [Kim14]
  • Part-of-Speech Tagging [SZ14][RRZ18]
  • Dependency Parsing [CM14]
  • Semantic Modelling [Fu+14]
  • Coreference Resolution [Lee+17]
  Basically the entire field of Computational Linguistics.

  14. Mathematical Sledgehammer What if we just remove gender? Figure 2: Mind = Blown

  15. Mathematical Sledgehammer
  • Take “good” analogies, e.g. man-woman, he-she, king-queen, etc.
  • Extract some average “gender vector” from their embeddings.
  • Subtract this new vector from all other relations.
  - Not applicable to most other kinds of bias.
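  A rough numpy sketch of that recipe, assuming emb is a dict from word to vector and pairs is the list of seed analogies; the actual method of [Bol+16] identifies the gender direction with PCA over several defining sets rather than a single averaged difference:

  import numpy as np

  def gender_direction(emb, pairs):
      # Average the difference vectors of the seed pairs, e.g. ("he", "she").
      diffs = [emb[m] - emb[f] for m, f in pairs]
      g = np.mean(diffs, axis=0)
      return g / np.linalg.norm(g)

  def remove_gender_component(vec, g):
      # Subtract the projection of vec onto the gender direction g.
      return vec - np.dot(vec, g) * g

  # g = gender_direction(emb, [("man", "woman"), ("he", "she"), ("king", "queen")])
  # debiased = {w: remove_gender_component(v, g) for w, v in emb.items()}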

  16. Mathematical Sledgehammer (in beautiful)
  Given a word set $W$, defining subsets $D_1, D_2, \ldots, D_n \subset W$, an embedding $\{\vec{w} \in \mathbb{R}^d\}_{w \in W}$, an integer parameter $k \geq 1$, words to neutralise $N \subseteq W$, and a family of equality sets $\varepsilon := \{E_1, E_2, \ldots, E_m\}$ with $E_i \subseteq W$:
  With $\mu_i := \sum_{w \in D_i} \vec{w} / |D_i|$ being the means of the defining subsets and $C := \sum_{i=1}^{n} \sum_{w \in D_i} (\vec{w} - \mu_i)^T (\vec{w} - \mu_i) / |D_i|$, the bias subspace $B$ consists of the first $k$ rows of $\mathrm{SVD}(C)$.
  Words $w \in N$ are re-embedded as $\vec{w} := (\vec{w} - \vec{w}_B) / \|\vec{w} - \vec{w}_B\|$.
  For each set $E \in \varepsilon$, let $\mu := \sum_{w \in E} \vec{w} / |E|$ and $\nu := \mu - \mu_B$; for each $w \in E$, $\vec{w} := \nu + \sqrt{1 - \|\nu\|^2}\,\frac{\vec{w}_B - \mu_B}{\|\vec{w}_B - \mu_B\|}$.
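  A compact numpy sketch of the neutralise and equalise steps above, assuming emb maps words to unit-length vectors and defining_sets, N and the equality sets are the inputs from the slide; this is a simplification of the method of [Bol+16], not their reference implementation:

  import numpy as np

  def bias_subspace(emb, defining_sets, k=1):
      # Principal directions of the centred defining sets (the matrix B above).
      rows = []
      for D in defining_sets:
          mu = np.mean([emb[w] for w in D], axis=0)
          rows.extend((emb[w] - mu) / np.sqrt(len(D)) for w in D)
      _, _, Vt = np.linalg.svd(np.array(rows), full_matrices=False)
      return Vt[:k]                              # shape (k, d)

  def project(v, B):
      # v_B: the component of v that lies inside the bias subspace.
      return B.T @ (B @ v)

  def neutralise(emb, N, B):
      for w in N:                                # w := (w - w_B) / |w - w_B|
          v = emb[w] - project(emb[w], B)
          emb[w] = v / np.linalg.norm(v)

  def equalise(emb, E, B):
      mu = np.mean([emb[w] for w in E], axis=0)
      nu = mu - project(mu, B)                   # nu := mu - mu_B
      for w in E:
          w_B = project(emb[w], B) - project(mu, B)
          emb[w] = nu + np.sqrt(1 - np.linalg.norm(nu) ** 2) * w_B / np.linalg.norm(w_B)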

  17. Bad Algorithms

  18. Google’s Image Recognition Controversy Google automatically labels pictures according to their content. Problem: Their algorithm is bad. Source: @jackyalcine on Twitter

  19. Google’s Image Recognition Controversy Their solution: Source: www.theverge.com (visited on 2018-11-06)

  20. No easy solutions. Not one of these solutions is really good.
  • Total avoidance of problem. [Iqb18]
  • Limited applicability. [Bol+16]
  • Exploitation of false classification. [BGO16]
  • Introduction of even more priors and meta parameters. [Zha+17]

  21. Bad People

  22. Facebook Actual Quote from an actual Facebook Employee “We started out of a college dorm. I mean, c’mon, we’re Facebook. We never wanted to deal with this shit.” [Sha16]

  23. Facebook Possible cause of this apathy: (Don’t quote me on this.)

  24. Help, my Chatbot joined the KKK!

  25. Microsoft Tay

  26. Microsoft Tay

  27. Microsoft Tay What can we learn from this?
  • Tay is a chat bot.
  • Tay is down with the kids?
  • Tay learns from Twitter data.

  28. Microsoft Tay The absolutely expected happens... Source: www.theguardian.com (visited on 2018-11-19)

  29. What should you take away from this talk?
  • Just because something uses “machine learning” doesn’t mean it is unbiased.
  • All language is implicitly prejudiced.
  • Training data does make a difference.
  • Diverse staff makes a difference.
  • Testing your system makes a difference.

  30. What should you take away from this talk? Don’t listen to chat bots. They may act human.

  31. Appendix

  32. Language Classification Common language identification systems use extensive news corpora for training.
  + Big corpora in most languages.
  + Mostly “unbiased” texts.
  - Written in main dialect.
  - Privileged writing staff.
  Problem: African American English is 20% less likely to be classified as English than Standard English. [BO17]
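  As a rough illustration of how such a disparity could be measured (not the evaluation pipeline of [BO17]), one could run an off-the-shelf identifier such as langid over demographically aligned tweet sets; aae_tweets and sae_tweets are assumed inputs:

  import langid

  def english_rate(tweets):
      # langid.classify returns (language, score); count how often it says "en".
      labels = [langid.classify(t)[0] for t in tweets]
      return sum(l == "en" for l in labels) / len(labels)

  # print(english_rate(aae_tweets), english_rate(sae_tweets))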

  33. Language Classification Solution by Blodgett, Green, and O’Connor (2016):
  1. Use US Census data and geolocated tweets to estimate the race of a user,
  2. Train a classifier to identify the “race” of a given tweet, based on high-AA tweets from the first set.
  Result:
  • Build a new corpus from high-AA tweets.
  • (Find out that “Asian” captures all foreign languages and use that fact for classification.)
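  As a loose sketch of that pipeline only: [BGO16] actually fit a mixed-membership demographic language model, but the basic idea of supervising a tweet classifier with census-derived neighbourhood proportions could look roughly like this (tweets and aa_proportion are hypothetical inputs):

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.linear_model import LogisticRegression

  # Character n-gram features are robust to the non-standard spelling in tweets.
  X = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4)).fit_transform(tweets)
  # Crude label: tweet was geolocated in a predominantly African-American area.
  y = [p > 0.8 for p in aa_proportion]

  clf = LogisticRegression(max_iter=1000).fit(X, y)
  # Tweets the classifier scores highly can then seed a new AAE-heavy training corpus.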

  34. References
  [Ang+16] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks”. In: ProPublica, May 23 (2016).
  [BGO16] Su Lin Blodgett, Lisa Green, and Brendan O’Connor. “Demographic dialectal variation in social media: A case study of African-American English”. In: arXiv preprint arXiv:1608.08868 (2016).

  35. References
  [BO17] Su Lin Blodgett and Brendan O’Connor. “Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English”. In: arXiv preprint arXiv:1707.00061 (2017).
  [Bol+16] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”. In: NIPS (2016), pp. 1–9.
  [CBN17] Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. “Semantics derived automatically from language corpora contain human-like biases”. In: Science 356.6334 (2017), pp. 183–186. ISSN: 10959203. arXiv: 1608.07187.
