Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics. Debjani Saha, Candice Schumann, Duncan C. McElfresh, John P. Dickerson, Michelle L. Mazurek, Michael Carl Tschantz. 37th International Conference on Machine Learning (ICML), July 12-18, 2020 1

Motivation 2

Fairness in ML is a growing issue ● Plenty of current news articles on bias in machine learning ● Many companies are focusing on bias, fairness, and explainability ○ Google What-If Tool ○ IBM AI Fairness 360 ○ NSF Program on Fairness in AI in Collaboration with Amazon ● Technical solutions are being pursued... (Source: Berkeley CS 294 slides, Fairness in Machine Learning) 3

How is ML fairness defined? Many fairness definitions are developed by ML experts using lots of math... ● Statistical parity ● Accuracy/error rates ● Causality 4

Who ultimately uses ML fairness? Many fairness definitions are developed by ML experts using lots of math... ● Statistical parity ● Accuracy/error rates ● Causality … but are largely used by and impact non-ML experts in diverse settings including: ● Hiring ● Education ● Criminal justice ● … 5

What needs to be done? How can we decide which definitions are appropriate in different real-world settings, if any? 6

Our Contribution How can we decide which definitions are appropriate in different real-world settings, if any? Does the general public understand mathematical definitions of ML fairness and their behavior in real-world settings? 7

Why non-experts? ● Understand how people who will be impacted by ML decisions perceive these fairness definitions 8

Why non-experts? ● Understand how people who will be impacted by ML decisions perceive these fairness definitions ● Importance of considering all stakeholders 9

Research Questions Can we develop a metric to measure lay understanding of ML fairness definitions? 10

Research Questions Can we develop a metric to measure lay understanding of ML fairness definitions? Does a non-expert audience comprehend ML fairness definitions and their implications? 11

Research Questions Can we develop a metric to measure lay understanding of ML fairness definitions? Does a non-expert audience comprehend ML fairness definitions and their implications? ● What factors play a role in comprehension? 12

Research Questions Can we develop a metric to measure lay understanding of ML fairness definitions? Does a non-expert audience comprehend ML fairness definitions and their implications? ● What factors play a role in comprehension? ● How are comprehension and sentiment related? 13

Survey Design We assess the following ML fairness definitions in our survey: ● Demographic parity ● Equal opportunity (FPR, FNR) ● Equalized odds 14

Demographic Parity P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1) 15
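The demographic parity condition can be checked on a set of decisions by comparing the rate of positive decisions, P(Ŷ=1 | A=a), across groups. The following is a minimal sketch, not the authors' code: the function names and the toy data are hypothetical, and A is assumed to be a binary group label.

```python
from collections import defaultdict


def positive_rate_by_group(y_pred, group):
    """Return the share of positive decisions, P(Y_hat=1 | A=a), per group a."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, a in zip(y_pred, group):
        totals[a] += 1
        positives[a] += pred
    return {a: positives[a] / totals[a] for a in totals}


def demographic_parity_gap(y_pred, group):
    """Largest difference in positive-decision rate between any two groups."""
    rates = positive_rate_by_group(y_pred, group)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = positive decision, group label A in {0, 1}.
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

rates = positive_rate_by_group(y_pred, group)
gap = demographic_parity_gap(y_pred, group)
print(rates, gap)
```

A gap of zero means the rule holds exactly on this data; in practice a small tolerance is usually allowed.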

Equal Opportunity (FPR) P(Ŷ=1 | A=0, Y=0) = P(Ŷ=1 | A=1, Y=0) 16

Equal Opportunity (FNR) P(Ŷ=0 | A=0, Y=1) = P(Ŷ=0 | A=1, Y=1) 17

Equalized Odds P(Ŷ=0 | A=0, Y=1) = P(Ŷ=0 | A=1, Y=1) and P(Ŷ=1 | A=0, Y=0) = P(Ŷ=1 | A=1, Y=0) 18
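The three error-rate definitions differ only in which conditional rates must match across groups. The sketch below computes the false positive rate and false negative rate per group and tests whether both match. The function names, the tolerance, and the toy data are hypothetical, and it assumes every group contains at least one example with Y=0 and one with Y=1.

```python
def error_rates_by_group(y_true, y_pred, group):
    """Return {a: (FPR, FNR)} for each group value a."""
    rates = {}
    for a in sorted(set(group)):
        rows = [(y, p) for y, p, g in zip(y_true, y_pred, group) if g == a]
        preds_for_negatives = [p for y, p in rows if y == 0]
        preds_for_positives = [p for y, p in rows if y == 1]
        # FPR = P(Y_hat=1 | A=a, Y=0)
        fpr = sum(preds_for_negatives) / len(preds_for_negatives)
        # FNR = P(Y_hat=0 | A=a, Y=1)
        fnr = sum(1 - p for p in preds_for_positives) / len(preds_for_positives)
        rates[a] = (fpr, fnr)
    return rates


def satisfies_equalized_odds(y_true, y_pred, group, tol=1e-9):
    """True when both FPR and FNR agree across groups, up to tol."""
    rates = list(error_rates_by_group(y_true, y_pred, group).values())
    fprs = [r[0] for r in rates]
    fnrs = [r[1] for r in rates]
    return max(fprs) - min(fprs) <= tol and max(fnrs) - min(fnrs) <= tol


# Hypothetical toy data: the FPRs match across groups, the FNRs do not.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]

rates = error_rates_by_group(y_true, y_pred, group)
ok = satisfies_equalized_odds(y_true, y_pred, group)
print(rates, ok)
```

On this toy data the FPR variant of equal opportunity holds while the FNR variant fails, so equalized odds, which requires both, fails as well.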

Survey Design Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly 19

Survey Design Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly “A hiring manager at a new sales company is reviewing 100 new job applications.” 20

Survey Design Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly “A hiring manager at a new sales company is reviewing 100 new job applications.” “The fraction of applicants who receive job offers that are female should equal the fraction of applicants that are female. Similarly, the fraction of applicants who receive job offers that are male should equal the fraction of applicants that are male.” 21

Survey Design Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly (here, demographic parity) “A hiring manager at a new sales company is reviewing 100 new job applications.” “The fraction of applicants who receive job offers that are female should equal the fraction of applicants that are female. Similarly, the fraction of applicants who receive job offers that are male should equal the fraction of applicants that are male.” 22
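A small worked example may help make the hiring rule concrete. Only the 100 applications come from the scenario; the 40/60 gender split, the 10 offers, and the function name are made up for illustration. The sketch splits the offers so that each group's share of offers equals its share of applicants.

```python
from fractions import Fraction


def offers_under_rule(applicants_by_gender, total_offers):
    """Split offers so each group's share of offers equals its share of applicants."""
    total_applicants = sum(applicants_by_gender.values())
    return {
        gender: Fraction(count, total_applicants) * total_offers
        for gender, count in applicants_by_gender.items()
    }


# Hypothetical split of the 100 applications and a hypothetical number of offers.
applicants = {"female": 40, "male": 60}
offers = offers_under_rule(applicants, total_offers=10)

# Offer rate within each group: equal rates are what demographic parity asks for.
offer_rates = {g: offers[g] / applicants[g] for g in applicants}
print(offers, offer_rates)
```

With these numbers, 4 of the 10 offers go to female applicants and 6 to male applicants, and both groups receive offers at the same rate of 1 in 10.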

Survey Design Survey contains 18 questions: 23

Survey Design Survey contains 18 questions: ● 2 questions concerning participant evaluation of the scenario 24

Survey Design Survey contains 18 questions: ● 2 questions concerning participant evaluation of the scenario ● 9 comprehension questions about the fairness rule 25

Survey Design Survey contains 18 questions: ● 2 questions concerning participant evaluation of the scenario ● 9 comprehension questions about the fairness rule ● 2 self-report questions on participant understanding and use of the rule 26

Survey Design Survey contains 18 questions: ● 2 questions concerning participant evaluation of the scenario ● 9 comprehension questions about the fairness rule ● 2 self-report questions on participant understanding and use of the rule ● 2 self-report questions on participant liking of and agreement with the rule 27

Survey Design Survey contains 18 questions: ● 2 questions concerning participant evaluation of the scenario ● 9 comprehension questions about the fairness rule ● 2 self-report questions on participant understanding and use of the rule ● 2 self-report questions on participant liking of and agreement with the rule ● 3 free-response questions on comprehension and opinion of the rule 28

Survey Design Survey contains 18 questions: ● 2 questions concerning participant evaluation of the scenario ● 9 comprehension questions about the fairness rule ● 2 self-report questions on participant understanding and use of the rule ● 2 self-report questions on participant liking of and agreement with the rule ● 3 free-response questions on comprehension and opinion of the rule (slide callout: COMPREHENSION SCORE) 30
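The slide names a comprehension score without spelling out how it is computed. One plausible reading, which is an assumption here and not confirmed by the slides, is the number of the 9 comprehension questions a participant answers correctly. The answer key and responses below are invented placeholders.

```python
def comprehension_score(responses, answer_key):
    """Count correct answers; assumes the score is a simple number-correct tally."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(r == k for r, k in zip(responses, answer_key))


# Hypothetical answer key and one participant's responses for 9 questions.
answer_key = ["a", "c", "b", "b", "a", "c", "a", "b", "c"]
responses = ["a", "c", "b", "a", "a", "c", "b", "b", "c"]

score = comprehension_score(responses, answer_key)
print(f"{score} of {len(answer_key)} correct")
```

Under this assumption the score ranges from 0 to 9, and this participant scores 7.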

Participant Demographics 349 participants, recruited through a web panel to approximate US distributions on race, age, gender, and education (2017 census) 31

Research Question 1 Can we develop a metric to measure lay understanding of ML fairness definitions? Does a non-expert audience comprehend ML fairness definitions and their implications? ● What factors play a role in comprehension? ● How are comprehension and sentiment related? 32

Our metric effectively measures comprehension We confirm this using two different measures… 33

Our metric effectively measures comprehension “In your own words, explain the rule.” 34

Our metric effectively measures comprehension “What did you use to answer the questions?” 39

Our metric effectively measures comprehension We confirm this using two different measures… 1. Greater ability to explain the rule is associated with higher comprehension score 2. Self-reported compliance with the rule is associated with higher comprehension score 41
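An association between an ordinal self-report and the comprehension score can be quantified in several ways, and the slides do not say which statistic the authors used. As one illustrative option, the sketch below computes a Spearman rank correlation using only the standard library. The data are invented, and the 1-to-5 self-report scale is an assumption.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: self-reported rule use (1-5) and comprehension score (0-9).
self_report = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 7, 9]
rho = spearman(self_report, scores)
print(round(rho, 3))
```

A value near +1 indicates that participants who report relying more on the rule also tend to score higher; a value near 0 indicates no monotonic relationship.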

Research Question 2a Can we develop a metric to measure lay understanding of ML fairness definitions? Does a non-expert audience comprehend ML fairness definitions and their implications? ● What factors play a role in comprehension? ● How are comprehension and sentiment related? 42

Education predicts performance Higher education is associated with higher comprehension score 43

Fairness definition predicts performance Equal opportunity (FNR) was associated with lower comprehension score 44

Comprehension Comprehension is best predicted by two factors: 1. Higher education level (Bachelor’s and above) predicts better comprehension 2. The fairness definition itself can affect comprehension (participants whose survey focused on FNR had lower comprehension) 46

Research Question 2b Can we develop a metric to measure lay understanding of ML fairness definitions? Does a non-expert audience comprehend ML fairness definitions and their implications? ● What factors play a role in comprehension? ● How are comprehension and sentiment related? 47
