Fairness in ML 2: Equal Opportunity and Equalized Odds
Privacy & Fairness in Data Science, CS848 Fall 2019
Slides adapted from https://fairmlclass.github.io/4.html
Outline
• Recap: Disparate Impact
  – Issues with Disparate Impact
• Observational measures of fairness
  – Equal Opportunity and Equalized Odds
  – Predictive Value Parity
  – Trade-off
• Achieving Equalized Odds
  – Binary Classifier
Recap: Disparate Impact
• Let D = (X, Y, C) be a labeled data set, where X = 0 denotes the protected group, C = 1 is the positive class (e.g., admitted), and Y is everything else (the remaining features).
• We say that a classifier f has disparate impact (DI) of υ (0 < υ < 1) if
  Pr[f(Y) = 1 | X = 0] / Pr[f(Y) = 1 | X = 1] ≤ υ
that is, if the protected class is positively classified at most υ times as often as the unprotected class (legally, υ = 0.8 is common).
Recap: Disparate Impact

Example data (one row per individual; the protected group has X = 0):

  X (protected attribute: Race) | Y (other features) | f(Y) (prediction: Bail)
  0                             | ...                | 1 (Y)
  1                             | ...                | 0 (N)
  1                             | ...                | 0 (N)
  ...                           | ...                | ...

Notation: Q_{X=0}[E] = Pr[E | X = 0] and Q_{X=1}[E] = Pr[E | X = 1].
Classifier f has DI of υ:  Q_{X=0}[f(Y) = 1] / Q_{X=1}[f(Y) = 1] ≤ υ
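The DI ratio can be estimated directly from predictions and group membership. A minimal sketch, assuming 0/1 lists and that group 0 is protected (the function name and data are illustrative, not from the slides):

```python
def disparate_impact(preds, group_ids):
    """Estimate Pr[f = 1 | X = 0] / Pr[f = 1 | X = 1].

    preds: 0/1 predictions; group_ids: 0 = protected, 1 = unprotected.
    """
    def pos_rate(g):
        sel = [p for p, x in zip(preds, group_ids) if x == g]
        return sum(sel) / len(sel)
    return pos_rate(0) / pos_rate(1)

# Illustrative data: the protected group is accepted 1/4 of the time,
# the unprotected group 3/4 of the time, giving a DI ratio of 1/3 --
# well below the common legal threshold of 0.8.
preds     = [1, 0, 0, 0, 1, 1, 1, 0]
group_ids = [0, 0, 0, 0, 1, 1, 1, 1]
print(disparate_impact(preds, group_ids))  # ≈ 0.3333
```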
Demographic Parity (the reverse of disparate impact)
• Definition. Classifier f satisfies demographic parity if f(Y) is independent of X.
• When f is a binary 0/1 variable, this means that for all groups x and x′,
  Q_{X=x}[f(Y) = 1] = Q_{X=x′}[f(Y) = 1]
• Approximate versions:
  – Q_{X=x}[f(Y) = 1] / Q_{X=x′}[f(Y) = 1] ≥ 1 − ϑ
  – |Q_{X=x}[f(Y) = 1] − Q_{X=x′}[f(Y) = 1]| ≤ ϑ
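Both approximate versions can be checked mechanically. A hedged sketch for the two-group case (function names are mine, not the slides'):

```python
def positive_rate(preds, group_ids, g):
    """Q_{X=g}[f = 1]: fraction of group g that is positively classified."""
    sel = [p for p, x in zip(preds, group_ids) if x == g]
    return sum(sel) / len(sel)

def approx_demographic_parity(preds, group_ids, eps):
    """Return (ratio_ok, diff_ok) for the two approximate criteria."""
    q0 = positive_rate(preds, group_ids, 0)
    q1 = positive_rate(preds, group_ids, 1)
    ratio_ok = min(q0, q1) / max(q0, q1) >= 1 - eps   # ratio version
    diff_ok = abs(q0 - q1) <= eps                     # difference version
    return ratio_ok, diff_ok

# Equal positive rates in both groups satisfy both criteria for any eps >= 0.
print(approx_demographic_parity([1, 0, 1, 0], [0, 0, 1, 1], 0.1))  # (True, True)
```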
Demographic Parity: Issues
[Figure: a population split into groups X = 1 and X = 0, with the positive class C = 1 marked in each]
Demographic Parity: Issues
[Figure: a classifier that selects the true positives in X = 1 but an arbitrary subset of X = 0]
• Does not seem “fair”: demographic parity permits random (arbitrary) selection within X = 0
• Perfect classification is impossible: if the base rates differ, the perfect classifier f = C itself violates demographic parity
Outline — next: Observational measures of fairness (Equal Opportunity and Equalized Odds)
True Positive Parity (TPP), or Equal Opportunity
• Assume classifier f and label C are binary 0/1 variables
• Definition. Classifier f satisfies true positive parity if for all groups x and x′,
  Q_{X=x}[f(Y) = 1 | C = 1] = Q_{X=x′}[f(Y) = 1 | C = 1]
• Appropriate when the positive outcome (1) is desirable
• Equivalently, when the primary harm is due to false negatives
  – e.g., denying bail to a person who will not recidivate
TPP
[Figure: the same example; the classifier attains the same true positive rate on X = 1 and X = 0]
• Forces similar performance on the positive class C = 1
False Positive Parity (FPP)
• Assume classifier f and label C are binary 0/1 variables
• Definition. Classifier f satisfies false positive parity if for all groups x and x′,
  Q_{X=x}[f(Y) = 1 | C = 0] = Q_{X=x′}[f(Y) = 1 | C = 0]
• TPP & FPP together: Equalized Odds, or Positive Rate Parity
  – f satisfies equalized odds if f(Y) is conditionally independent of X given C
Positive Rate Parity: Example
[Figure: predictions for groups X = 1 and X = 0]
  Q_{X=1}[f(Y) = 1 | C = 1] = 1        Q_{X=1}[f(Y) = 1 | C = 0] = 1/2
  Q_{X=0}[f(Y) = 1 | C = 1] = 1        Q_{X=0}[f(Y) = 1 | C = 0] = 1/2
Both groups have true positive rate 1 and false positive rate 1/2, so positive rate parity holds.
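The per-group rates can be computed from raw data. A sketch with illustrative data chosen to reproduce the rates in this example (the helper name is mine, not from the slides):

```python
def group_rates(preds, labels, group_ids, g):
    """Return (TPR, FPR) = (Q_{X=g}[f = 1 | C = 1], Q_{X=g}[f = 1 | C = 0])."""
    pos = [p for p, c, x in zip(preds, labels, group_ids) if x == g and c == 1]
    neg = [p for p, c, x in zip(preds, labels, group_ids) if x == g and c == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

# Illustrative data reproducing the slide's rates in both groups:
preds  = [1, 1, 1, 0, 1, 1, 0]
labels = [1, 1, 0, 0, 1, 0, 0]
groups = [1, 1, 1, 1, 0, 0, 0]
print(group_rates(preds, labels, groups, 1))  # (1.0, 0.5)
print(group_rates(preds, labels, groups, 0))  # (1.0, 0.5)
```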
Outline — next: Predictive Value Parity and the trade-off
Predictive Value Parity
• Assume classifier f and label C are binary 0/1 variables
• Definition. Classifier f satisfies
  – positive predictive value parity if for all groups x and x′,
    Q_{X=x}[C = 1 | f(Y) = 1] = Q_{X=x′}[C = 1 | f(Y) = 1]
  – negative predictive value parity if for all groups x and x′,
    Q_{X=x}[C = 1 | f(Y) = 0] = Q_{X=x′}[C = 1 | f(Y) = 0]
  – predictive value parity if it satisfies both of the above.
• Intuition: an equalized chance of success given acceptance.
Predictive Value Parity: Example
[Figure: the same predictions as on the previous slide]
  Q_{X=1}[C = 1 | f(Y) = 1] = 8/9      Q_{X=1}[C = 1 | f(Y) = 0] = 0
  Q_{X=0}[C = 1 | f(Y) = 1] = 1/3      Q_{X=0}[C = 1 | f(Y) = 0] = 0
The positive predictive values differ (8/9 vs. 1/3): predictive value parity fails even though positive rate parity holds.
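The predictive values can likewise be computed from raw data. The data below is illustrative, chosen to reproduce the 8/9 and 1/3 figures in this example (the helper name is mine):

```python
def predictive_values(preds, labels, group_ids, g):
    """Return (Q_{X=g}[C = 1 | f = 1], Q_{X=g}[C = 1 | f = 0])."""
    pos = [c for p, c, x in zip(preds, labels, group_ids) if x == g and p == 1]
    neg = [c for p, c, x in zip(preds, labels, group_ids) if x == g and p == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

# Group X = 1: 9 accepted (8 truly positive), 1 rejected (truly negative).
# Group X = 0: 3 accepted (1 truly positive), 1 rejected (truly negative).
preds     = [1] * 9 + [0] + [1, 1, 1, 0]
labels    = [1] * 8 + [0, 0] + [1, 0, 0, 0]
group_ids = [1] * 10 + [0] * 4
print(predictive_values(preds, labels, group_ids, 1))  # ≈ (8/9, 0.0)
print(predictive_values(preds, labels, group_ids, 0))  # ≈ (1/3, 0.0)
```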
Trade-off
• Proposition. Assume differing base rates, Pr[C = 1 | X = 0] ≠ Pr[C = 1 | X = 1], and an imperfect classifier, f(Y) ≠ C. Then either
  – positive rate parity fails, or
  – predictive value parity fails.
• We will look at a similar result later in the course, due to Kleinberg, Mullainathan and Raghavan (2016)
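The proposition can be seen numerically via Bayes' rule: once positive rate parity holds, a group's positive predictive value is determined by the shared TPR and FPR together with the group's base rate, so differing base rates force differing PPVs. A small sketch (the numbers are illustrative, not from the slides):

```python
def ppv(tpr, fpr, base_rate):
    """Pr[C = 1 | f = 1] by Bayes' rule from a group's TPR, FPR, and base rate."""
    return tpr * base_rate / (tpr * base_rate + fpr * (1 - base_rate))

# Same (TPR, FPR) = (0.8, 0.2) in both groups, so positive rate parity holds,
# but base rates 0.5 vs. 0.2 force different positive predictive values:
print(ppv(0.8, 0.2, 0.5))  # ≈ 0.8
print(ppv(0.8, 0.2, 0.2))  # ≈ 0.5
```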
Intuition
[Figure sequence over four slides]
• So far, the predictor is perfect. Let's introduce an error.
• But the result no longer satisfies positive rate parity. Let's fix that.
• Now it satisfies positive rate parity...
• ...but it does not satisfy predictive value parity.
Outline — next: Achieving Equalized Odds for a binary classifier
Equalized Odds
• Recall: f satisfies equalized odds if f is conditionally independent of the protected attribute X given the outcome C
• Let ĝ be any classifier, produced by the existing training pipeline for the problem at hand, that fails to satisfy equalized odds
A classifier ĝ that does not satisfy equalized odds
[Figure: predictions for groups X = 1 and X = 0]
  Q_{X=1}[ĝ(Y) = 1 | C = 0] ≠ Q_{X=0}[ĝ(Y) = 1 | C = 0]
Derived Classifier
• A new classifier g̃ is derived from ĝ and the protected attribute X
  – g̃ is independent of the features Y conditional on (ĝ, X)
  – Q_{X=1}[g̃(Y) = c | C = 1] = Σ_{c′ ∈ {0,1}} Pr[g̃ = c | ĝ = c′, X = 1] · Q_{X=1}[ĝ(Y) = c′ | C = 1]
  – Q_{X=1}[g̃(Y) = c | C = 0] = Σ_{c′ ∈ {0,1}} Pr[g̃ = c | ĝ = c′, X = 1] · Q_{X=1}[ĝ(Y) = c′ | C = 0]
  – and analogously for Q_{X=0}[g̃(Y) = c | C = 1] and Q_{X=0}[g̃(Y) = c | C = 0]
• Mixing probabilities Pr[g̃ = c | ĝ = c′, X]:

  X = 1          ĝ = 0    ĝ = 1        X = 0          ĝ = 0    ĝ = 1
  g̃ = 0          p0       p1           g̃ = 0          p2       p3
  g̃ = 1          1 − p0   1 − p1       g̃ = 1          1 − p2   1 − p3
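Because g̃ depends only on (ĝ, X), each of its conditional rates is a linear blend of ĝ's corresponding rate. A minimal sketch (parameter names are mine; the probabilities here are of predicting 1, i.e., a reparametrization of the p0, …, p3 above):

```python
def derived_rate(rate_hat, p_keep1, p_make1):
    """Q[g_tilde = 1 | C = c, X = x] given
    rate_hat = Q[g_hat = 1 | C = c, X = x],
    p_keep1  = Pr[g_tilde = 1 | g_hat = 1, X = x],
    p_make1  = Pr[g_tilde = 1 | g_hat = 0, X = x]."""
    return p_keep1 * rate_hat + p_make1 * (1 - rate_hat)

# The four deterministic choices (keep, flip, always 1, always 0) are corner cases:
print(derived_rate(0.7, 1, 0))  # 0.7  (g_tilde = g_hat)
print(derived_rate(0.7, 0, 1))  # ≈ 0.3 (g_tilde = 1 - g_hat)
print(derived_rate(0.7, 1, 1))  # 1.0  (always predict 1)
print(derived_rate(0.7, 0, 0))  # 0.0  (always predict 0)
```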
Derived Classifier
• Options for g̃:
  – g̃ = ĝ (keep the prediction)
  – g̃ = 1 − ĝ (flip the prediction)
  – g̃ ≡ 1 (always predict 1)
  – g̃ ≡ 0 (always predict 0)
  – or some randomized combination of these
[Figure: a plot with x-axis Q_{X=x}[g̃(Y) = 1 | C = 0] and y-axis Q_{X=x}[g̃(Y) = 1 | C = 1]; the four options above appear as corner points, and any derived g̃ lies in the region they enclose]
Derived Classifier
[Figure: the region achievable by g̃ for X = 0 overlaid with the region achievable for X = 1; a g̃ satisfying equalized odds must correspond to a single point lying in both regions]
Derived Classifier: Loss Minimization
• A loss function ℓ : {0,1}² → ℝ
  – ℓ(c, c′′) is the loss of predicting g̃(Y) = c when the correct label is c′′
• Minimize the expected loss E[ℓ(g̃(Y), C)] subject to:
  – g̃ is derived (from ĝ and X), and
  – g̃ satisfies equalized odds:
    • Q_{X=1}[g̃(Y) = 1 | C = 1] = Q_{X=0}[g̃(Y) = 1 | C = 1]
    • Q_{X=1}[g̃(Y) = 1 | C = 0] = Q_{X=0}[g̃(Y) = 1 | C = 0]
Derived Classifier: Loss Minimization
• E[ℓ(g̃(Y), C)] = Σ_{c,c′′ ∈ {0,1}} ℓ(c, c′′) · Pr[g̃(Y) = c, C = c′′]
• Pr[g̃ = c, C = c′′]
    = Pr[g̃ = c, C = c′′ | g̃ = ĝ] Pr[g̃ = ĝ] + Pr[g̃ = c, C = c′′ | g̃ ≠ ĝ] Pr[g̃ ≠ ĝ]
    = Pr[ĝ = c, C = c′′] Pr[g̃ = ĝ] + Pr[ĝ = 1 − c, C = c′′] Pr[g̃ ≠ ĝ]
• Both terms are determined by the joint distribution of (ĝ, C, X) together with the mixing probabilities p0, …, p3, so the objective and the equalized-odds constraints are linear in p0, …, p3
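Since the objective and constraints are linear in the mixing probabilities, the optimization is a small linear program. As a rough sketch, a coarse grid search over the four probabilities stands in for a proper LP solver; the per-group statistics below are made up, groups are assumed equal-sized, and ℓ is the 0/1 loss:

```python
def mix(rate, p1, p0):
    # Q[g_tilde = 1 | C = c] = p1 * Q[g_hat = 1 | C = c] + p0 * Q[g_hat = 0 | C = c]
    return p1 * rate + p0 * (1 - rate)

def expected_loss(tpr, fpr, base):
    # Expected 0/1 loss: false negatives on C = 1 plus false positives on C = 0.
    return base * (1 - tpr) + (1 - base) * fpr

def best_derived(stats, grid=21):
    """stats[x] = (tpr_hat, fpr_hat, base_rate) for groups x in {0, 1}.
    Returns (loss, (a1, a0, b1, b0)), the a*/b* being Pr[g_tilde = 1 | g_hat, X]."""
    steps = [i / (grid - 1) for i in range(grid)]
    best = None
    for a1 in steps:
        for a0 in steps:
            t1 = mix(stats[1][0], a1, a0)
            f1 = mix(stats[1][1], a1, a0)
            for b1 in steps:
                for b0 in steps:
                    t0 = mix(stats[0][0], b1, b0)
                    f0 = mix(stats[0][1], b1, b0)
                    if abs(t1 - t0) > 1e-9 or abs(f1 - f0) > 1e-9:
                        continue  # equalized odds not satisfied
                    loss = (expected_loss(t1, f1, stats[1][2])
                            + expected_loss(t0, f0, stats[0][2])) / 2
                    if best is None or loss < best[0]:
                        best = (loss, (a1, a0, b1, b0))
    return best

# Group 1: TPR 0.8, FPR 0.2; group 0: TPR 0.6, FPR 0.3; both base rates 0.5.
stats = {1: (0.8, 0.2, 0.5), 0: (0.6, 0.3, 0.5)}
loss, probs = best_derived(stats)
print(round(loss, 4))  # 0.35: both groups moved to the common point TPR 0.6, FPR 0.3
```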
Summary: Multiple Fairness Measures
• Demographic parity / disparate impact
  – Pro: used in the law
  – Con: perfect classification is impossible (when base rates differ)
  – Achieved by modifying the data
• Equalized odds / equal opportunity
  – Pro: perfect classification is possible
  – Con: different groups can get different rates of positive prediction
  – Achieved by post-processing the classifier
Summary: Multiple Fairness Measures
• Equalized odds / equal opportunity
  – Different groups may still be treated unequally
  – Maybe due to the problem itself
  – Maybe due to bias in the dataset
• While demographic parity seems like a good fairness goal for society, equalized odds / equal opportunity seems to measure whether an algorithm itself is fair, independent of other factors such as the input data.