
Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing



  1. Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing. Sanghamitra Dutta (sanghamd@andrew.cmu.edu), Hazar Yueksel (hazar.yueksel@ibm.com), Dennis Wei (dwei@us.ibm.com), Kush Varshney (krvarshn@us.ibm.com), Pin-Yu Chen (pin-yu.chen@ibm.com), Sijia Liu (sijia.liu@ibm.com).

  2. Motivational Example
A noisy mapping takes the construct space (Y~, Z~) to the observed space (Y, Z). Y: exam score; Y~: true ability; Z: data label (0 or 1); Z~: true label; a: protected attribute (gender, race, etc.).
In the construct space there is no trade-off between accuracy and fairness: the Bayes optimal classifier achieves fairness (equal opportunity). The accuracy-fairness trade-off in the observed space is due to the mapping being noisier for one group, making the 0 and 1 labels "less separable" for that group.
Setup inspired by [Friedler et al. '16] [Yeom et al. '18]; definition of equal opportunity from [Hardt et al. '16].
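A minimal simulation sketch of this example (Python; the Gaussian noise model, the specific numbers, and all variable names are illustrative assumptions rather than values from the slides): making the mapping from true ability to exam score noisier for one group leaves that group's two labels less separable in the observed space.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, n)                                        # data label Z (here equal to the true label Z~)
a = rng.integers(0, 2, n)                                        # protected attribute a
ability = np.where(z == 1, 4.0, 0.0) + rng.normal(0.0, 1.0, n)   # construct space: true ability Y~
noise_std = np.where(a == 0, 2.0, 0.5)                           # the mapping is noisier for group a = 0
score = ability + rng.normal(0.0, 1.0, n) * noise_std            # observed space: exam score Y

for g in (0, 1):
    m = a == g
    gap = score[m & (z == 1)].mean() - score[m & (z == 0)].mean()
    spread = score[m].std()
    # Same mean gap for both groups, but a larger spread for group a = 0: its labels are less separable.
    print(f"group a={g}: label gap {gap:.2f}, score std {spread:.2f}")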

  3. Main Contributions
β€’ Concept of separability: the Chernoff information, an approximation to the best error exponent in binary classification, with analytical forms and a geometric interpretation.
β€’ Explain the trade-off (Theorem 1) and compute fundamental limits; accuracy with respect to the observed dataset is a problematic measure of performance.
β€’ Ideal distributions where accuracy and fairness are in accord: proof of existence (Theorem 2), either as plausible distributions in the observed space or as distributions in the construct space.
β€’ Alleviate the trade-off in the real world: a criterion to alleviate it (Theorem 3) by gathering knowledge from active data collection, which often improves separability; compute the alleviated trade-off. These results also explain why active fairness works.
[Figure: accuracy vs. discrimination curves comparing the trade-off on existing data with the trade-off after data collection.]

  4. Related Works
β€’ Characterizing the accuracy-fairness trade-off: [Menon & Williamson '18] [Garg et al. '19] [Chen et al. '18] [Zhao & Gordon '19] (this work: error-exponent analysis with geometric interpretability)
β€’ Empirical datasets for accuracy evaluation: [Wick et al. '19] [Sharma et al. '19]
β€’ Pre-processing datasets for fairness: [Calmon et al. '18] [Feldman et al. '15] [Zemel et al. '13]
β€’ Explainability / active fairness: [Varshney et al. '18] [Noriega-Campero et al. '19]

  5. Preliminaries
Noisy mapping from the construct space (Y~, Z~) to the observed space (Y, Z): Y = g_{Z~,a}(Y~) and Z = Z~.
For group a = 0: Y | Z=0, a=0 ~ Q_0(y) and Y | Z=1, a=0 ~ Q_1(y); likelihood-ratio test U_0(y) = log( Q_1(y) / Q_0(y) ) ≥ ν_0.
For group a = 1: Y | Z=0, a=1 ~ R_0(y) and Y | Z=1, a=1 ~ R_1(y); likelihood-ratio test U_1(y) = log( R_1(y) / R_0(y) ) ≥ ν_1.
Equal opportunity → equal probability of false negative.
β€’ Probability of false negative (FN): P_FN,a(ν_a) = Pr( U_a(y) < ν_a | Z = 1, a ), a wrongful reject of a true (+), i.e., true Z = 1.
β€’ Probability of false positive (FP): P_FP,a(ν_a) = Pr( U_a(y) ≥ ν_a | Z = 0, a ), a wrongful accept of a true (−), i.e., true Z = 0.
β€’ Probability of error: P_e,a(ν_a) = ρ_0 P_FP,a(ν_a) + ρ_1 P_FN,a(ν_a), with prior probabilities ρ_0 = ρ_1 = 1/2.
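A small sketch of these definitions for one group (Python; it reuses the Gaussian pair Q_0 = N(1,1), Q_1 = N(4,1) that appears in the later geometric slides, while the threshold value and all names are our own illustrative choices):

from scipy.stats import norm

mu0, mu1, sigma = 1.0, 4.0, 1.0      # Q_0 = N(1,1), Q_1 = N(4,1) for group a = 0
nu0 = 0.0                            # log-likelihood-ratio threshold (Bayes choice for equal priors)

# For equal-variance Gaussians, U_0(y) = log Q_1(y)/Q_0(y) is linear in y, so the
# test U_0(y) >= nu0 is equivalent to y >= y_star with:
y_star = (mu0 + mu1) / 2 + nu0 * sigma**2 / (mu1 - mu0)

p_fn = norm.cdf(y_star, loc=mu1, scale=sigma)   # Pr(U_0(y) < nu0 | Z = 1): wrongful reject of a true 1
p_fp = norm.sf(y_star, loc=mu0, scale=sigma)    # Pr(U_0(y) >= nu0 | Z = 0): wrongful accept of a true 0
p_err = 0.5 * p_fp + 0.5 * p_fn                 # rho_0 = rho_1 = 1/2
print(p_fn, p_fp, p_err)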

  6. Quick Background on Chernoff Error Exponents
The probabilities of false negative and false positive decay exponentially in the number of observations n: P_FN,a(ν_a) ≲ e^{−n F_FN,a(ν_a)} and P_FP,a(ν_a) ≲ e^{−n F_FP,a(ν_a)}, where F_FN,a and F_FP,a are the Chernoff exponents (a larger exponent → lower error → higher accuracy).
Since P_e,a(ν_a) = (1/2) P_FN,a(ν_a) + (1/2) P_FP,a(ν_a), we define the Chernoff exponent of the overall error probability as F_e,a(ν_a) = min{ F_FP,a(ν_a), F_FN,a(ν_a) }.
Lemma: The Chernoff exponent of the error probability of the Bayes optimal classifier between distributions Q_0(y) under Z = 0 and Q_1(y) under Z = 1 is the Chernoff information D(Q_0, Q_1) = −min_{λ ∈ [0,1]} log Σ_y Q_0(y)^λ Q_1(y)^{1−λ} [Cover & Thomas].
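A numerical sketch of the Chernoff information in the lemma (Python; the grid integration over a continuous y and the function names are our own choices, and the Gaussian pairs are the ones used in the geometric slides):

import numpy as np
from scipy.stats import norm

def chernoff_information(pdf0, pdf1, ys, lambdas=np.linspace(0.0, 1.0, 201)):
    # D(Q_0, Q_1) = -min_{lambda in [0,1]} log integral Q_0(y)^lambda Q_1(y)^(1-lambda) dy
    dy = ys[1] - ys[0]
    p0, p1 = pdf0(ys), pdf1(ys)
    vals = [np.log(np.sum(p0**lam * p1**(1.0 - lam)) * dy) for lam in lambdas]
    return -min(vals)

ys = np.linspace(-10.0, 15.0, 50001)
D_Q = chernoff_information(lambda y: norm.pdf(y, 1, 1), lambda y: norm.pdf(y, 4, 1), ys)
D_R = chernoff_information(lambda y: norm.pdf(y, 0, 1), lambda y: norm.pdf(y, 4, 1), ys)
print(D_Q, (4 - 1)**2 / 8)   # ~1.125: for equal-variance Gaussians, D = (mu1 - mu0)^2 / (8 sigma^2)
print(D_R, (4 - 0)**2 / 8)   # ~2.0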

  7. Our Proposition: Concept of Separability
β€’ Definition of separability: For a group of people with data distributions Q_0(y) and Q_1(y) under hypotheses Z = 0 and Z = 1, we define their separability as the Chernoff information D(Q_0, Q_1). Its geometric interpretation makes it tractable.
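As a worked example (our own computation, using the Gaussian distributions that appear in the geometric slides that follow): for two equal-variance Gaussians N(μ_0, σ²) and N(μ_1, σ²), the Chernoff information is (μ_1 − μ_0)² / (8σ²), so group a = 0 with Q_0 = N(1,1), Q_1 = N(4,1) has separability 9/8, while group a = 1 with R_0 = N(0,1), R_1 = N(4,1) has separability 2; the second group's labels are more separable.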

  8. Geometric understanding of the results
For group a = 0, take Q_0(y) ~ N(1,1) and Q_1(y) ~ N(4,1), with decision rule U_0(y) ≥ ν_0. The log moment-generating functions of U_0(y) are Λ_0(v) = log E[ e^{v U_0(y)} | Z = 0, a = 0 ] = (9/2) v(v − 1) and Λ_1(v) = log E[ e^{v U_0(y)} | Z = 1, a = 0 ] = (9/2) v(v + 1). The error exponents are their Legendre transforms:
F_FP,a=0(ν_0) = sup_{v ≥ 0} ( v ν_0 − Λ_0(v) ),  F_FN,a=0(ν_0) = sup_{v ≤ 0} ( v ν_0 − Λ_1(v) ),  F_e,a=0(ν_0) = min{ F_FP,a=0(ν_0), F_FN,a=0(ν_0) }.
[Figure: densities Q_0(y), Q_1(y) with means 1 and 4, and the curves Λ_0(u), Λ_1(u); a tangent line of slope ν_0 identifies the exponents E_FN and E_FP.]
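A numerical sketch of this Legendre-transform construction (Python; the grid search and the example threshold values are our own illustrative choices):

import numpy as np

def Lambda0(v):
    # log E[e^{v U_0(y)} | Z = 0, a = 0] for Q_0 = N(1,1), Q_1 = N(4,1)
    return 4.5 * v * (v - 1.0)

def Lambda1(v):
    # log E[e^{v U_0(y)} | Z = 1, a = 0]
    return 4.5 * v * (v + 1.0)

grid = np.linspace(-5.0, 5.0, 200001)
pos, neg = grid[grid >= 0.0], grid[grid <= 0.0]

def exponents(nu0):
    F_FP = np.max(nu0 * pos - Lambda0(pos))   # sup_{v >= 0} (v nu_0 - Lambda_0(v))
    F_FN = np.max(nu0 * neg - Lambda1(neg))   # sup_{v <= 0} (v nu_0 - Lambda_1(v))
    return F_FP, F_FN, min(F_FP, F_FN)

print(exponents(0.0))   # Bayes threshold: F_FP = F_FN = 9/8 = D(Q_0, Q_1)
print(exponents(2.0))   # moving the threshold trades F_FP against F_FN and lowers the overall exponent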

  9. Geometric understanding of the results (continued)
Same setup as the previous slide: the figure shows the densities Q_0(y), Q_1(y) and the log-MGF curves Λ_0(u), Λ_1(u), together with the Legendre-transform expressions for F_FP,a=0(ν_0), F_FN,a=0(ν_0), and F_e,a=0(ν_0).

  10. Geometric understanding of the results (continued)
Same figure, now marking the point C(P_0, P_1): at the Bayes optimal threshold the two exponents coincide at the Chernoff information, F_FN = F_FP = D(Q_0, Q_1).

  11. Geometric understanding of the results (continued)
Same figure with the Chernoff-information point C(P_0, P_1) highlighted alongside the Legendre-transform expressions for F_FP,a=0(ν_0), F_FN,a=0(ν_0), and F_e,a=0(ν_0).

  12. The accuracy-fairness trade-off is due to a difference in separability between one group of people and another
Theorem 1 (informal): One of the following is true in the observed space:
β€’ Unbiased mappings, D(Q_0, Q_1) = D(R_0, R_1): the Bayes optimal classifiers for both groups also satisfy equal opportunity, i.e., F_FN,a=0(ν_0) = F_FN,a=1(ν_1).
β€’ Biased mappings, D(Q_0, Q_1) < D(R_0, R_1): given two classifiers (one for each group) that satisfy equal opportunity, for at least one of the groups the classifier is not Bayes optimal, i.e., either F_e,a=0(ν_0) < D(Q_0, Q_1) or F_e,a=1(ν_1) < D(R_0, R_1), or both.
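A numerical sketch of the biased-mappings case (Python; it reuses the Gaussian example from the geometric slides, with the group a = 1 log-MGF coefficient c = 8 computed by us in the same way as c = 9/2 for group a = 0; the root-finding step and all names are our own choices):

import numpy as np
from scipy.optimize import brentq

grid = np.linspace(-5.0, 5.0, 200001)
pos, neg = grid[grid >= 0.0], grid[grid <= 0.0]

def exponents(nu, c):
    # Group with log-MGFs Lambda_0(v) = c v(v-1) and Lambda_1(v) = c v(v+1):
    # c = 9/2 for group a=0 (Q_0=N(1,1), Q_1=N(4,1)); c = 8 for group a=1 (R_0=N(0,1), R_1=N(4,1)).
    F_FP = np.max(nu * pos - c * pos * (pos - 1.0))
    F_FN = np.max(nu * neg - c * neg * (neg + 1.0))
    return F_FP, F_FN

f_fp0, f_fn0 = exponents(0.0, 4.5)                       # Bayes threshold for group a=0: both = 9/8
# Equal opportunity: choose nu_1 so that group a=1 has the same FN exponent as group a=0.
nu1 = brentq(lambda nu: exponents(nu, 8.0)[1] - f_fn0, 0.0, 7.9)
f_fp1, f_fn1 = exponents(nu1, 8.0)
print(f_fn0, f_fn1)                                      # equal FN exponents across groups
print(min(f_fp1, f_fn1), "<", 2.0)                       # group a=1's error exponent falls below D(R_0, R_1) = 2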

  13. Geometric understanding of the results
For group a = 0: Q_0(y) ~ N(1,1), Q_1(y) ~ N(4,1), decision rule U_0(y) ≥ ν_0. For group a = 1: R_0(y) ~ N(0,1), R_1(y) ~ N(4,1), decision rule U_1(y) ≥ ν_1.
For group a = 0, the Bayes optimal classifier achieves F_FP = F_FN = D(Q_0, Q_1); for group a = 1, it achieves F_FP = F_FN = D(R_0, R_1). Because these separabilities differ, the Bayes optimal classifiers do not satisfy equal opportunity (unequal F_FN).
[Figure: densities and Λ curves for both groups, with the Chernoff-information points C(Q_0, Q_1) and C(P_0, P_1) marked.]

  14. Geometric understanding of the results
Same setup as the previous slide. If the thresholds are instead chosen so that F_FN,a=0(ν_0) = F_FN,a=1(ν_1), equal opportunity (equal F_FN) is satisfied, but the resulting classifier is sub-optimal for the privileged group a = 1. Can we avoid active harm to the privileged group?
[Figure: the same densities and Λ curves, with the thresholds adjusted to equalize the FN exponents.]
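Carrying the Gaussian example through (our own computation, not stated on the slide): matching group a = 0's Bayes FN exponent of 9/8 forces the threshold ν_1 = 2 for group a = 1, whose exponents become F_FN = 9/8 and F_FP = 25/8, so its error exponent drops to 9/8, below its Bayes optimum D(R_0, R_1) = 2.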
