Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing

Sanghamitra Dutta (sanghamd@andrew.cmu.edu), Hazar Yueksel (hazar.yueksel@ibm.com), Dennis Wei (dwei@us.ibm.com), Kush Varshney (krvarshn@us.ibm.com), Pin-Yu Chen (pin-yu.chen@ibm.com), Sijia Liu (sijia.liu@ibm.com)
Motivational Example

Noisy mapping from the construct space to the observed space:
• y: exam score (observed); y*: true ability (construct)
• Y: data label (0 or 1); Y*: true label
• Z: protected attribute (gender, race, etc.)

Construct space: no trade-off between accuracy and fairness; the Bayes optimal classifier achieves fairness (equal opportunity).
Observed space: the accuracy-fairness trade-off arises because the mapping is noisier for one group, making the 0 and 1 labels "less separable".

Setup inspired by [Friedler et al. '16], [Yeom et al. '18]; definition of equal opportunity from [Hardt et al. '16]
Main Contributions

• Concept of separability — Chernoff information, an approximation to the best error exponent in binary classification: explains the trade-off (Theorem 1), with analytical forms and a geometric interpretation.
• Ideal distributions — where accuracy and fairness are in accord: proof of existence (Theorem 2); computation of fundamental limits. These may be plausible distributions in the observed space, or distributions in the construct space.
• Alleviating the trade-off in the real world — gather knowledge through active data collection, often improving separability: criterion to alleviate the trade-off (Theorem 3); computation of the alleviated trade-off.

[Figure: accuracy vs. discrimination curves, showing the trade-off on existing data and the reduced trade-off after data collection]

Takeaways: accuracy with respect to the observed dataset is a problematic measure of performance; these results also explain why active fairness works.
Related Works

• Characterizing the accuracy-fairness trade-off: [Menon & Williamson '18], [Garg et al. '19], [Chen et al. '18], [Zhao & Gordon '19]
• Empirical datasets for accuracy evaluation: [Wick et al. '19], [Sharma et al. '19]
• Pre-processing datasets for fairness: [Calmon et al. '18], [Feldman et al. '15], [Zemel et al. '13]
• Explainability / active fairness: [Varshney et al. '18], [Noriega-Campero et al. '19]

This work: error-exponent analysis with geometric interpretability.
Preliminaries

Observed-space distributions (after the noisy mapping from the construct space):
• Group Z=0: y|_{Y=0,Z=0} ~ P_0(y), y|_{Y=1,Z=0} ~ P_1(y)
• Group Z=1: y|_{Y=0,Z=1} ~ Q_0(y), y|_{Y=1,Z=1} ~ Q_1(y)

Log-likelihood ratio tests: L_0(y) = log(P_1(y)/P_0(y)) ≥ τ_0 for group Z=0, and L_1(y) = log(Q_1(y)/Q_0(y)) ≥ τ_1 for group Z=1.

• Probability of false negative (FN): P_FN(τ_z) = Pr(L_z(y) < τ_z | Y=1, Z=z) — wrongful rejection of a true positive, i.e., true Y=1
• Probability of false positive (FP): P_FP(τ_z) = Pr(L_z(y) ≥ τ_z | Y=0, Z=z) — wrongful acceptance of a true negative, i.e., true Y=0
• Probability of error: P_e(τ_z) = π_0 P_FP(τ_z) + π_1 P_FN(τ_z), with prior probabilities assumed π_0 = π_1 = 1/2

Equal opportunity ⇔ equal probability of FN across the groups.
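As a concrete check of these definitions, the sketch below evaluates P_FN and P_FP for a single-observation likelihood-ratio test, using the Gaussian class conditionals from the slides' running example (P_0 = N(1,1), P_1 = N(4,1)); the closed forms via the normal CDF are an assumption of this illustration, not something stated on the slide.

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Group Z=0 class conditionals from the running example: P0 = N(1,1), P1 = N(4,1).
mu0, mu1, sigma = 1.0, 4.0, 1.0

def error_probs(tau):
    """FN/FP probabilities of the test L(y) = log P1(y)/P0(y) >= tau.

    For equal-variance Gaussians, L(y) = (mu1 - mu0) * (y - (mu0 + mu1)/2) / sigma^2,
    so thresholding L is equivalent to thresholding y itself.
    """
    slope = (mu1 - mu0) / sigma**2
    y_thresh = tau / slope + (mu0 + mu1) / 2.0
    p_fp = 1.0 - norm_cdf((y_thresh - mu0) / sigma)  # Pr(L >= tau | Y=0)
    p_fn = norm_cdf((y_thresh - mu1) / sigma)        # Pr(L <  tau | Y=1)
    return p_fn, p_fp

p_fn, p_fp = error_probs(0.0)   # Bayes-optimal threshold for equal priors
p_e = 0.5 * p_fp + 0.5 * p_fn   # overall error with pi_0 = pi_1 = 1/2
```

At τ = 0 the test is symmetric between the two hypotheses, so P_FN = P_FP ≈ 0.067.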
Quick Background on Chernoff Error Exponents

With n i.i.d. observations, the probabilities of FN and FP decay exponentially:
P_{FN,n}(τ_z) ≐ e^{−n E_FN(τ_z)} and P_{FP,n}(τ_z) ≐ e^{−n E_FP(τ_z)}
(larger exponent → lower error → higher accuracy).

Since P_{e,n}(τ_z) = ½ P_{FN,n}(τ_z) + ½ P_{FP,n}(τ_z), we define the Chernoff exponent of the overall error probability as
E_e(τ_z) = min{E_FP(τ_z), E_FN(τ_z)}.

Lemma [Cover & Thomas]: the Chernoff exponent of the error probability of the Bayes optimal classifier between distributions P_0(y) under Y=0 and P_1(y) under Y=1 is the Chernoff information
C(P_0, P_1) = −log min_{λ∈[0,1]} Σ_y P_0(y)^λ P_1(y)^{1−λ}.
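For equal-variance Gaussians the inner sum (an integral in the continuous case) has a closed form, which makes the lemma easy to check numerically. The sketch below, under that Gaussian assumption, grid-searches over λ and recovers the known optimum Δ²/(8σ²) for the running example (Δ = 3, σ = 1).

```python
# Chernoff information between P0 = N(mu0, sigma^2) and P1 = N(mu1, sigma^2).
# For these Gaussians, log ∫ P0(y)^lam * P1(y)^(1-lam) dy
#   = -lam * (1 - lam) * delta^2 / (2 * sigma^2),  where delta = mu1 - mu0.
mu0, mu1, sigma = 1.0, 4.0, 1.0
delta = mu1 - mu0

def chernoff_information():
    # C(P0, P1) = -log min_{lam in [0,1]} ∫ P0^lam P1^(1-lam); grid search over lam.
    best = float("inf")
    for i in range(10001):
        lam = i / 10000.0
        log_integral = -lam * (1.0 - lam) * delta**2 / (2.0 * sigma**2)
        best = min(best, log_integral)
    return -best

c = chernoff_information()
# The minimizer is lam = 1/2, giving C = delta^2 / (8 sigma^2) = 9/8 here.
```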
Our Proposition: Concept of Separability

Definition (separability): for a group of people with data distributions P_0(y) and P_1(y) under the hypotheses Y=0 and Y=1, we define their separability as the Chernoff information C(P_0, P_1).

Geometric interpretability makes these exponents tractable.
Geometric Understanding of the Results

For group Z=0, let P_0(y) ~ N(1,1) and P_1(y) ~ N(4,1), with the test L_0(y) ≥ τ_0.

Log moment-generating functions of the log-likelihood ratio:
Λ_0(v) = log E[e^{v L_0(y)} | Y=0, Z=0] = (9/2) v(v−1)
Λ_1(v) = log E[e^{v L_0(y)} | Y=1, Z=0] = (9/2) v(v+1)

The exponents are Legendre transforms of Λ_0 and Λ_1:
E_FP(τ_0) = sup_{v≥0} (v τ_0 − Λ_0(v))
E_FN(τ_0) = sup_{v≤0} (v τ_0 − Λ_1(v))
E_e(τ_0) = min{E_FP(τ_0), E_FN(τ_0)}

Geometrically, E_FP and E_FN are obtained from tangents of slope τ_0 to the curves Λ_0(v) and Λ_1(v). At the threshold where the two exponents meet, E_FN = E_FP = C(P_0, P_1), the Chernoff information.

[Figure: densities P_0 and P_1 with means 1 and 4; curves Λ_0(v) and Λ_1(v) with the tangent construction marking C(P_0, P_1)]
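The tangent construction above can be checked numerically: the sketch below evaluates the two Legendre transforms by brute-force grid search over v (the grid bounds and resolution are illustrative assumptions) and confirms that at the threshold τ_0 = 0 both exponents equal the Chernoff information 9/8.

```python
# Legendre-transform error exponents for group Z=0 (P0 = N(1,1), P1 = N(4,1)).
# Log-MGFs of the log-likelihood ratio, from the slide:
lam0 = lambda v: 4.5 * v * (v - 1.0)  # under Y=0
lam1 = lambda v: 4.5 * v * (v + 1.0)  # under Y=1

def sup_over_grid(f, lo, hi, steps=200001):
    # Brute-force supremum of f over [lo, hi].
    return max(f(lo + (hi - lo) * i / (steps - 1)) for i in range(steps))

def exponents(tau):
    e_fp = sup_over_grid(lambda v: v * tau - lam0(v), 0.0, 3.0)   # sup over v >= 0
    e_fn = sup_over_grid(lambda v: v * tau - lam1(v), -3.0, 0.0)  # sup over v <= 0
    return e_fn, e_fp

e_fn, e_fp = exponents(0.0)
# At tau = 0 the suprema are attained at v = -1/2 and v = +1/2, and
# E_FN = E_FP = C(P0, P1) = 9/8.
```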
The accuracy-fairness trade-off is due to a difference in the separability of one group of people relative to another

Theorem 1 (informal): one of the following holds in the observed space:
• Unbiased mappings, C(P_0, P_1) = C(Q_0, Q_1): the Bayes optimal classifiers for both groups also satisfy equal opportunity, i.e., E_FN^{(0)}(τ_0) = E_FN^{(1)}(τ_1) at the optimal thresholds.
• Biased mappings, C(P_0, P_1) < C(Q_0, Q_1): given two classifiers (one for each group) that satisfy equal opportunity, for at least one of the groups the classifier is not Bayes optimal, i.e., either E_e^{(0)}(τ_0) < C(P_0, P_1) or E_e^{(1)}(τ_1) < C(Q_0, Q_1), or both.
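As a sanity check of the unbiased branch, suppose both groups have the same separability, e.g. group Z=1 with Q_0 = N(0,1), Q_1 = N(3,1) — a hypothetical counterpart with the same mean gap as P_0, P_1, chosen for this sketch. Then the Bayes-optimal thresholds τ = 0 already give equal FN exponents, so equal opportunity holds for free.

```python
def fn_exponent_at_bayes(delta, sigma=1.0, steps=100001):
    # E_FN(tau=0) = sup_{v<=0} (-Lambda_1(v)), with
    # Lambda_1(v) = (delta^2 / (2 sigma^2)) * v * (v + 1);
    # grid search over v in [-1, 0] confirms the closed form delta^2/(8 sigma^2).
    a = delta**2 / (2.0 * sigma**2)
    return max(-a * (-1.0 + i / (steps - 1)) * (i / (steps - 1))
               for i in range(steps))

# Unbiased mappings: both groups have the same mean gap, hence equal separability.
e_fn_group0 = fn_exponent_at_bayes(3.0)  # P0 = N(1,1), P1 = N(4,1): gap 3
e_fn_group1 = fn_exponent_at_bayes(3.0)  # Q0 = N(0,1), Q1 = N(3,1): gap 3 (hypothetical)
# Equal FN exponents at the Bayes thresholds => equal opportunity with no accuracy loss.
```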
Geometric Understanding of the Results (Biased Mappings)

• Group Z=0: P_0(y) ~ N(1,1), P_1(y) ~ N(4,1), test L_0(y) ≥ τ_0; at the Chernoff point, E_FP = E_FN = C(P_0, P_1).
• Group Z=1: Q_0(y) ~ N(0,1), Q_1(y) ~ N(4,1), test L_1(y) ≥ τ_1; at the Chernoff point, E_FP = E_FN = C(Q_0, Q_1).

Since C(P_0, P_1) < C(Q_0, Q_1), the Bayes optimal classifiers do not satisfy equal opportunity (unequal E_FN across the groups).

[Figure: tangent constructions for both groups, marking C(P_0, P_1) and C(Q_0, Q_1)]
Geometric Understanding of the Results (Equal Opportunity)

With the same groups (Z=0: P_0 ~ N(1,1), P_1 ~ N(4,1); Z=1: Q_0 ~ N(0,1), Q_1 ~ N(4,1)), choosing thresholds so that E_FN^{(0)}(τ_0) = E_FN^{(1)}(τ_1) satisfies equal opportunity (equal E_FN) but is sub-optimal for the privileged group Z=1.

Can we avoid active harm to the privileged group?

[Figure: tangent constructions showing the equal-E_FN thresholds for both groups]
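The cost hinted at above can be quantified with the same Legendre machinery: under the slides' example (group Z=0 with mean gap 3, group Z=1 with gap 4), the sketch below bisects for the threshold τ_1 that equalizes the FN exponents across groups and shows that group Z=1's overall exponent then drops below its separability C(Q_0, Q_1) = 2. The grid and bisection details are illustrative assumptions.

```python
# Cost of equal opportunity under biased mappings.
# Group Z=0: P0 = N(1,1), P1 = N(4,1) -> a = 4.5 (gap 3)
# Group Z=1: Q0 = N(0,1), Q1 = N(4,1) -> a = 8.0 (gap 4)
# with Lambda_0(v) = a*v*(v-1), Lambda_1(v) = a*v*(v+1).

def exponents(tau, a):
    grid = [i / 10000.0 for i in range(30001)]  # v in [0, 3]
    e_fp = max(v * tau - a * v * (v - 1.0) for v in grid)          # sup over v >= 0
    e_fn = max(-v * tau - a * (-v) * (-v + 1.0) for v in grid)     # sup over v <= 0
    return e_fn, e_fp

target, _ = exponents(0.0, 4.5)  # group Z=0 at its Bayes threshold: E_FN = 9/8
# Bisect for tau_1 with E_FN^(1)(tau_1) = target (E_FN decreases as tau grows).
lo, hi = 0.0, 8.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    e_fn_mid, _ = exponents(mid, 8.0)
    lo, hi = (mid, hi) if e_fn_mid > target else (lo, mid)
tau1 = 0.5 * (lo + hi)
e_fn1, e_fp1 = exponents(tau1, 8.0)
e_e1 = min(e_fn1, e_fp1)  # overall error exponent for group Z=1
# tau1 comes out near 2, and e_e1 ~ 9/8 < 2 = C(Q0, Q1): the privileged group
# pays an accuracy price (in error exponent) for equal opportunity.
```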