Fairness-Aware Learning for Continuous Attributes and Treatments
Jérémie Mary, Criteo AI Lab
Clément Calauzènes, Criteo AI Lab
Noureddine El Karoui, Criteo AI Lab and UC Berkeley
ICML 2019, Long Beach, CA
Fairness and independence

Setup: build a prediction Ŷ of a variable Y (e.g. payment default) based on available information X (credit card history); the prediction may be biased/unfair with respect to a sensitive attribute Z (gender). Most fairness work is restricted to binary values of Y and Z.

Disparate impact (demographic parity):

    DI = P(Ŷ = 1 | Z = 1) / P(Ŷ = 1 | Z = 0)

Equal opportunity:

    DEO = P(Ŷ = 1 | Z = 1, Y = 1) − P(Ŷ = 1 | Z = 0, Y = 1)

Generalizations using independence notions:
- Demographic parity: DI generalizes to Ŷ ⊥ Z, even when Z is non-binary.
- Equalized odds (EO): DEO generalizes to Ŷ ⊥ Z | Y, even when Z is non-binary.

We propose new metrics that also easily generalize to continuous variables.
HGR: measuring independence

Definition (Hirschfeld–Gebelein–Rényi Maximum Correlation Coefficient). Given two random variables U ∈ 𝒰 and V ∈ 𝒱,

    hgr(U, V) ≜ sup_{f,g} ρ(f(U), g(V))    (1)

where ρ is Pearson's correlation and the supremum is over f, g such that E[f²(U)] < ∞ and E[g²(V)] < ∞.

- 0 ≤ HGR(U, V) ≤ 1; HGR(U, V) = 0 iff U and V are independent.
- If f, g are restricted to linear functions, we recover CCA.
- Connection exploited in RDC [8] with CCA in an RKHS.
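To see why the supremum over nonlinear f, g matters, consider a standard example (an illustration of ours, not from the slides): for U ~ N(0, 1) and V = U², Pearson's correlation is zero by symmetry, yet the single witness pair f(u) = u², g(v) = v achieves correlation 1, and any such pair lower-bounds HGR. A minimal NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(100_000)
v = u ** 2  # deterministic, but purely nonlinear, dependence

def pearson(a, b):
    """Pearson correlation of two 1-D samples."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

# Linear (CCA-like) view: Pearson correlation is ~0 by symmetry of N(0, 1).
rho_linear = pearson(u, v)

# One nonlinear witness f(u) = u^2, g(v) = v already certifies dependence.
rho_witness = pearson(u ** 2, v)

print(rho_linear, rho_witness)  # ≈ 0.0 and ≈ 1.0
```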
Information theory and relaxation

Theorem (Witsenhausen '75). Suppose U and V are discrete and let the matrix

    Q(u, v) = π(u, v) / √(π_U(u) π_V(v)),

then hgr(U, V) = σ₂(Q), where π(u, v) is the joint distribution of (U, V), π_U and π_V are its marginals, and σ₂ is the 2nd largest singular value.

- Extends naturally to continuous variables (replace sums by integrals).
- Upper bound on HGR by the χ²-divergence.
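Witsenhausen's characterization makes HGR directly computable in the discrete case: form Q from the joint and its marginals and take the second-largest singular value (the largest is always 1, attained by constant f, g). A small sketch, assuming marginals with full support (the function name is ours):

```python
import numpy as np

def hgr_discrete(pi):
    """HGR of a discrete pair from its joint distribution matrix pi[u, v],
    via Witsenhausen '75: hgr = 2nd-largest singular value of
    Q[u, v] = pi[u, v] / sqrt(pi_U[u] * pi_V[v]).
    Assumes both marginals put positive mass on every value."""
    pi = np.asarray(pi, dtype=float)
    pi_u = pi.sum(axis=1)  # marginal of U
    pi_v = pi.sum(axis=0)  # marginal of V
    q = pi / np.sqrt(np.outer(pi_u, pi_v))
    s = np.linalg.svd(q, compute_uv=False)  # sorted descending; s[0] = 1
    return float(s[1])

# U = V (a fair coin copied): maximal dependence.
print(hgr_discrete([[0.5, 0.0], [0.0, 0.5]]))    # -> 1.0

# Independent fair coins: joint = product of marginals.
print(hgr_discrete([[0.25, 0.25], [0.25, 0.25]]))  # -> ~0.0 (up to float error)
```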
Fairness-aware learning; Equalized Odds (EO)

Given an expected loss L, a function class H and a fairness tolerance ε > 0, solve:

    argmin_{h ∈ H} L(h, X, Y)   subject to   HGR|_∞ ≜ ‖HGR(Ŷ | Y = y, Z | Y = y)‖_∞ ≤ ε

Practicals: relax the constraint HGR|_∞ ≤ ε to get a tractable penalty. If

    χ²|₁ = ‖χ²(π̂(ŷ, z | y), π̂(ŷ | y) ⊗ π̂(z | y))‖₁,

this yields

    argmin_{h ∈ H} L(h, X, Y) + λ χ²|₁

Related work: [2], [5], [9], [4], [1], [3], [6], [11], [7, 10].
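A plug-in version of the χ²|₁ quantity for discrete Ŷ, Z, Y can be sketched as follows: for each value of Y, compare the empirical conditional joint of (Ŷ, Z) with the product of its marginals via the χ²-divergence, then sum over y. This is only an illustration of the quantity being penalized (the function name is ours); for training, the paper's penalty instead uses a differentiable density estimate so it can be optimized by gradient descent:

```python
import numpy as np

def chi2_l1(y_hat, z, y):
    """Plug-in estimate of chi^2|_1: sum over values of y of the
    chi^2-divergence between the empirical joint of (Ŷ, Z) given Y = y
    and the product of its marginals. Inputs are 1-D arrays of
    discrete values; 0 iff Ŷ and Z are empirically independent given Y."""
    total = 0.0
    for yv in np.unique(y):
        m = (y == yv)
        yh_vals, yh_idx = np.unique(y_hat[m], return_inverse=True)
        z_vals, z_idx = np.unique(z[m], return_inverse=True)
        # empirical conditional joint distribution of (Ŷ, Z) given Y = yv
        joint = np.zeros((len(yh_vals), len(z_vals)))
        np.add.at(joint, (yh_idx, z_idx), 1.0)
        joint /= joint.sum()
        prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        # chi^2 divergence: sum over cells of (joint - prod)^2 / prod
        mask = prod > 0
        total += float(((joint[mask] - prod[mask]) ** 2 / prod[mask]).sum())
    return total
```

For a binary Ŷ that copies Z exactly, each conditional term equals 1, so the penalty is 2 over the two values of Y; for an independent Ŷ it is near 0.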
Y and Z binary valued: comparison with previous work

Goal: use our proposal with a neural network to train a classifier such that a binary sensitive Z does not unfairly influence an outcome Ŷ, maintaining good accuracy while obtaining a smaller DEO. Reproduce and compare experiments from Donini et al. '18 [3].

| Method    | Arrhythmia ACC / DEO | COMPAS ACC / DEO | Adult ACC / DEO | Drug ACC / DEO | German ACC / DEO |
|-----------|----------------------|------------------|-----------------|----------------|------------------|
| Naïve SVM | 75±4 / 11±3          | 72±1 / 14±2      | 80 / 9          | 74±5 / 12±5    | 81±2 / 22±4      |
| SVM       | 71±5 / 10±3          | 73±1 / 11±2      | 79 / 8          | 74±3 / 10±6    | 81±2 / 22±3      |
| FERM      | 75±5 / 5±2           | 96±1 / 9±2       | 77 / 1          | 73±4 / 5±3     | 79±3 / 10±5      |
| NN        | 74±7 / 19±14         | 97±0 / 1±0       | 84 / 14         | 74±4 / 47±19   | 79±3 / 15±16     |
| NN + χ²   | 75±6 / 15±9          | 96±0 / 0±0       | 83 / 3          | 73±3 / 25±14   | 78±5 / 0±0       |

Results are comparable to the state of the art; the smaller datasets are difficult for our proposal.
Continuous Case: Criminality Rates

Dataset: UCI Communities+and+Crime. 2 sets of experiments, 3 fairness penalties:
- Linear regression (LR), full batches of data.
- Deep neural nets (DNN) with mini-batches (n = 200; Adam as optimizer).
- Regularization parameter λ varies from 2⁻⁴ to 2⁶.
- Penalties: χ²|₁, KL|₁, and a baseline L₂^{Ŷ|Z,Y} penalty.

[Figure: Equalized odds with Linear] [Figure: Equalized odds with DNN] — fairness (HGR_∞) vs. predictive error (MSE); in the regression plots, for KL|₁ and L₂^{Ŷ|Z,Y} some points fall out of the graph to the right.

We find:
- χ²|₁ and KL|₁ work smoothly with mini-batched stochastic optimization, in contrast with the baseline L₂^{Ŷ|Z,Y} penalty, which suffers from mini-batching.
- DNN improves fairness at a lower price than linear models in terms of MSE.
- It is important that the fairness penalty be compatible with DNNs.