Fairness-Aware Learning for Continuous Attributes and Treatments
Jérémie Mary, Criteo AI Lab
Clément Calauzènes, Criteo AI Lab
Noureddine El Karoui, Criteo AI Lab and UC Berkeley
ICML 2019, Long Beach, CA
Fairness and independence

Setup: build a prediction Ŷ of a variable Y (e.g. payment default) based on available information X (credit card history); the prediction may be biased/unfair with respect to a sensitive attribute Z (gender). Most fairness work is restricted to binary values of Y and Z.

Disparate impact (demographic parity):

    DI = P(Ŷ = 1 | Z = 1) / P(Ŷ = 1 | Z = 0)

Equal opportunity:

    DEO = P(Ŷ = 1 | Z = 1, Y = 1) − P(Ŷ = 1 | Z = 0, Y = 1)

Generalizations using independence notions:
- Demographic parity: DI generalizes to Ŷ ⊥ Z, even when Z is non-binary.
- Equalized odds (EO): DEO generalizes to Ŷ ⊥ Z | Y, even when Z is non-binary.

We propose new metrics that also easily generalize to continuous variables.
HGR: measuring independence

Definition (Hirschfeld–Gebelein–Rényi Maximum Correlation Coefficient). Given two random variables U ∈ 𝒰 and V ∈ 𝒱,

    hgr(U, V) ≜ sup_{f,g} ρ(f(U), g(V))    (1)

where ρ is Pearson's correlation and the supremum is over f, g such that E[f²(U)] < ∞ and E[g²(V)] < ∞.

- 0 ≤ HGR(U, V) ≤ 1; HGR(U, V) = 0 iff U and V are independent.
- If f, g are restricted to linear functions, we recover CCA.
- Connection exploited in RDC [8] with CCA in an RKHS.
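To see why the supremum over nonlinear f, g matters, consider a standard example (an illustration of ours, not from the slides): for U ~ N(0, 1) and V = U², Pearson's correlation is zero by symmetry, yet the single witness pair f(u) = u², g(v) = v achieves correlation 1, and any such pair lower-bounds HGR. A minimal NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(100_000)
v = u ** 2  # deterministic, but purely nonlinear, dependence

def pearson(a, b):
    """Pearson correlation of two 1-D samples."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

# Linear (CCA-like) view: Pearson correlation is ~0 by symmetry of N(0, 1).
rho_linear = pearson(u, v)

# One nonlinear witness f(u) = u^2, g(v) = v already certifies dependence.
rho_witness = pearson(u ** 2, v)

print(rho_linear, rho_witness)  # ≈ 0.0 and ≈ 1.0
```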
Information theory and relaxation

Theorem (Witsenhausen '75). Suppose U and V are discrete and let the matrix

    Q(u, v) = π(u, v) / √(π_U(u) π_V(v)),

then hgr(U, V) = σ₂(Q), where π(u, v) is the joint distribution of (U, V), π_U and π_V are its marginals, and σ₂ is the 2nd largest singular value.

- Extends naturally to continuous variables (replace sums by integrals).
- Upper bound on HGR by the χ²-divergence.
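Witsenhausen's characterization makes HGR directly computable in the discrete case: form Q from the joint and its marginals and take the second-largest singular value (the largest is always 1, attained by constant f, g). A small sketch, assuming marginals with full support (the function name is ours):

```python
import numpy as np

def hgr_discrete(pi):
    """HGR of a discrete pair from its joint distribution matrix pi[u, v],
    via Witsenhausen '75: hgr = 2nd-largest singular value of
    Q[u, v] = pi[u, v] / sqrt(pi_U[u] * pi_V[v]).
    Assumes both marginals put positive mass on every value."""
    pi = np.asarray(pi, dtype=float)
    pi_u = pi.sum(axis=1)  # marginal of U
    pi_v = pi.sum(axis=0)  # marginal of V
    q = pi / np.sqrt(np.outer(pi_u, pi_v))
    s = np.linalg.svd(q, compute_uv=False)  # sorted descending; s[0] = 1
    return float(s[1])

# U = V (a fair coin copied): maximal dependence.
print(hgr_discrete([[0.5, 0.0], [0.0, 0.5]]))    # -> 1.0

# Independent fair coins: joint = product of marginals.
print(hgr_discrete([[0.25, 0.25], [0.25, 0.25]]))  # -> ~0.0 (up to float error)
```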
Fairness-aware learning; Equalized Odds (EO)

Given an expected loss L, a function class H and a fairness tolerance ε > 0, solve:

    argmin_{h ∈ H} L(h, X, Y)   subject to   HGR|_∞ ≜ ‖HGR(Ŷ | Y = y, Z | Y = y)‖_∞ ≤ ε

Practicals: relax the constraint HGR|_∞ ≤ ε to get a tractable penalty. If

    χ²|₁ = ‖χ²(π̂(ŷ, z | y), π̂(ŷ | y) ⊗ π̂(z | y))‖₁,

this yields

    argmin_{h ∈ H} L(h, X, Y) + λ χ²|₁

Related work: [2], [5], [9], [4], [1], [3], [6], [11], [7, 10].
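A plug-in version of the χ²|₁ quantity for discrete Ŷ, Z, Y can be sketched as follows: for each value of Y, compare the empirical conditional joint of (Ŷ, Z) with the product of its marginals via the χ²-divergence, then sum over y. This is only an illustration of the quantity being penalized (the function name is ours); for training, the paper's penalty instead uses a differentiable density estimate so it can be optimized by gradient descent:

```python
import numpy as np

def chi2_l1(y_hat, z, y):
    """Plug-in estimate of chi^2|_1: sum over values of y of the
    chi^2-divergence between the empirical joint of (Ŷ, Z) given Y = y
    and the product of its marginals. Inputs are 1-D arrays of
    discrete values; 0 iff Ŷ and Z are empirically independent given Y."""
    total = 0.0
    for yv in np.unique(y):
        m = (y == yv)
        yh_vals, yh_idx = np.unique(y_hat[m], return_inverse=True)
        z_vals, z_idx = np.unique(z[m], return_inverse=True)
        # empirical conditional joint distribution of (Ŷ, Z) given Y = yv
        joint = np.zeros((len(yh_vals), len(z_vals)))
        np.add.at(joint, (yh_idx, z_idx), 1.0)
        joint /= joint.sum()
        prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))
        # chi^2 divergence: sum over cells of (joint - prod)^2 / prod
        mask = prod > 0
        total += float(((joint[mask] - prod[mask]) ** 2 / prod[mask]).sum())
    return total
```

For a binary Ŷ that copies Z exactly, each conditional term equals 1, so the penalty is 2 over the two values of Y; for an independent Ŷ it is near 0.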
Y and Z binary valued: comparison with previous work

Goal: use our proposal with a neural network to train a classifier such that a binary sensitive Z does not unfairly influence an outcome Ŷ, maintaining good accuracy while obtaining a smaller DEO. Reproduce and compare experiments from Donini et al. '18 [3].

| Method    | Arrhythmia ACC / DEO | COMPAS ACC / DEO | Adult ACC / DEO | Drug ACC / DEO | German ACC / DEO |
|-----------|----------------------|------------------|-----------------|----------------|------------------|
| Naïve SVM | 75±4 / 11±3          | 72±1 / 14±2      | 80 / 9          | 74±5 / 12±5    | 81±2 / 22±4      |
| SVM       | 71±5 / 10±3          | 73±1 / 11±2      | 79 / 8          | 74±3 / 10±6    | 81±2 / 22±3      |
| FERM      | 75±5 / 5±2           | 96±1 / 9±2       | 77 / 1          | 73±4 / 5±3     | 79±3 / 10±5      |
| NN        | 74±7 / 19±14         | 97±0 / 1±0       | 84 / 14         | 74±4 / 47±19   | 79±3 / 15±16     |
| NN + χ²   | 75±6 / 15±9          | 96±0 / 0±0       | 83 / 3          | 73±3 / 25±14   | 78±5 / 0±0       |

Results are comparable to the state of the art; the smaller datasets are difficult for our proposal.
Continuous Case: Criminality Rates

Dataset: UCI Communities+and+Crime. 2 sets of experiments, 3 fairness penalties:
- Linear regression (LR), full batches of data.
- Deep neural nets (DNN) with mini-batches (n = 200; Adam as optimizer).
- Regularization parameter λ varies from 2⁻⁴ to 2⁶.
- Penalties: χ²|₁, KL|₁, and a baseline L₂^{Ŷ|Z,Y} penalty.

[Figure: Equalized odds with Linear] [Figure: Equalized odds with DNN] — fairness (HGR_∞) vs. predictive error (MSE); in the regression plots, for KL|₁ and L₂^{Ŷ|Z,Y} some points fall out of the graph to the right.

We find:
- χ²|₁ and KL|₁ work smoothly with mini-batched stochastic optimization, in contrast with the baseline L₂^{Ŷ|Z,Y} penalty, which suffers from mini-batching.
- DNN improves fairness at a lower price than linear models in terms of MSE.
- It is important that the fairness penalty be compatible with DNNs.