
  1. The European Commission’s science and knowledge service Joint Research Centre

  2. Why machine learning may lead to unfairness Songül Tolan¹, Marius Miron¹, Emilia Gomez¹﹐², Carlos Castillo² (¹ European Commission's Joint Research Centre, ² Universitat Pompeu Fabra)

  3. Machine learning for decision making

  4. The criminal justice case Trade-off: predictive performance vs fairness

  5. Criminal recidivism

  6. Criminal recidivism prediction Prisoner → Human expert → Decision / Sentence

  7. Criminal recidivism prediction Prisoner → Human expert → Decision / Sentence → Outcome

  8. Criminal recidivism prediction Prisoner → Human expert → Decision / Sentence → Outcome

  9. Criminal recidivism prediction Features → Machine learning model → Prediction → Outcome

  10. Criminal recidivism prediction Examples of static features: age at crime, sex, nationality, previous number of crimes, sentence, year of crime, probation

  11. Fairness A decision is fair if it does not discriminate against people based on their membership in a protected group

  12. Fairness Examples of protected features: sex, nationality (among the static features: age at crime, previous number of crimes, sentence, year of crime, probation)

  13. Measuring unfairness Protected features (sex, nationality) + Features → Machine learning model → Prediction → Outcome

  14. Measuring unfairness Comparing Prediction against Outcome yields false negatives and false positives

  15. False negative rate = miss rate = Σ false negatives / Σ actual positives = FN / (FN + TP)

  16. False positive rate = false alarm rate = Σ false positives / Σ actual negatives = FP / (FP + TN)

  17. Group fairness - sex The rates above are computed per group, restricting the sums to, e.g., sex = Male or sex = Female

  18. False negative rate disparity FNR disparity = FNR_female / FNR_male: how much more likely members of one group are to be wrongfully labeled as non-recidivists than members of the other.
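The per-group error rates and disparity ratio on slides 15-18 can be sketched in a few lines of Python (function and variable names are illustrative, not the authors' framework; labels assume 1 = recidivist, 0 = non-recidivist):

```python
def error_rates(y_true, y_pred):
    # Confusion-matrix counts, then FNR (miss rate) and FPR (false alarm rate).
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fn / (fn + tp), fp / (fp + tn)

def group_fnr_disparity(y_true, y_pred, group, a, b):
    # Restrict the sums to each group (e.g. a="female", b="male"),
    # then take the ratio of the two false negative rates.
    def fnr(g):
        pairs = [(t, p) for t, p, s in zip(y_true, y_pred, group) if s == g]
        return error_rates([t for t, _ in pairs], [p for _, p in pairs])[0]
    return fnr(a) / fnr(b)
```

A disparity far from 1 in either direction signals that the model's miss rate differs between the two groups.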

  19. Headache?

  20. Too complicated? The fairness in machine learning literature comprises at least 21 disparity metrics.

  21. Juvenile recidivism

  22. Risk assessment tools Structured Assessment of Violence Risk in Youth (SAVRY) ● high degree of involvement from human experts ● open and interpretable (in comparison with COMPAS) ● 24 risk factors scored low, medium or high

  23. SAVRY Examples of SAVRY features: early violence, self-harm history, home violence, poor school achievement, stress and poor coping, substance abuse, criminal parent/caregiver

  24. Criminal recidivism prediction (Expert) SAVRY features → Σ SAVRY sum → Final expert evaluation → Outcome

  25. Static ML Static features → Machine learning model → Prediction → Outcome

  26. SAVRY ML SAVRY features → Machine learning model → Prediction → Outcome

  27. Static + SAVRY ML Static + SAVRY features → Machine learning model → Prediction → Outcome

  28. Dataset Juvenile offenders in Catalonia 1 ● 855 people ● crimes between 2002–2010, release in 2010 ● age at crime between 12 and 17 years old ● status followed up in 2013 and 2015 1. Open data: http://cejfe.gencat.cat/en/recerca/opendata/jjuvenil/reincidencia-justicia-menors/index.html

  29. Experimental setup Training a set of ML methods ● logistic regression (logit), multi-layer perceptron (mlp), support vector machines (lsvm), k-nearest neighbors (knn), random forest (rf), naive bayes (nb) ● k-fold cross validation with k=10 (10% test, 10% validation, 80% training) ● we run 50 different experiments with different initial conditions ● we compute feature importance with LIME 1 1. LIME https://github.com/marcotcr/lime
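The fold layout described above (k=10: one fold for testing, one for validation, the rest for training) can be sketched as follows; this is a minimal illustration of the split, not the authors' open framework:

```python
import random

def kfold_indices(n, k=10, seed=0):
    # Shuffle sample indices, split into k equal folds; each round uses
    # one fold for testing, one for validation, and the remaining k-2
    # for training (k=10 gives 10% test, 10% validation, 80% training).
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        val = folds[(i + 1) % k]
        train = [j for f_i, fold in enumerate(folds)
                 if f_i not in (i, (i + 1) % k) for j in fold]
        yield train, val, test
```

Each of the six classifiers would then be fit on the training indices, tuned on the validation indices, and scored on the test indices; repeating with different seeds gives the 50 runs mentioned on the slide.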

  30. Predictive performance - AUC ROC

  31. Results: predictive performance (AUC) SAVRY sum: 0.64 AUC; Expert: 0.66 AUC

  32.–37. Results: disparity, sex: false alarm rates and miss rates (chart built up incrementally across these slides)

  38.–42. Results: disparity, nationality: false alarm rates and miss rates (chart built up incrementally across these slides)

  43. Results: feature importance for logit

  44. Results: feature importance for mlp

  45.–47. Results: difference in base rates (prevalence; chart built up incrementally across these slides)
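The base-rate (prevalence) comparison behind these slides amounts to measuring the fraction of actual recidivists within each group; a minimal sketch (the helper name is hypothetical, not from the study):

```python
def base_rates(outcome, group):
    # Prevalence of recidivism (outcome == 1) within each group:
    # differing base rates across groups can by themselves induce
    # disparities in a model's error rates.
    rates = {}
    for g in set(group):
        ys = [o for o, s in zip(outcome, group) if s == g]
        rates[g] = sum(ys) / len(ys)
    return rates
```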

  48. Conclusions ● ML models have better predictive performance ● ML models tend to discriminate more ● static features outweigh SAVRY features in importance ● preliminary study: the cause may lie in the data (differing base rates)

  49. Contributions We propose a methodology and an ML framework 1 ● to easily train ML models on tabular data (csv files) ● to evaluate these models in terms of predictive performance and fairness ● to connect to interpretability frameworks ● to reproduce results and research with ease 1. Open framework: https://gitlab.com/HUMAINT/humaint-fatml

  50. Thank you! Any questions? You can find me at @nkundiushuti & marius.miron@ec.europa.eu & mariusmiron.com
