


Themis-ml: A Fairness-aware Machine Learning Interface for End-to-end Discrimination Discovery and Mitigation

Niels Bantilan
Arena.io
New York, NY
niels.bantilan@gmail.com

arXiv:1710.06921v1 [cs.CY] 18 Oct 2017

Bloomberg Data for Good Exchange Conference. 24-Sep-2017, Chicago, IL, USA.

ABSTRACT

As more industries integrate machine learning into socially sensitive decision processes like hiring, loan-approval, and parole-granting, we are at risk of perpetuating historical and contemporary socioeconomic disparities. This is a critical problem because on the one hand, organizations who use but do not understand the discriminatory potential of such systems will facilitate the widening of social disparities under the assumption that algorithms are categorically objective. On the other hand, the responsible use of machine learning can help us measure, understand, and mitigate the implicit historical biases in socially sensitive data by expressing implicit decision-making mental models in terms of explicit statistical models. In this paper we specify, implement, and evaluate a “fairness-aware” machine learning interface called themis-ml, which is intended for use by individual data scientists and engineers, academic research teams, or larger product teams who use machine learning in production systems.

1 Introduction

In recent years, the transformative potential of machine learning (ML) in many industries has propelled ML into the forefront of mainstream media. From improving products and services to optimizing logistics and operations, ML and artificial intelligence more broadly offer a wide range of tools for organizations to enhance their internal and external capabilities.

As with any tool, we can use ML to engender great social benefit, but as [1] emphasizes, we can also misuse it to bring about devastating harm. In this paper, we focus on ML systems in the context of Decision Support Systems (DSS), which are software systems that are intended to assist humans in various decision-making contexts [2, 3, 4, 5]. The misuse of ML in these types of systems could potentially precipitate a widespread adverse impact on society by introducing insidious feedback loops between biased historical data and current decision-making [1].

Researchers have developed many discrimination discovery and fairness-aware ML methods [6, 7, 8, 9, 10, 11, 12, 13], so we build on work done by others and seek to leverage these techniques in the context of research- and product-based machine learning applications.

Our contributions in this paper are three-fold. First, we propose an application programming interface (API) for “Fairness-aware Machine Learning Interfaces” (FMLI) in the context of a simple binary classifier. Second, we introduce themis-ml, an FMLI-compliant library, and apply it to a hypothetical loan-granting DSS using the German Credit Dataset [14]. Finally, we evaluate the efficacy of themis-ml as a tool for measuring potential discrimination (PD) in both training data and ML predictions as well as mitigating PD using fairness-aware methods. Our hope is that themis-ml serves as a reference implementation that others might use and extend for their own purposes.

2 Bias and Discrimination

Colloquially, bias is simply a preference for or against something, e.g. preferring vanilla over chocolate ice cream. While this definition is intuitive, here we explicitly define algorithmic bias as a form of bias that occurs when mathematical rules favor one set of attributes over others in relation to some target variable, like “approving” or “denying” a loan.

Algorithmic bias in machine learning models can occur when a trained model systematically generates predictions that favor one group over another in relation to some set of attributes, e.g. education, and some target variable, e.g. “default on credit”. While the definition above of bias is amoral, discrimination is in essence moral, occurring when an action is based on biases resulting in the unfair treatment of people. We define fairness as the inverse of discrimination, meaning that a “fairness-aware” model is one that produces non-discriminatory predictions.

Bias can lead to either direct (intended/explicit) or indirect (unintended/implicit) discrimination, and the predominant legal concepts used to determine these two types are known as disparate treatment and disparate impact, respectively [15]. As [6, 7] suggest, we can address disparate treatment in ML models by simply removing all variables that are highly correlated to the protected class of interest, in addition to the protected class itself, from the training data. However, as [6] points out, doing so does not necessarily mitigate discriminatory predictions and may actually introduce unfairness into an otherwise fair system. In contrast, addressing disparate impact is more complex because it depends on historical processes that generated the training data, non-linear relationships between the features and protected class, and whether we are interested in measuring individual- or group-level discrimination [12].

Table 1: A Simple Classification Pipeline

API Interface   Function                                         Examples
Transformer     Preprocess raw data for model training.          mean-unit variance scaling, min-max scaling
Estimator       Train models to perform a classification task.   logistic regression, random forest
Scorer          Evaluate performance of different models.        accuracy, f1-score, area under the curve
Predictor       Predict outcomes for new data.                   single-classifier prediction, ensemble prediction

Table 2: FMLI Use Cases

Use Case: Detect and reduce discrimination in a production machine learning pipeline.
Rationale: Fairness-aware modeling aligns with team/company values, provides protection from legal liability.

Use Case: Measure individual-/group-level discrimination in data with respect to a protected class and outcome of interest.
Rationale: Need to assess the potential bias resulting from training models on data.

Use Case: Preprocess raw data or post-process model predictions in a way that reduces discriminatory predictions generated by models.
Rationale: Unable to change the underlying implementation of the model training process.

Use Case: Explicitly learn model parameters that produce fair predictions for a variety of model types.
Rationale: Need for flexibility when experimenting with or deploying different model types.

Use Case: Evaluate the degree to which fairness-aware methods reduce discrimination and assess the fairness-utility trade-off.
Rationale: Need for assessing the business consequences or other implications of deploying a fairness-aware model.
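To make the four roles in Table 1 concrete, here is a minimal, dependency-free sketch. It is not themis-ml or scikit-learn code: the class names, the toy classifier, and the data are all hypothetical, and only the `fit` / `transform` / `predict` method naming follows sklearn's conventions.

```python
# Toy stand-ins for the four roles in Table 1. Class names, data, and the
# classification rule are hypothetical and chosen only for illustration.

class MinMaxTransformer:
    """Transformer: rescale each feature column to the [0, 1] range."""

    def fit(self, X):
        columns = list(zip(*X))
        self.mins_ = [min(col) for col in columns]
        self.maxs_ = [max(col) for col in columns]
        return self

    def transform(self, X):
        return [
            [(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, self.mins_, self.maxs_)]
            for row in X
        ]


class MeanThresholdClassifier:
    """Estimator and Predictor: predict 1 when a row's feature mean
    exceeds a threshold learned from the training labels."""

    def fit(self, X, y):
        pos = [sum(r) / len(r) for r, label in zip(X, y) if label == 1]
        neg = [sum(r) / len(r) for r, label in zip(X, y) if label == 0]
        # Threshold halfway between the two class averages.
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, X):
        return [1 if sum(r) / len(r) > self.threshold_ else 0 for r in X]


def accuracy(y_true, y_pred):
    """Scorer: fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data: two features per applicant, label 1 = desirable outcome.
X_train = [[1000, 2], [1500, 3], [4000, 8], [5000, 9]]
y_train = [0, 0, 1, 1]
X_new = [[1200, 2], [4800, 9]]

scaler = MinMaxTransformer().fit(X_train)                                   # preprocess
model = MeanThresholdClassifier().fit(scaler.transform(X_train), y_train)  # train

train_accuracy = accuracy(y_train, model.predict(scaler.transform(X_train)))  # evaluate
new_predictions = model.predict(scaler.transform(X_new))                      # predict
```

In a real sklearn pipeline the same roles would typically be filled by library components such as `MinMaxScaler`, `LogisticRegression`, and `accuracy_score`; the toy classes above only mirror the division of labor.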
3 A Fairness-aware Machine Learning Interface

So how does one measure disparate impact and individual-/group-level discrimination in an ML-driven product? In this section, we describe the main components of a simple classification system, enumerate a few of the use cases that a research or product team might have for using an FMLI, and propose an API that fulfills these use cases.

A simple classification ML pipeline consists of five steps: data ingestion, data preprocessing, model training, model evaluation, and prediction generation on new examples. Data ingestion is outside the scope of this paper because it is a highly variable process that depends on the application, often involves considerable engineering effort, and potentially requires external stakeholder buy-in.

Table 1 outlines a simple classification system in terms of the core interfaces in scikit-learn (sklearn), which is a machine learning library in the Python programming language [16], and Table 2 delineates some of the use cases that research or product teams might have to justify the use of an FMLI.

4 FMLI Specification

Here we propose a high-level specification of themis-ml, an open source FMLI named after the ancient Greek titaness of justice (the library can be found on GitHub). We adopt sklearn’s principles of consistency, inspection, non-proliferation of classes, composition, and sensible defaults [16], and extend them with the following FMLI-specific principles:

Model flexibility. Focus on fairness-aware methods that are applicable to a variety of model types because users might have no control or full control over the specific model training implementation.

Fairness as performance. Provide estimators and scoring metrics that explicitly encode a notion of both model accuracy and fairness so that models can optimize for both.

Transparency of fairness-utility tradeoff. Fair models often make less accurate predictions [8, 13], which is an important factor when assessing their business impact.

4.1 Preliminaries

In the following subsections we describe specific methods from the ML fairness literature that map onto each of the sklearn interfaces. Note that we only provide a high level summary of each method, citing the original sources for more implementation details.

The following descriptions make two assumptions: (i) the positive target label $y^+$ refers to a desirable outcome, e.g. “approve loan”, and vice versa for the negative target label $y^-$, and (ii) the protected class is a binary variable defined as $s \in \{d, a\}$, where $X_d$ are members of the disadvantaged group and $X_a$ are members of the advantaged group.

Following these conventions, we define $X_{d,y^+}$ and $X_{d,y^-}$ as the set of observations of the disadvantaged group that are positively labelled and negatively labelled, respectively. Similarly, $X_{a,y^+}$ and $X_{a,y^-}$ are observations of the advantaged group that are positively and negatively labelled, respectively.
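The four sets defined above can be illustrated with a short sketch. The data and the `partition` helper are hypothetical and not part of themis-ml; labels are encoded as 1 for the positive label and 0 for the negative label, and the protected class as the strings "d" and "a".

```python
# Hypothetical toy data: s is the protected class ("d" = disadvantaged,
# "a" = advantaged); y is the target label (1 = positive, 0 = negative).
s = ["d", "d", "d", "d", "a", "a", "a", "a"]
y = [1, 0, 0, 0, 1, 1, 1, 0]


def partition(s, y):
    """Return the indices of the observations in each (group, label) cell."""
    cells = {("d", 1): [], ("d", 0): [], ("a", 1): [], ("a", 0): []}
    for i, (group, label) in enumerate(zip(s, y)):
        cells[(group, label)].append(i)
    return cells


cells = partition(s, y)
X_d_pos, X_d_neg = cells[("d", 1)], cells[("d", 0)]  # disadvantaged: positive, negative
X_a_pos, X_a_neg = cells[("a", 1)], cells[("a", 0)]  # advantaged: positive, negative

# Share of each group that received the desirable (positive) label.
rate_d = len(X_d_pos) / (len(X_d_pos) + len(X_d_neg))
rate_a = len(X_a_pos) / (len(X_a_pos) + len(X_a_neg))
```

Comparing `rate_d` with `rate_a` is one simple group-level summary that these sets make possible. It is shown here only to motivate the notation, not as the specific metric used by the methods described later.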
