Algorithmic Bias




Machine Learning
• An area of AI that studies how to get computers to learn from experience (e.g. data)
• Identify patterns from a training dataset
• Then generalize from these patterns and apply them to future data that is different from the training data. This is called the test dataset
• Supervised learning
  o Features -> Classifier -> Class Label
  o Features: traits of a data instance (e.g. keywords in your email) that are informative for the classification
  o Class Label: the classification (e.g. Personal or School for email sorting)
  o Produce the classifier by training on the training data

Algorithmic Bias
• What is it?
  o Bias introduced into machine learning by the training data
  o Garbage in / garbage out: machine learning algorithms reflect societal bias when applied to biased data (bias in the form of discrimination, prejudice and unfairness)
  o Do machine learning algorithms respect protected variables?
  o These are characteristics that anti-discrimination laws protect in certain situations
    ▪ E.g. the Fair Housing Act prohibits landlords from discriminating based on 7 protected classes:
      ▪ Race
      ▪ Gender
      ▪ Religion
      ▪ Disability
      ▪ Color
      ▪ National origin
      ▪ Familial status
    ▪ Can't just ignore features that correspond to these protected variables and say your algorithm is not biased
      ▪ Due to confounding factors, e.g. zip code and race are closely correlated in many parts of the US
  o What are causes of algorithmic bias?
    ▪ Biased training data (e.g. biased class labels)
    ▪ Inclusion of protected variables as features; inclusion of variables correlated with protected variables is also highly problematic
    ▪ Downstream goals (e.g. business profitability) might conflict with avoiding discrimination
    ▪ Misunderstanding / misuse of machine learning
    ▪ Machine learning applied to the wrong tasks
    ▪ Domain adaptation: machine learning algorithm trained on data from one distribution but applied to test data from another distribution
    ▪ Missing / corrupted data
    ▪ Sampling selection bias
  o How do you fix this problem?
    ▪ Not sure if you can:
      ▪ A lot of these are societal problems
      ▪ Can you correct the bias without introducing bias of a different sort?
    ▪ Understand the problem so that you can use the right machine learning algorithm
    ▪ Know when NOT to use a particular algorithm
    ▪ Make systems that are auditable
    ▪ Keep outcomes lower-impact early on, especially when an algorithm is involved
  o Difficult problems that make a solution hard
    ▪ Limited to what data you actually have. What about the data you don't have?
    ▪ Definitions of fairness vary greatly: which one do you use?
    ▪ Lack of social context: can't transfer a machine learning algorithm from one context to another
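The supervised learning pipeline in the notes (Features -> Classifier -> Class Label) can be sketched with the email-sorting example. This is a minimal illustration, not a method from the notes: the naive Bayes scoring, the four training emails, and the function names `extract_features`, `train`, and `classify` are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def extract_features(text):
    """Features: the lowercase keywords that appear in an email."""
    return text.lower().split()

def train(examples):
    """Build a tiny naive Bayes classifier from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> keyword counts
    label_counts = Counter()            # label -> number of emails
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(extract_features(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary

def classify(model, text):
    """Return the class label with the highest log-probability score."""
    word_counts, label_counts, vocabulary = model
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(word_counts[label].values())
        for word in extract_features(text):
            # Laplace smoothing keeps unseen keywords from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training dataset: emails with hand-assigned class labels.
training_data = [
    ("homework due friday", "School"),
    ("lecture notes and exam schedule", "School"),
    ("dinner with family on friday", "Personal"),
    ("birthday party this weekend", "Personal"),
]
model = train(training_data)

# An unseen email plays the role of the test data.
prediction = classify(model, "exam homework reminder")
print(prediction)  # School
```

The classifier only knows what the labeled examples tell it, so any bias in those class labels carries straight through to its predictions.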
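The point that ignoring a protected variable does not make an algorithm unbiased can be shown with a small synthetic example. Everything below is an assumption made for illustration: the zip codes `Z1` and `Z2`, the groups `A` and `B`, the counts, and the majority-vote rule standing in for a trained classifier. The historical labels favor group A, and zip code is strongly correlated with group.

```python
from collections import defaultdict

def make_records():
    """Hypothetical synthetic applicants: (zip_code, group, historical_label)."""
    # (zip_code, group, number of applicants, number historically approved)
    cells = [
        ("Z1", "A", 90, 72),
        ("Z1", "B", 10, 4),
        ("Z2", "A", 10, 8),
        ("Z2", "B", 90, 36),
    ]
    records = []
    for zip_code, group, n, approved in cells:
        for i in range(n):
            records.append((zip_code, group, 1 if i < approved else 0))
    return records

def train_on_zip_only(records):
    """Learn a majority-vote rule per zip code; the group field is never read."""
    totals = defaultdict(lambda: [0, 0])  # zip_code -> [approved, seen]
    for zip_code, _group, label in records:
        totals[zip_code][0] += label
        totals[zip_code][1] += 1
    return {z: int(2 * approved > seen) for z, (approved, seen) in totals.items()}

def approval_rate_by_group(records, rule):
    """Fraction of each group that the learned rule would approve."""
    approved = defaultdict(int)
    seen = defaultdict(int)
    for zip_code, group, _label in records:
        approved[group] += rule[zip_code]
        seen[group] += 1
    return {g: approved[g] / seen[g] for g in seen}

records = make_records()
rule = train_on_zip_only(records)
rates = approval_rate_by_group(records, rule)
print(rule)   # {'Z1': 1, 'Z2': 0}
print(rates)  # {'A': 0.9, 'B': 0.1}
```

The rule never reads the group field, yet it approves 90% of group A and 10% of group B, because zip code acts as a proxy for group membership.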
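To see why varying definitions of fairness make a solution hard, it can help to compute two of them on the same predictions. The notes do not name specific definitions; the two used here, demographic parity (equal rates of positive predictions) and equal opportunity (equal true positive rates), are common choices picked as an assumption, and the data is hypothetical.

```python
def selection_rate(rows):
    """Demographic parity compares this: share of people predicted positive."""
    return sum(pred for _truth, pred in rows) / len(rows)

def true_positive_rate(rows):
    """Equal opportunity compares this: share of truly positive people predicted positive."""
    positives = [pred for truth, pred in rows if truth == 1]
    return sum(positives) / len(positives)

# Hypothetical (true_label, predicted_label) pairs for two groups of 10 people.
group_a = [(1, 1)] * 4 + [(1, 0)] * 4 + [(0, 0)] * 2
group_b = [(1, 1)] * 4 + [(0, 0)] * 6

parity_gap = abs(selection_rate(group_a) - selection_rate(group_b))
opportunity_gap = abs(true_positive_rate(group_a) - true_positive_rate(group_b))
print(parity_gap)       # 0.0
print(opportunity_gap)  # 0.5
```

Both groups are approved at the same rate, so the predictions look fair under demographic parity, but qualified members of group A are approved only half as often as qualified members of group B, so they look unfair under equal opportunity.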
