Algorithmic Bias

Machine Learning
An area of AI that studies how to get computers to learn from experience (e.g. data)
  Identify patterns in a training dataset
  Then generalize from these patterns and apply them to new data that the model has not seen. This new data is called a test dataset
Supervised learning
o Features -> Classifier -> Class Label
o Features: traits of a data instance (e.g. keywords in your email) that are informative about its classification
o Class Label: the classification (e.g. Personal or School for email sorting)
o The classifier is produced by training on the training data

Algorithmic Bias
What is it?
o Bias introduced into machine learning through the training data
o Garbage in / garbage out: machine learning algorithms reflect societal bias when applied to biased data (bias in the form of discrimination, prejudice and unfairness)
o Do machine learning algorithms respect protected variables?
o Protected variables are characteristics that anti-discrimination laws protect in certain situations
  E.g. the Fair Housing Act prohibits landlords from discriminating based on 7 protected classes:
    Race
    Sex
    Religion
    Disability
    Color
    National origin
    Familial status
  You can't just ignore the features that correspond to these protected variables and claim your algorithm is not biased
    This is because of confounding factors: e.g. zip code and race are closely correlated in many parts of the US
o What are the causes of algorithmic bias?
  Biased training data (e.g. biased class labels)
  Inclusion of protected variables as features; including variables correlated with protected variables is also highly problematic
  Downstream goals (e.g. business profitability) might conflict with avoiding discrimination
  Misunderstanding / misuse of machine learning
  Machine learning applied to the wrong tasks
  Domain adaptation: a machine learning algorithm is trained on data from one distribution but applied to test data from another distribution
  Missing / corrupted data
  Sampling selection bias
o How do you fix this problem?
  Not sure if you can: a lot of these are societal problems
  Can you correct the bias without introducing bias of a different sort?
  Understand the problem so that you can choose the right machine learning algorithm
  Know when NOT to use a particular algorithm
  Make systems that are auditable
  Limit high-impact outcomes early on, especially when an algorithm is involved
o Difficult problems that make a solution hard
  You are limited to the data you actually have. What about the data you don't have?
  Definitions of fairness vary greatly: which one do you use?
  Lack of social context: you can't simply transfer a machine learning algorithm from one context to another
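The supervised learning pipeline from the Machine Learning notes (features -> classifier -> class label, trained on a training set and then applied to a test set) can be sketched with a tiny email sorter. This is a minimal sketch, assuming keyword counts as features and a multinomial Naive Bayes classifier with add-one smoothing; Naive Bayes is just one possible classifier, and the emails and labels are hypothetical toy data made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (not real emails): (email text, class label).
TRAIN = [
    ("dinner with mom on sunday", "Personal"),
    ("birthday party at my place", "Personal"),
    ("homework due friday for lecture", "School"),
    ("exam grades posted for lecture", "School"),
]
TEST = [
    ("party with mom", "Personal"),
    ("lecture homework posted", "School"),
]

def features(text):
    """Features: keyword counts (a bag of words)."""
    return Counter(text.lower().split())

def train(examples):
    """Train a multinomial Naive Bayes model: count labels and words per label."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        feats = features(text)
        word_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Classifier: return the class label with the highest log-probability."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(label)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word, n in features(text).items():
            # Add-one smoothing so unseen words don't zero out the probability.
            score += n * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
accuracy = sum(predict(model, t) == y for t, y in TEST) / len(TEST)
print(f"test accuracy: {accuracy:.2f}")
```

The test emails share no exact text with the training emails, so the accuracy reflects how well the learned keyword patterns generalize rather than memorization.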
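The point that you can't just drop the protected variable, because a correlated feature such as zip code acts as a proxy, can be illustrated with a small audit. This is a minimal sketch on hypothetical synthetic records (not real data): "group" stands in for a protected variable, it is deliberately correlated with zip code, and the historical labels are biased. A "group-blind" majority-vote rule per zip code is an illustrative choice of model, and the audit simply compares positive-prediction rates per group.

```python
from collections import Counter, defaultdict

# Hypothetical synthetic records: (zip_code, group, historical_label).
# "group" stands in for a protected variable; in this toy data it is
# strongly correlated with zip code, and the historical labels are biased.
RECORDS = (
    [("11111", "A", 1)] * 40 + [("11111", "A", 0)] * 10
    + [("11111", "B", 1)] * 4 + [("11111", "B", 0)] * 1
    + [("22222", "B", 1)] * 10 + [("22222", "B", 0)] * 40
    + [("22222", "A", 1)] * 1 + [("22222", "A", 0)] * 4
)

def train_group_blind(records):
    """Ignore the protected variable entirely and learn the
    majority historical label for each zip code."""
    votes = defaultdict(Counter)
    for zip_code, _group, label in records:
        votes[zip_code][label] += 1
    return {z: c.most_common(1)[0][0] for z, c in votes.items()}

def selection_rates(records, model):
    """Audit: fraction of each group that receives a positive prediction."""
    totals, positives = Counter(), Counter()
    for zip_code, group, _label in records:
        totals[group] += 1
        positives[group] += model[zip_code]
    return {g: positives[g] / totals[g] for g in totals}

model = train_group_blind(RECORDS)
rates = selection_rates(RECORDS, model)
ratio = min(rates.values()) / max(rates.values())
print(model)  # the model only ever looks at zip code
print(rates)  # yet positive-prediction rates still differ sharply by group
print(f"selection-rate ratio: {ratio:.2f}")
```

Here the model never sees the group, but group B receives a positive prediction about a tenth as often as group A (ratio 0.10), because zip code carries the same information.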
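The domain adaptation cause (training on one distribution, testing on another) can be shown with a one-feature threshold classifier. This is a minimal sketch on hypothetical 1-D data, assuming the simplest kind of shift: in the target domain the relationship between the feature and the label has moved (the cutoff is 8 instead of 5), so a model that is perfect on its training distribution degrades when deployed elsewhere.

```python
# Hypothetical 1-D data: (feature value, true label).
# Source domain: label is 1 when x > 5. Target domain: label is 1 when x > 8.
SOURCE = [(x, int(x > 5)) for x in range(11)]
TARGET = [(x, int(x > 8)) for x in range(11)]

def accuracy(data, t):
    """Fraction of examples where the rule 'x > t' matches the true label."""
    return sum(int(x > t) == y for x, y in data) / len(data)

def fit_threshold(data):
    """Pick the threshold t (among observed values) that maximizes
    training accuracy for the rule 'predict 1 if x > t'."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: accuracy(data, t))

t = fit_threshold(SOURCE)
print(f"threshold={t}, source acc={accuracy(SOURCE, t):.2f}, "
      f"target acc={accuracy(TARGET, t):.2f}")
```

The learned threshold of 5 is perfect on the source data but misclassifies x = 6, 7 and 8 in the target domain (8 of 11 correct), even though nothing about the model itself changed.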
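To see why the choice of fairness definition matters, the sketch below checks the same predictions against two common definitions: demographic parity (equal rates of positive predictions across groups) and equal opportunity (equal true positive rates across groups). These are only two of many definitions, and the (true label, prediction) pairs are hypothetical, chosen so the two definitions disagree.

```python
# Hypothetical (true_label, predicted_label) pairs for two groups.
GROUP_A = [(1, 1)] * 4 + [(0, 1)] * 1 + [(1, 0)] * 1 + [(0, 0)] * 4
GROUP_B = [(1, 1)] * 2 + [(0, 1)] * 3 + [(1, 0)] * 2 + [(0, 0)] * 3

def selection_rate(pairs):
    """Demographic parity compares P(prediction = 1) across groups."""
    return sum(pred for _, pred in pairs) / len(pairs)

def true_positive_rate(pairs):
    """Equal opportunity compares P(prediction = 1 | true label = 1) across groups."""
    preds_for_positives = [pred for label, pred in pairs if label == 1]
    return sum(preds_for_positives) / len(preds_for_positives)

for name, metric in [("selection rate", selection_rate),
                     ("true positive rate", true_positive_rate)]:
    a, b = metric(GROUP_A), metric(GROUP_B)
    print(f"{name}: A={a:.2f} B={b:.2f} equal={a == b}")
```

Both groups are selected at the same rate (0.50), so demographic parity holds, but truly qualified members of group A are selected more often than those of group B (0.80 vs 0.50), so equal opportunity is violated.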