Sample Selection Bias
Lei Tang
Feb. 20th, 2007
Classical ML vs. Reality
• In classical machine learning, training data and test data share the same distribution.
• But that is not always the case in reality:
  • Survey data
  • Species habitat modeling based on data from only one area
  • Training and test data collected by different experiments
  • Newswire articles with timestamps
Sample Selection Bias
• Standard setting: data (x, y) are drawn independently from a distribution D.
• If the selected samples are not a random sample of D, then the samples are biased.
• Usually the training data are biased, but we want to apply the classifier to unbiased samples.
Four Cases of Bias (1)
• Let s denote whether or not a sample is selected.
• P(s=1|x,y) = P(s=1): not biased.
• P(s=1|x,y) = P(s=1|x): depends only on the feature vector.
• P(s=1|x,y) = P(s=1|y): depends only on the class label.
• P(s=1|x,y): depends on both x and y.
Four Cases of Bias (2)
• P(s=1|x,y) = P(s=1|y): learning from imbalanced data. The bias can be alleviated by adjusting the class prior (see the sketch below).
• P(s=1|x,y) = P(s=1|x) implies P(y|x) remains unchanged. This is the most studied case.
• If the bias depends on both x and y, we lack the information to analyze it.
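For the P(s=1|y) case, a minimal sketch of prior correction (the `adjust_prior` helper and the prior values are hypothetical): rescale a classifier's posteriors by the ratio of test-time to training-time class priors and renormalize.

```python
import numpy as np

def adjust_prior(p_train, prior_train, prior_test):
    """Rescale posteriors P(y|x, s=1) by prior_test / prior_train, renormalize.

    p_train: (n_samples, n_classes) posteriors from a model fit on biased data.
    """
    w = np.asarray(prior_test) / np.asarray(prior_train)
    p = p_train * w                          # reweight each class column
    return p / p.sum(axis=1, keepdims=True)  # renormalize rows

# Example: training data over-samples class 1 (20/80), true prior is 50/50.
p = np.array([[0.3, 0.7]])
print(adjust_prior(p, prior_train=[0.2, 0.8], prior_test=[0.5, 0.5]))
```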
An Intuitive Example
P(s=1|x,y) = P(s=1|x) means s and y are conditionally independent given x. By Bayes' rule, P(y|x, s=1) = P(s=1|x,y) P(y|x) / P(s=1|x) = P(y|x). Does the bias really matter, then, if P(y|x) remains unchanged?
Bias Analysis for Classifiers (1)
• Logistic regression: any classifier that directly models P(y|x) is not affected by feature bias, since P(y|x, s=1) = P(y|x).
• Bayesian classifier: models P(y|x) through P(x|y) P(y). A classifier that correctly estimates the full joint recovers P(y|x, s=1) = P(y|x). But for the naive Bayes classifier, the factored estimates Π_i P(x_i|y, s=1) generally differ from Π_i P(x_i|y), so the learned classifier changes under bias.
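A minimal simulation sketch of this contrast (the data-generating process and the selection rule are hypothetical): inject a feature-dependent bias P(s=1|x) and compare logistic regression with Gaussian naive Bayes on unbiased test data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def draw(n):
    """Unbiased population: two Gaussian classes in 2-D."""
    y = rng.integers(0, 2, n)
    X = rng.normal(loc=np.where(y[:, None] == 1, 1.5, 0.0), size=(n, 2))
    return X, y

X, y = draw(20000)
# Feature-dependent selection: P(s=1|x) depends only on the first feature.
p_select = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))
s = rng.random(len(X)) < p_select
X_tr, y_tr = X[s], y[s]        # biased training sample
X_te, y_te = draw(20000)       # fresh unbiased test sample

for clf in (LogisticRegression(), GaussianNB()):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, round(clf.score(X_te, y_te), 3))
```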
Bias Analysis for Classifiers (2)
• Hard-margin SVM: no bias effect. Soft-margin SVM: affected, as the cost of misclassification on the biased sample can change.
• A decision tree usually results in a different classifier when bias is present.
• In sum, most classifiers are still sensitive to sample bias.
• Note this is an asymptotic analysis, assuming the samples are "enough".
Correcting Bias
• Expected risk under the test distribution:
  R[Pr', θ] = E_{(x,y)~Pr'}[ l(x, y, θ) ] = E_{(x,y)~Pr}[ β(x,y) l(x, y, θ) ],
  where β(x,y) = Pr'(x,y) / Pr(x,y).
• Suppose the training set is drawn from Pr and the test set from Pr'.
• So we minimize the empirical regularized risk:
  min_θ (1/m) Σ_{i=1..m} β_i l(x_i, y_i, θ) + λ Ω[θ].
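A minimal sketch of this weighted empirical risk minimization, assuming the weights β_i are already given (here via scikit-learn's sample_weight; the data and weight values are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X_tr, y_tr: biased training sample; beta: one importance weight per
# example, approximating Pr'(x)/Pr(x). Placeholder values for illustration.
X_tr = np.array([[0.0, 1.0], [1.0, 0.0], [1.5, 1.5], [2.0, 0.5]])
y_tr = np.array([0, 0, 1, 1])
beta = np.array([0.5, 0.8, 1.6, 1.1])

# sample_weight scales each example's loss term, i.e. the fit minimizes
# (1/m) * sum_i beta_i * l(x_i, y_i, theta) plus regularization.
clf = LogisticRegression().fit(X_tr, y_tr, sample_weight=beta)
print(clf.predict([[1.0, 1.0]]))
```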
Estimate the Weights
• Under feature bias, β(x,y) = Pr'(x,y)/Pr(x,y) = Pr'(x)/Pr(x), since P(y|x) is shared.
• But how do we estimate the weight of each sample?
• Brute-force approach:
  • Estimate the densities Pr(x) and Pr'(x), respectively.
  • Then calculate each sample weight as their ratio.
• Not applicable in practice, as density estimation is more difficult than classification given a limited number of samples.
• Existing works use simulation experiments in which both Pr(x) and Pr'(x) are known (NOT REALISTIC).
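For completeness, a sketch of the brute-force route (the data and bandwidths are hypothetical): fit kernel density estimates to the training and test inputs and take their ratio.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
X_tr = rng.normal(0.5, 1.0, size=(500, 2))  # biased training inputs
X_te = rng.normal(0.0, 1.0, size=(500, 2))  # inputs from the test distribution

kde_tr = KernelDensity(bandwidth=0.5).fit(X_tr)
kde_te = KernelDensity(bandwidth=0.5).fit(X_te)

# beta_i ~ Pr'(x_i) / Pr(x_i); score_samples returns log densities.
beta = np.exp(kde_te.score_samples(X_tr) - kde_tr.score_samples(X_tr))
print(beta[:5])
```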
Distribution Matching
• Consider the expectation in feature space, E_{x~Pr'}[Φ(x)].
• We have E_{x~Pr}[β(x) Φ(x)] = E_{x~Pr'}[Φ(x)] when β(x) = Pr'(x)/Pr(x).
• Hence, the problem can be formulated as matching means in feature space:
  min_β || E_{x~Pr}[β(x) Φ(x)] − E_{x~Pr'}[Φ(x)] ||, subject to β(x) ≥ 0 and E_{x~Pr}[β(x)] = 1.
• The solution is β(x) = Pr'(x)/Pr(x) (for a sufficiently rich feature map Φ).
Empirical KMM Optimization
Replacing the expectations with empirical means over the m training points x_i and m' test points x'_j gives
  J(β) = (1/2) β^T K β − κ^T β + const,
where K_ij = k(x_i, x_j) and κ_i = (m/m') Σ_{j=1..m'} k(x_i, x'_j).
Therefore, it is equivalent to solving the QP problem:
  min_β (1/2) β^T K β − κ^T β, subject to 0 ≤ β_i ≤ B and |Σ_i β_i − m| ≤ m ε.
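A minimal sketch of this QP (the RBF bandwidth and the bounds B and ε are assumed choices, not prescribed here), using cvxpy as a generic solver:

```python
import numpy as np
import cvxpy as cp

def rbf(X, Y, sigma):
    """RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, B=1000.0, eps=None):
    m, m_te = len(X_tr), len(X_te)
    K = rbf(X_tr, X_tr, sigma) + 1e-8 * np.eye(m)  # small ridge for stability
    kappa = (m / m_te) * rbf(X_tr, X_te, sigma).sum(axis=1)
    if eps is None:
        eps = (np.sqrt(m) - 1) / np.sqrt(m)        # an assumed default
    beta = cp.Variable(m)
    obj = cp.Minimize(0.5 * cp.quad_form(beta, cp.psd_wrap(K)) - kappa @ beta)
    cons = [beta >= 0, beta <= B, cp.abs(cp.sum(beta) - m) <= m * eps]
    cp.Problem(obj, cons).solve()
    return beta.value
```

The returned weights can then be plugged into a weighted learner, e.g. the sample_weight sketch above.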
Experiments
• A toy regression example
Simulation
• Select some UCI datasets, inject sample selection bias into the training set, then test on unbiased samples (one way to inject a feature bias is sketched below).
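A hypothetical bias-injection recipe for a benchmark dataset: subsample the training split with a selection probability that depends on a single feature.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Feature-dependent selection: keep each training point with probability
# sigmoid of its standardized first feature -- a P(s=1|x) bias.
rng = np.random.default_rng(0)
z = (X_tr[:, 0] - X_tr[:, 0].mean()) / X_tr[:, 0].std()
s = rng.random(len(X_tr)) < 1.0 / (1.0 + np.exp(-2.0 * z))
X_biased, y_biased = X_tr[s], y_tr[s]  # biased training set; X_te stays unbiased
print(len(X_biased), "of", len(X_tr), "training points selected")
```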
Bias on Labels
Unexplained
• In theory, importance sampling should be the best; why does KMM perform better?
• Why kernel methods? Can we just do the matching using the input features?
• Can we just fit a logistic regression to estimate β, treating the test data as the positive class and the training data as the negative class? Then β is (proportional to) the odds (see the sketch below).
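A sketch of that last idea, the discriminative density-ratio trick (the data here are placeholders): a probabilistic classifier separating test from training inputs yields Pr'(x)/Pr(x) via the odds, corrected for the class sizes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_tr = rng.normal(0.5, 1.0, size=(500, 2))  # biased training inputs
X_te = rng.normal(0.0, 1.0, size=(400, 2))  # test-distribution inputs

# Label test points 1 and training points 0, then fit P(t=1|x).
X = np.vstack([X_tr, X_te])
t = np.concatenate([np.zeros(len(X_tr)), np.ones(len(X_te))])
clf = LogisticRegression().fit(X, t)

# Pr'(x)/Pr(x) = (m/m') * P(t=1|x) / P(t=0|x): the odds times the size ratio.
p = clf.predict_proba(X_tr)
beta = (len(X_tr) / len(X_te)) * p[:, 1] / p[:, 0]
print(beta[:5])
```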
Some Related Problems
• Semi-supervised learning (is it equivalent?)
• Multi-task learning assumes P(y|x) differs across tasks, while sample selection bias (mostly) assumes P(y|x) is the same. MTL also requires training data for each task.
• Is it possible to discriminate the features that introduce the bias, or to find invariant dimensions?
Any Questions? Happy Pig Year!