Making Generative Classifiers Robust to Selection Bias
Andrew Smith, Charles Elkan
November 30th, 2007
Outline
◮ What is selection bias?
◮ Types of selection bias.
◮ Overcoming learnable bias with weighting.
◮ Overcoming bias with maximum likelihood (ML).
◮ Experiment 1: ADULT dataset.
◮ Experiment 2: CA-housing dataset.
◮ Future work & conclusions.
What is selection bias?
Traditional semi-supervised learning assumes:
◮ Some samples are labeled, some are not.
◮ Labeled and unlabeled examples are identically distributed.
Semi-supervised learning under selection bias:
◮ Labeled examples are not selected at random from the general population.
◮ Labeled and unlabeled examples may be distributed differently.
Examples
◮ Loan application approval
  ◮ Goal: model the repay/default behavior of all applicants.
  ◮ But the training set includes labels only for people who were approved for a loan.
◮ Spam filtering
  ◮ Goal: an up-to-date spam filter.
  ◮ But while up-to-date unlabeled emails are available, hand-labeled data sets are expensive and may be rarely updated.
Framework
Types of selection bias are distinguished by conditional independence assumptions between:
◮ x, the feature vector.
◮ y, the class label. If y is binary, y ∈ {1, 0}.
◮ s, the binary selection variable: if y_i is observable then s_i = 1, otherwise s_i = 0.
Types of selection bias – No bias
[Graphical model over x, y, s]
s ⊥ x, s ⊥ y
◮ The standard semi-supervised learning scenario.
◮ Labeled examples are selected completely at random from the general population.
◮ The missing labels are said to be “missing completely at random” (MCAR) in the literature.
Types of selection bias – Learnable bias
[Graphical model over x, y, s]
s ⊥ y | x
◮ Labeled examples are selected from the general population depending only on the features x.
◮ A model p(s | x) is learnable.
◮ The missing labels are said to be “missing at random” (MAR), or to exhibit “ignorable bias”, in the literature.
◮ p(y | x, s = 1) = p(y | x).
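A minimal sketch of why p(s | x) is learnable under MAR (my own illustration, not code from the talk; the toy data and variable names are hypothetical): the selection indicator s is observed for every example, so any probabilistic classifier, here scikit-learn's LogisticRegression, can be fit to predict s from x without ever seeing the missing labels y.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy setup: X holds features of ALL examples, labeled and unlabeled;
# s[i] = 1 iff y[i] is observed. Under MAR, selection depends on x only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
s = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))

# Since s is fully observed, p(s = 1 | x) can be estimated directly,
# with no need for the hidden labels y.
selection_model = LogisticRegression().fit(X, s)
p_s1_given_x = selection_model.predict_proba(X)[:, 1]
```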
Model mis-specification under learnable bias
p(y | x, s = 1) = p(y | x) implies decision boundaries are the same in the labeled and general populations. But suppose the model is mis-specified? Then a sub-optimal decision boundary may be learned under MAR bias.
[Figure: two scatterplots of the same data, “Viewing hidden labels” and “Ignoring samples without labels”, comparing the true boundary, the best mis-specified boundary, the estimated mis-specified boundary, and the estimated well-specified boundary.]
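A toy simulation of this effect (my own construction, not from the slides): the true boundary is quadratic, selection is MAR and concentrated in one region of feature space, and a linear logistic regression, which is mis-specified here, is fit on the labeled examples alone. Its general-population accuracy is typically worse than that of the same mis-specified model fit with all labels visible.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(5000, 2))
y = (X[:, 1] > X[:, 0] ** 2 - 1).astype(int)        # quadratic true boundary
s = rng.binomial(1, 1 / (1 + np.exp(3 * X[:, 0])))  # MAR: selection depends on x only

# A linear logistic regression cannot represent the quadratic boundary.
biased_fit = LogisticRegression().fit(X[s == 1], y[s == 1])
oracle_fit = LogisticRegression().fit(X, y)         # if all labels were visible
print("general-population accuracy, biased fit: %.3f" % biased_fit.score(X, y))
print("general-population accuracy, oracle fit: %.3f" % oracle_fit.score(X, y))
```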
Types of selection bias – Arbitrary bias
[Graphical model over x, y, s]
◮ Labeled examples are selected from the general population possibly depending on the label itself.
◮ No independence assumptions can be made.
◮ The missing labels are said to be “missing not at random” (MNAR) in the literature.
Overcoming bias – Two alternate goals
The training data consist of {(x_i, y_i) | s_i = 1} and {x_i | s_i = 0}. Two goals are possible:
◮ General population modeling: learn p(y | x), e.g. loan application approval.
◮ Unlabeled population modeling: learn p(y | x, s = 0), e.g. spam filtering.
Overcoming learnable bias – General population modeling
Lemma 1: Under MAR bias in the labeling,
p(s = 1) p(x, y | s = 1) = p(s = 1 | x) p(x, y),
provided all probabilities are non-zero.
Equivalently, p(x, y) = [p(s = 1) / p(s = 1 | x)] p(x, y | s = 1): the distribution of samples in the general population is a weighted version of the distribution of labeled samples. Since p(s | x) is learnable, we can estimate the weights.
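A sketch of the resulting weighting scheme (my own illustration under the Lemma 1 identity; the paper's specific generative classifier is not shown here, and any model accepting per-example weights, such as scikit-learn estimators via sample_weight, could stand in): each labeled example gets weight p(s = 1) / p(s = 1 | x), so the weighted labeled sample mimics the general population.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_fit(X, y, s, classifier=None):
    """Train on labeled examples, reweighted to mimic the general population.

    Implements the weighting implied by Lemma 1:
        p(x, y) = [p(s = 1) / p(s = 1 | x)] * p(x, y | s = 1)
    X: (n, d) features for ALL examples; y: labels (only y[s == 1] is used);
    s: binary selection indicator.
    """
    # Step 1: estimate p(s = 1 | x) from all examples (possible under MAR).
    selection_model = LogisticRegression().fit(X, s)
    p_s1_given_x = selection_model.predict_proba(X[s == 1])[:, 1]

    # Step 2: importance weights p(s = 1) / p(s = 1 | x) for labeled examples,
    # with the propensity clipped away from zero for numerical stability.
    weights = s.mean() / np.clip(p_s1_given_x, 1e-6, None)

    # Step 3: fit the final classifier with per-example weights.
    clf = classifier if classifier is not None else LogisticRegression()
    return clf.fit(X[s == 1], y[s == 1], sample_weight=weights)
```

Examples with a low probability of being labeled receive high weight, which is exactly the inverse-propensity correction: rarely selected regions of feature space are up-weighted so the labeled sample stops under-representing them.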