

  1. Making Generative Classifiers Robust to Selection Bias
     Andrew Smith, Charles Elkan
     November 30th, 2007

  2. Outline
     ◮ What is selection bias?
     ◮ Types of selection bias.
     ◮ Overcoming learnable bias with weighting.
     ◮ Overcoming bias with maximum likelihood (ML).
     ◮ Experiment 1: ADULT dataset.
     ◮ Experiment 2: CA-housing dataset.
     ◮ Future work & conclusions.

  3. What is selection bias?
     Traditional semi-supervised learning assumes:
     ◮ Some samples are labeled, some are not.
     ◮ Labeled and unlabeled examples are identically distributed.
     Semi-supervised learning under selection bias:
     ◮ Labeled examples are not selected at random from the general population.
     ◮ Labeled and unlabeled examples may be differently distributed.

  4. Examples
     ◮ Loan application approval
       ◮ The goal is to model the repay/default behavior of all applicants.
       ◮ But the training set only includes labels for people who were approved for a loan.
     ◮ Spam filtering
       ◮ The goal is an up-to-date spam filter.
       ◮ But, while up-to-date unlabeled emails are available, hand-labeled data sets are expensive and may be rarely updated.

  5. Framework
     Types of selection bias are distinguished by conditional independence assumptions among three variables:
     ◮ x is the feature vector.
     ◮ y is the class label. If y is binary, y ∈ {1, 0}.
     ◮ s is the binary selection variable. If y_i is observable then s_i = 1, otherwise s_i = 0.
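To make the notation concrete, here is a minimal sketch of the three variables on synthetic data; the feature distribution, the labeling rule, and the 30% labeling rate are illustrative assumptions, not from the talk.

    # Illustrative (x, y, s) setup: synthetic features, binary labels,
    # and a selection mask recording which labels are observable.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    x = rng.normal(size=(n, 2))               # feature vectors
    y = (x[:, 0] + x[:, 1] > 0).astype(int)   # binary class labels, y in {1, 0}
    s = rng.binomial(1, 0.3, size=n)          # s_i = 1 iff y_i is observable

    labeled_x, labeled_y = x[s == 1], y[s == 1]   # {(x_i, y_i) | s_i = 1}
    unlabeled_x = x[s == 0]                       # {x_i | s_i = 0}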

  6. Types of selection bias – No bias
     s ⊥ x, s ⊥ y
     ◮ The standard semi-supervised learning scenario.
     ◮ Labeled examples are selected completely at random from the general population.
     ◮ The missing labels are said to be “missing completely at random” (MCAR) in the literature.

  7. Types of selection bias – Learnable bias
     s ⊥ y | x
     ◮ Labeled examples are selected from the general population depending only on the features x.
     ◮ A model p(s | x) is learnable.
     ◮ The missing labels are said to be “missing at random” (MAR), or “ignorable bias”, in the literature.
     ◮ p(y | x, s = 1) = p(y | x).
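Because s is observed for every example, a model of p(s | x) can be fit on labeled and unlabeled data together. A minimal sketch, assuming logistic regression as the selection model (the talk does not fix a model family):

    # Under MAR bias, selection depends on x only, so p(s = 1 | x) is
    # estimable from all n examples: s is observed even when y is not.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 5000
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)

    # Simulated MAR bias: labeling probability depends on x, never on y.
    s = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * x[:, 0])))

    sel_model = LogisticRegression().fit(x, s)       # learn p(s | x)
    p_s1_given_x = sel_model.predict_proba(x)[:, 1]  # estimated p(s = 1 | x)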

  8. Model mis-specification under learnable bias
     p(y | x, s = 1) = p(y | x) implies that the decision boundaries are the same in the labeled and general populations. But suppose the model is mis-specified? Then a sub-optimal decision boundary may be learned under MAR bias.
     [Figure: two scatter plots of the same two-class data, titled “Viewing hidden labels” and “Ignoring samples without labels”, comparing the true boundary, the best mis-specified boundary, the estimated mis-specified boundary, and the estimated well-specified boundary.]
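A small simulation of this effect, under assumed data: the true boundary is quadratic, the classifier is linear (hence mis-specified), and the selection is MAR. Fitting only the selected labels shifts the learned boundary relative to fitting all labels.

    # A linear model is mis-specified for a quadratic boundary, so the
    # boundary it learns depends on where labels were selected, even
    # though the selection mechanism itself is MAR.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n = 20000
    x = rng.uniform(-2.0, 2.0, size=(n, 2))
    y = (x[:, 1] > x[:, 0] ** 2 - 1.0).astype(int)   # quadratic true boundary

    # MAR selection: labels are mostly observed where x[:, 0] is negative.
    s = rng.binomial(1, 1.0 / (1.0 + np.exp(3.0 * x[:, 0])))

    all_fit = LogisticRegression().fit(x, y)                  # "viewing hidden labels"
    sel_fit = LogisticRegression().fit(x[s == 1], y[s == 1])  # "ignoring samples without labels"

    print(all_fit.coef_, all_fit.intercept_)
    print(sel_fit.coef_, sel_fit.intercept_)   # noticeably different fit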

  9. Types of selection bias – Arbitrary bias
     ◮ Labeled examples are selected from the general population possibly depending on the label itself.
     ◮ No independence assumptions can be made.
     ◮ The missing labels are said to be “missing not at random” (MNAR) in the literature.

  10. Overcoming bias – Two alternate goals
      The training data consist of {(x_i, y_i) | s_i = 1} and {x_i | s_i = 0}. Two goals are possible:
      ◮ General-population modeling: learn p(y | x), e.g. loan application approval.
      ◮ Unlabeled-population modeling: learn p(y | x, s = 0), e.g. spam filtering.

  11. Overcoming learnable bias – General population modeling
      Lemma 1: Under MAR bias in the labeling,
      p(s = 1) p(x, y) = p(s = 1 | x) p(x, y | s = 1)
      if all probabilities are non-zero.
      The distribution of samples in the general population is a weighted version of the distribution of labeled samples. Since p(s | x) is learnable, we can estimate the weights.
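Read as an algorithm, Lemma 1 licenses importance weighting: reweight each labeled example by p(s = 1) / p(s = 1 | x_i) so the weighted labeled sample mimics the general population. A minimal end-to-end sketch, with synthetic data and logistic models as illustrative assumptions:

    # Lemma 1 as importance weighting:
    #   1. estimate p(s = 1 | x) from all examples (s is always observed),
    #   2. weight each labeled example by p(s = 1) / p(s = 1 | x_i),
    #   3. fit the classifier on the weighted labeled data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    n = 10000
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)
    s = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * x[:, 0])))   # MAR bias

    sel_model = LogisticRegression().fit(x, s)
    p_s1_given_x = sel_model.predict_proba(x)[:, 1]

    w = s.mean() / p_s1_given_x[s == 1]   # Lemma 1 weights for labeled points
    clf = LogisticRegression().fit(x[s == 1], y[s == 1], sample_weight=w)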
