Model Selection under Covariate Shift


  1. Model Selection under Covariate Shift
     Masashi Sugiyama (Tokyo Institute of Technology, Tokyo, Japan)
     Klaus-Robert Müller (Fraunhofer FIRST, Berlin, Germany; University of Potsdam, Potsdam, Germany)

  2. Standard Regression Problem
     - Learning target function: f(x)
     - Training examples: input-output pairs (x_i, y_i) with noisy outputs
     - Test input: x
     - Goal: obtain an approximation of f that minimizes the expected error over test inputs (the generalization error)

  3. Training Input Distribution
     - Common assumption: training inputs follow the same distribution as test inputs: p_train(x) = p_test(x)
     - Here, we suppose the two distributions are different: p_train(x) ≠ p_test(x) ("covariate shift")
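
For reference, the setting and the importance weight used by the weighted methods on the following slides can be written as below. The notation (p_train, p_test, w, G) is mine, not the slides'; it is the standard formulation of covariate shift.

```latex
% Covariate shift: only the input density changes; the conditional p(y|x) is fixed.
p_{\mathrm{train}}(x) \neq p_{\mathrm{test}}(x), \qquad
p_{\mathrm{train}}(y \mid x) = p_{\mathrm{test}}(y \mid x)

% Importance weight (slide 6 assumes it is known and strictly positive):
w(x) = \frac{p_{\mathrm{test}}(x)}{p_{\mathrm{train}}(x)}

% Generalization error: expected squared error over the test input distribution.
G = \int \bigl(\hat{f}(x) - f(x)\bigr)^2 \, p_{\mathrm{test}}(x) \, dx
```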

  4. Covariate Shift
     - Is covariate shift important to investigate?
     - Yes! It often happens in reality:
       - Interpolation / extrapolation
       - Active learning (experimental design)
       - Classification from imbalanced data

  5. Ordinary Least Squares under Covariate Shift
     - Asymptotically unbiased if the model is correct.
     - Asymptotically biased for misspecified models.
     - Need to reduce bias.

  6. Weighted Least Squares for Covariate Shift (Shimodaira, 2000)
     - Each training example is weighted by the importance w(x) = p_test(x)/p_train(x), assumed known and strictly positive.
     - Asymptotically unbiased even for misspecified models.
     - Can have large variance.
     - Need to reduce variance.
     (A code sketch follows below.)
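
A minimal numerical sketch of importance-weighted least squares. The function name and the use of NumPy are my own; the slides give no code, only the estimator.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Importance-weighted least squares (sketch).

    X : (n, p) design matrix, y : (n,) targets,
    w : (n,) importance weights w(x_i) = p_test(x_i) / p_train(x_i),
        assumed known and strictly positive (as on this slide).
    Solves  argmin_beta  sum_i w_i * (y_i - x_i . beta)^2.
    """
    W = np.diag(w)
    # Normal equations of the weighted problem: (X^T W X) beta = X^T W y
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```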

  7. λ-Weighted Least Squares (Shimodaira, 2000)
     - Each example is weighted by w(x)^λ with a flattening parameter λ ∈ [0, 1]:
       - λ = 0 (ordinary LS): large bias, small variance
       - λ = 1 (fully weighted LS): small bias, large variance
       - Intermediate λ: intermediate bias and variance
     - λ should be chosen appropriately! (Model selection; see the sketch below.)
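
The λ-weighted variant only changes the weights of the previous sketch by an exponent; again the function name is mine.

```python
import numpy as np

def lambda_weighted_least_squares(X, y, w, lam):
    """lambda-weighted least squares (sketch): flatten the importance
    weights by an exponent lam in [0, 1].
    lam = 0 recovers ordinary LS, lam = 1 the fully weighted LS above."""
    W = np.diag(w ** lam)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```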

  8. Generalization Error Estimation under Covariate Shift
     - The flattening parameter λ is determined so that the (estimated) generalization error is minimized.
     - However, under covariate shift, standard methods such as cross-validation are heavily biased.
     - Goal: derive a better generalization error estimator.
     [Slide illustration: true generalization error vs. cross-validation vs. proposed estimator.]
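
The selection step itself is just an argmin over candidate λ values. A sketch of that loop follows; the generalization error estimator is left as a placeholder callable, since the estimator proposed in the talk is derived only on later slides and is not reproduced here. The grid of λ values and all names are illustrative.

```python
import numpy as np

def select_lambda(X, y, w, fit, gen_error_estimate,
                  lambdas=np.linspace(0.0, 1.0, 11)):
    """Choose the flattening parameter by minimizing an estimated
    generalization error over a grid of candidates.

    fit(X, y, w, lam)             -> fitted parameter vector
                                     (e.g. the lambda-weighted LS sketch above)
    gen_error_estimate(beta, lam) -> estimated generalization error
                                     (placeholder: the proposed estimator, or CV)
    """
    scores = [gen_error_estimate(fit(X, y, w, lam), lam) for lam in lambdas]
    return lambdas[int(np.argmin(scores))]
```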

  9. Setting
     - I.i.d. noise with mean 0 and variance σ²
     - Linear regression model (linear in the parameters)
     - λ-weighted least squares (as on slide 7)
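
Under this linear-in-parameters setting, the λ-weighted least-squares solution has the usual closed form, assuming the matrix to be inverted is nonsingular. The design-matrix notation is mine, not the slides'.

```latex
\hat{\boldsymbol{\alpha}}_{\lambda}
  = \bigl(X^{\top} W^{\lambda} X\bigr)^{-1} X^{\top} W^{\lambda} \boldsymbol{y},
\qquad
W = \operatorname{diag}\!\bigl(w(x_1), \ldots, w(x_n)\bigr),
\quad
X_{ij} = \varphi_j(x_i)
```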

  10. Decomposition of Generalization Error
      - The generalization error decomposes into three terms: one that is directly accessible, one that must be estimated, and one that is constant (and can be ignored for model selection).
      - We estimate the middle term.
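
One consistent way to write this decomposition, in my notation rather than the slides': expand the squared error and label the terms as on the slide, where the inner product is weighted by the test input density.

```latex
G = \int \bigl(\hat{f}(x) - f(x)\bigr)^2 p_{\mathrm{test}}(x)\,dx
  = \underbrace{\langle \hat{f}, \hat{f} \rangle}_{\text{accessible}}
    \;-\; 2\,\underbrace{\langle \hat{f}, f \rangle}_{\text{estimated}}
    \;+\; \underbrace{\langle f, f \rangle}_{\text{constant (ignored)}},
\qquad
\langle g, h \rangle := \int g(x)\,h(x)\,p_{\mathrm{test}}(x)\,dx
```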

  11. Orthogonal Decomposition of Learning Target Function
      - The target function is decomposed orthogonally into the part expressible by the model (given by the optimal parameter) and the residual.

  12. Unbiased Estimation
      - Suppose we have a matrix that gives a linear unbiased estimator of the optimal parameter (unbiasedness in expectation over the noise).
      - Suppose we also have an unbiased estimator of the noise variance σ².
      - Then we have an unbiased estimator of the term to be estimated in the decomposition.
      - But such quantities are not always available. Use approximations instead.
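
In symbols, the assumptions on this slide can be written as follows. The matrix L and the symbols are mine; E_ε denotes expectation over the noise, matching ":Expectation over noise" on the slide.

```latex
\hat{\boldsymbol{\alpha}} = L\,\boldsymbol{y},
\qquad
\mathbb{E}_{\varepsilon}\bigl[\hat{\boldsymbol{\alpha}}\bigr] = \boldsymbol{\alpha}^{*}
\;\;\text{(linear, unbiased for the optimal parameter)},
\qquad
\mathbb{E}_{\varepsilon}\bigl[\hat{\sigma}^{2}\bigr] = \sigma^{2}
\;\;\text{(unbiased noise-variance estimator)}
```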

  13. Approximations
      - ...
      - If the model is correct, ...
      - If the model is misspecified, ...

  14. New Generalization Error Estimator
      Bias of the proposed estimator:
      - If the model is correct: exactly unbiased.
      - If the model is almost correct: ...
      - If the model is misspecified: asymptotically unbiased.

  15. Simulation (Toy)

  16. Results
      [Plot comparing the true generalization error, 10-fold cross-validation, and the proposed estimator.]

  17. Simulation (Abalone from DELVE)
      - Estimate the age of abalones from 7 physical measurements.
      - We add bias to the 4th attribute (weight of the abalones).
      - Training and test input densities are estimated by a standard kernel density estimator (see the sketch below).
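
A minimal sketch of how the importance weights could be obtained from kernel density estimates of the two input densities. SciPy's Gaussian KDE is used here as one standard choice; the slides do not specify the kernel, bandwidth, or implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_importance_weights(X_train, X_test):
    """Estimate w(x) = p_test(x) / p_train(x) at the training inputs
    using standard Gaussian kernel density estimators.
    X_train, X_test : arrays of shape (n_samples, n_features)."""
    # gaussian_kde expects data of shape (n_features, n_samples)
    p_train = gaussian_kde(X_train.T)
    p_test = gaussian_kde(X_test.T)
    dens_train = p_train(X_train.T)
    dens_test = p_test(X_train.T)
    # Guard against division by a (near-)zero training density
    return dens_test / np.maximum(dens_train, 1e-12)
```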

  18. Generalization Error Estimation
      [Plot: mean over 300 trials of the true generalization error, 10-fold cross-validation (10CV), and the proposed estimator.]

  19. Test Error After Model Selection
      Extrapolation in 4th attribute (mean ± std; comparisons by t-test at the 5% level):
          n          50             200            800
          OPT        9.86 ± 4.27    7.40 ± 1.77    6.54 ± 1.34
          Proposed   11.67 ± 5.74   7.95 ± 2.15    6.77 ± 1.40
          10CV       10.88 ± 5.05   8.06 ± 1.91    7.24 ± 1.37

      Extrapolation in 6th attribute:
          n          50             200            800
          OPT        9.04 ± 4.04    6.76 ± 1.68    6.05 ± 1.25
          Proposed   10.67 ± 6.19   7.31 ± 2.24    6.20 ± 1.33
          10CV       10.15 ± 4.95   7.42 ± 1.81    6.68 ± 1.25

  20. Conclusions
      - Covariate shift: training and test input distributions are different.
      - Ordinary LS: biased.
      - Weighted LS: unbiased but large variance.
      - λ-WLS: model selection needed.
      - Cross-validation: biased.
      - Proposed generalization error estimator:
        - Exactly unbiased (correct models)
        - Asymptotically unbiased (misspecified models)
