
PRLab TUDelft NL: LEARNING UNDER COVARIATE SHIFT



  1. PRLab TUDelft NL

  2. LEARNING UNDER COVARIATE SHIFT: Domain Adaptation, Transfer Learning, Data Shift, Concept Drift… Marco Loog, Pattern Recognition Laboratory, Delft University of Technology

  3. PRLab TUDelft NL

  4. PRLab TUDelft NL

  5. Covariate Shift Assumption
     - Covariate shift via posterior or via label function: P(Y|X) = Q(Y|X) vs. ℓ(X|P) = ℓ(X|Q) = ℓ(X)
     - Equal to assumption of missing at random: P(S=1|X,Y) = P(S=1|X)
     - Standard setting: P(S=1|X,Y) = P(S=1)
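The link between the missing-at-random condition and equal posteriors can be spelled out in two lines. This is a sketch under one labeled assumption that the slide does not state explicitly: the training distribution Q is taken to be the selected part of P, that is, Q(x,y) = P(x,y | S=1). The first line uses Bayes' rule and then the condition P(S=1|X,Y) = P(S=1|X); the second shows what the importance weight P(x)/Q(x) looks like in terms of the selection probability.

```latex
\begin{align*}
Q(y \mid x) &= P(y \mid x, S=1)
             = \frac{P(S=1 \mid x, y)\, P(y \mid x)}{P(S=1 \mid x)}
             = P(y \mid x), \\
\frac{P(x)}{Q(x)} &= \frac{P(x)}{P(x \mid S=1)}
                   = \frac{P(S=1)}{P(S=1 \mid x)}.
\end{align*}
```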

  6. Graphically Speaking
     - Covariate shift: P(S=1|X,Y) = P(S=1|X)
     - So change of priors is not covariate shift… P(S=1|X,Y) = P(S=1|Y)

  7. The Canonical Example
     - How much does it help, really, when the hypotheses considered are highly nonparametric?

  8. Importance Weighting: Basic Idea
     - Expected risk on test: ∫∫ L(x,y|θ) P(x,y) dx dy
     - Rewrite: ∫∫ L(x,y|θ) P(x)/Q(x) Q(x,y) dx dy
     - Empirical loss [on training]: ∑ L(x_i,y_i|θ) P(x_i)/Q(x_i)
     - Importance weights: P(x_i)/Q(x_i)
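The importance-weighted empirical loss can be sketched in a few lines of NumPy. Everything concrete here is an assumption made for illustration and not taken from the slides: the helper names, the choice of Q = N(0, 1) for training inputs and P = N(0.5, 1) for test inputs, and a toy loss that depends on x only. The point is simply that averaging loss times P(x_i)/Q(x_i) over training samples approximates the expected loss under P.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))

def importance_weights(x, test_pdf, train_pdf):
    """w_i = P(x_i) / Q(x_i), with P the test and Q the training density."""
    return test_pdf(x) / train_pdf(x)

def weighted_empirical_risk(losses, weights):
    """Average of L(x_i, y_i | theta) * P(x_i) / Q(x_i) over training samples."""
    return float(np.mean(losses * weights))

rng = np.random.default_rng(0)

# Hypothetical setup: training inputs from Q = N(0, 1), test inputs from P = N(0.5, 1).
x_train = rng.normal(0.0, 1.0, size=20000)
w = importance_weights(
    x_train,
    lambda x: gaussian_pdf(x, 0.5, 1.0),  # assumed known test density P
    lambda x: gaussian_pdf(x, 0.0, 1.0),  # assumed known training density Q
)

# Toy loss that depends on x only, so its expectation under P is known:
# E_P[x^2] = 0.5^2 + 1 = 1.25.
losses = x_train**2
risk_estimate = weighted_empirical_risk(losses, w)
print(risk_estimate)
```

The unweighted average of the same losses would instead estimate E_Q[x²] = 1, which is the risk on the training distribution rather than on the test distribution.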

  9. Estimation of Importance: E.g.
     - Estimate P(x) and Q(x) [normal distributions, Parzen densities, whatever] and calculate weights through w = P/Q
     - Sugiyama suggests estimating the weights directly
     - Find w such that KL(Q||w P) is minimal [KLIEP]
     - Q and P are modelled by Parzen densities
     - More well-founded suggestions have been given by Huang, Smola, Cortes, Mohri, Mansour, et al.
     - Yet another approach is based on a very simple [Laplace smoothed] nearest neighbor estimate
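The first option on this slide, estimating both densities and taking their ratio, might look as follows. This is a minimal sketch that assumes one-dimensional inputs and models both P (test) and Q (training) as normal distributions fitted by sample mean and standard deviation; the function names and the synthetic data are hypothetical. It is not KLIEP and not the nearest neighbor estimate, which need more machinery than the slide describes.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))

def estimate_weights(x_train, x_test):
    """Plug-in estimate of w = P/Q evaluated at the training points.

    Assumption for this sketch: P (test) and Q (training) are each
    modelled as a univariate normal fitted to the corresponding sample.
    """
    mean_q, std_q = np.mean(x_train), np.std(x_train, ddof=1)
    mean_p, std_p = np.mean(x_test), np.std(x_test, ddof=1)
    return normal_pdf(x_train, mean_p, std_p) / normal_pdf(x_train, mean_q, std_q)

rng = np.random.default_rng(1)

# Synthetic, unlabeled inputs: training from N(0, 1), test from N(1, 1).
x_train = rng.normal(0.0, 1.0, size=5000)
x_test = rng.normal(1.0, 1.0, size=5000)

w_hat = estimate_weights(x_train, x_test)
# Training points lying where test data is dense receive larger weights.
print(w_hat[x_train > 1].mean(), w_hat[x_train < 0].mean())
```

Only inputs are needed for this step, so unlabeled test data suffices. A density-ratio estimate of this kind can become unstable where the fitted Q is close to zero.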

  10. Again! A Shameless Plug…
      - But only a short one this time…
      - Nearest neighbor weighting [NNeW]
      - The idea…

  11. “Optimal” Weights
      - Linear regression example
      - Find the coefficient θ that relates y to x via y = θ x + ɛ
      - Optimal θ = 1
      - Squared loss
      - Assume one knows the true P(X) and Q(X)
      - For a particular weighting, the solution can be found by means of weighted regression
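For the no-intercept model y = θ x + ɛ with squared loss, the weighted regression solution has a closed form: θ = ∑ w_i x_i y_i / ∑ w_i x_i². The sketch below illustrates it. The data-generating choices are assumptions for illustration only and are not the densities used on the slide: training inputs from Q = N(0, 1), test inputs from P = N(0.5, 1), true θ = 1, and noise standard deviation 0.5.

```python
import numpy as np

def weighted_slope(x, y, w):
    """Minimizer of sum_i w_i * (y_i - theta * x_i)^2 for the no-intercept model."""
    return float(np.sum(w * x * y) / np.sum(w * x * x))

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)

# Hypothetical training data: x ~ Q = N(0, 1), y = 1 * x + noise.
n = 2000
x = rng.normal(0.0, 1.0, size=n)
y = 1.0 * x + rng.normal(0.0, 0.5, size=n)

# "True" weights P(x)/Q(x), assuming the test density P = N(0.5, 1) is known.
w_true = gaussian_pdf(x, 0.5, 1.0) / gaussian_pdf(x, 0.0, 1.0)

theta_unweighted = weighted_slope(x, y, np.ones(n))
theta_weighted = weighted_slope(x, y, w_true)
print(theta_unweighted, theta_weighted)
```

Because the linear model is correctly specified in this toy setup, both estimates target θ = 1; the weighting changes the variance of the estimate rather than what it converges to.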

  12. Learning Curve for “Optimal” Weights
      - Using the true weights Q/P, what behavior do we expect for increasing sample sizes?
      - Let us consider relative improvements: MSE(Q)/MSE(P)
      - 1 training sample?
      - Many [say ∞] training samples?
      - And in between?

  13. As a Side Remark
      - Can we solve semi-supervised learning by importance weighting?
      - [Earlier references to Sokolovska and Kawakita]

  14. [Further] Questions, Remarks, etc.
      - What problems can be modelled as covariate shift?
      - What if P(S=1|X,Y) cannot be simplified?
      - Bickel et al. take Sugiyama et al. a step further, and discrepancy minimization goes a step further still
      - The weighted version can deteriorate even if the “true” weights are used
      - Correction by weighting might have hardly any influence when nonparametric hypotheses are considered
      - When to use weighting in the first place?

  15. References
      - Ben-David, Blitzer, Crammer, Kulesza, Pereira, Vaughan, “A theory of learning from different domains,” ML, 2010
      - Ben-David, Lu, Pál, “Impossibility theorems for domain adaptation,” AISTATS, 2010
      - Ben-David, Urner, “On the hardness of domain adaptation and the utility of unlabeled target samples,” ALT, 2012
      - Bickel, Brückner, Scheffer, “Discriminative learning under covariate shift,” JMLR, 2009
      - Cortes, Mohri, “Domain adaptation and sample bias correction theory and algorithm for regression,” Theoretical CS, 2014
      - Daumé III, “Frustratingly easy domain adaptation,” ACL, 2009
      - Dinh, Duin, Piqueras-Salazar, Loog, “FIDOS: A generalized Fisher based feature extraction method for domain shift,” PR, 2013
      - Gama, Zliobaite, Bifet, Pechenizkiy, Bouchachia, “A survey on concept drift adaptation,” ACM CSUR, 2014
      - Jiang, “A literature survey on domain adaptation of statistical classifiers,” 2008
      - Loog, “Nearest neighbor-based importance weighting,” MLSP, 2012
      - Lu, Behbood, Hao, Zuo, Xue, Zhang, “Transfer Learning using Computational Intelligence: A Survey,” KBS, 2015
      - Mansour, Mohri, Rostamizadeh, “Domain adaptation: Learning bounds and algorithms,” COLT, 2009
      - Margolis, “A literature review of domain adaptation with unlabeled data,” University of Washington, TR 35, 2010
      - Pan, Tsang, Kwok, Yang, “Domain adaptation via transfer component analysis,” IEEE TNN, 2011
      - Pan, Yang, “A survey on transfer learning,” IEEE TKDE, 2010
      - Quiñonero-Candela, Sugiyama, Schwaighofer, Lawrence, “Dataset shift in machine learning,” The MIT Press, 2009
      - Shimodaira, “Improving predictive inference under covariate shift by weighting the log-likelihood function,” J. Stat. Plan. Inference, 2000
      - Sugiyama, Krauledat, Müller, “Covariate shift adaptation by importance weighted cross validation,” JMLR, 2007
      - Torrey, Shavlik, “Transfer learning,” Handbook of Research on ML Applications and Trends, 2009
