PRLab TUDelft NL
LEARNING UNDER COVARIATE SHIFT
Domain Adaptation, Transfer Learning, Data Shift, Concept Drift…
Marco Loog, Pattern Recognition Laboratory, Delft University of Technology
Covariate Shift Assumption
• Covariate shift, via the posterior or via the label function: P(Y|X) = Q(Y|X), or ℓ(X|P) = ℓ(X|Q) = ℓ(X)
• Equivalent to the missing-at-random assumption, with S indicating selection into the training sample: P(S=1|X,Y) = P(S=1|X)
• Standard setting: P(S=1|X,Y) = P(S=1)
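The assumption can be made concrete in a few lines: the labeling mechanism P(Y|X) is shared, while the marginals over X differ. The Gaussian marginals and sine labeling function below are illustrative choices of mine, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared labeling mechanism: P(Y|X) = Q(Y|X) under covariate shift.
def label(x, rng):
    return np.sin(x) + 0.1 * rng.standard_normal(x.shape)

# Training marginal Q(X) and test marginal P(X) differ: that is the shift.
x_train = rng.normal(loc=-1.0, scale=0.5, size=200)  # Q(X)
x_test = rng.normal(loc=1.0, scale=0.5, size=200)    # P(X)

y_train = label(x_train, rng)
y_test = label(x_test, rng)
```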
Graphically Speaking
• Covariate shift: P(S=1|X,Y) = P(S=1|X)
• So a change of class priors is not covariate shift: P(S=1|X,Y) = P(S=1|Y)
The Canonical Example
• How much does it help, really, when the hypotheses considered are very nonparametric?
Importance Weighting: Basic Idea
• Expected risk on test data: ∫∫ L(x,y|θ) P(x,y) dx dy
• Rewrite, using P(y|x) = Q(y|x) under covariate shift: ∫∫ L(x,y|θ) [P(x)/Q(x)] Q(x,y) dx dy
• Empirical loss [on training data]: ∑ᵢ L(xᵢ,yᵢ|θ) P(xᵢ)/Q(xᵢ)
• Importance weights: P(xᵢ)/Q(xᵢ)
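A minimal sketch of the weighted empirical loss. The two Gaussian marginals are an assumed toy setup (mine, not the slide's), chosen so that the ratio P(x)/Q(x) is available in closed form; the labels are noiseless and linear with true θ = 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy marginals: training x ~ Q = N(-1, 0.5^2), test x ~ P = N(+1, 0.5^2).
def q_pdf(x):
    return np.exp(-0.5 * ((x + 1.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))

def p_pdf(x):
    return np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))

x = rng.normal(-1.0, 0.5, size=1000)  # training sample from Q
y = x                                  # noiseless linear labels, true theta = 1
w = p_pdf(x) / q_pdf(x)                # importance weights P(x_i)/Q(x_i)

def weighted_risk(theta):
    # importance-weighted empirical squared loss: mean of w_i * L(x_i, y_i | theta)
    return np.mean(w * (y - theta * x) ** 2)
```

With noiseless labels, the weighted loss is exactly zero at the true θ and positive elsewhere.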
Estimation of Importance Weights: E.g.
• Estimate P(x) and Q(x) [normal distributions, Parzen densities, whatever] and compute the weights as w = P/Q
• Sugiyama suggests estimating the weights directly
• Find w such that KL(P‖wQ) is minimal [KLIEP]; Q and P are modelled by Parzen densities
• More well-founded suggestions have been given by Huang, Smola, Cortes, Mohri, Mansour, et al.
• Yet another approach is based on a very simple [Laplace-smoothed] nearest neighbor estimate
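A sketch of the first, plug-in route mentioned above: fit a normal distribution to the training and the test sample separately and take the ratio of the fitted densities as the weight. The data-generating distributions are again my assumed toy Gaussians.

```python
import numpy as np

rng = np.random.default_rng(2)
x_train = rng.normal(-1.0, 0.5, 500)  # sample from Q
x_test = rng.normal(1.0, 0.5, 500)    # sample from P

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Plug-in estimate: fit a normal to each sample, then w(x) = P_hat(x) / Q_hat(x).
mu_q, s_q = x_train.mean(), x_train.std()
mu_p, s_p = x_test.mean(), x_test.std()
w = gauss_pdf(x_train, mu_p, s_p) / gauss_pdf(x_train, mu_q, s_q)
```

Training points lying closer to the bulk of the test distribution receive the larger weights.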
Again! A Shameless Plug…
• But only a short one this time…
• Nearest neighbor weighting [NNeW]
• The idea…
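As I read the NNeW idea [Loog, 2012]: each training point is weighted by the [Laplace-smoothed] number of test points for which it is the nearest training neighbor. The sketch below follows that reading; the exact smoothing and normalization in the paper may differ, and the data are my toy Gaussians again.

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = rng.normal(-1.0, 0.5, size=(100, 1))  # sample from Q
x_test = rng.normal(1.0, 0.5, size=(200, 1))    # sample from P

# For every test point, find its nearest training point; the Laplace-smoothed
# count of test points "claimed" by each training point acts as its weight.
d = np.abs(x_test - x_train.T)              # (n_test, n_train) distance matrix
nn = d.argmin(axis=1)                       # index of nearest training point
counts = np.bincount(nn, minlength=len(x_train))
w = (counts + 1) / (counts + 1).sum()       # Laplace-smoothed, normalized
```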
“Optimal” Weights
[figure: training density Q and test density P over x]
• Linear regression example
• Find the coefficient θ that relates y to x via y = θx + ε
• Optimal θ = 1
• Squared loss
• Assume one knows the true P(X) and Q(X)
• For this particular weighting, the solution can be found by means of weighted regression
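A sketch of the weighted regression solution for this example. The Gaussian choices for P and Q are my assumption (the slide's figure is not reproduced here); for the squared loss and the model y = θx, the weighted least-squares solution is θ = (∑ wᵢxᵢyᵢ)/(∑ wᵢxᵢ²).

```python
import numpy as np

rng = np.random.default_rng(4)

# Training inputs from Q(X); the true relation y = theta*x + noise has theta = 1.
x = rng.normal(-1.0, 0.5, 200)
y = x + 0.1 * rng.standard_normal(200)

def gauss_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# "Optimal" weights: the true ratio P(x)/Q(x), assuming P = N(+1, 0.5^2), Q = N(-1, 0.5^2).
w = gauss_pdf(x, 1.0, 0.5) / gauss_pdf(x, -1.0, 0.5)

# Closed-form (weighted) least squares for the single-coefficient model y = theta*x.
theta_weighted = (w * x * y).sum() / (w * x * x).sum()
theta_plain = (x * y).sum() / (x * x).sum()
```

Since the linear model is correctly specified here, both estimators target θ = 1; the weighted one typically has the higher variance, because a few large weights dominate the sums.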
Learning Curve for “Optimal” Weights
• Using the true weights P/Q, what behavior do we expect for increasing sample sizes?
• Consider the relative improvement: MSE(Q)/MSE(P)
• 1 training sample?
• Many [say ∞] training samples?
• And in between?
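One way to explore the question empirically is a small simulation, again in my assumed toy setup (Gaussian marginals N(∓1, 0.5²), correctly specified linear model). Here MSE(Q) denotes the test error of the unweighted fit and MSE(P) that of the fit with the true weights P/Q.

```python
import numpy as np

rng = np.random.default_rng(5)

def mse_ratio(n, trials=200):
    """Average test MSE of the unweighted fit divided by that of the weighted fit."""
    r_q, r_p = [], []
    for _ in range(trials):
        x = rng.normal(-1.0, 0.5, n)                # training sample from Q
        y = x + 0.1 * rng.standard_normal(n)
        w = np.exp(8.0 * x)                         # closed-form P/Q for these Gaussians
        xt = rng.normal(1.0, 0.5, 500)              # test sample from P
        yt = xt + 0.1 * rng.standard_normal(500)
        th_q = (x * y).sum() / (x * x).sum()        # unweighted least squares
        th_p = (w * x * y).sum() / (w * x * x).sum()  # weighted least squares
        r_q.append(np.mean((yt - th_q * xt) ** 2))
        r_p.append(np.mean((yt - th_p * xt) ** 2))
    return np.mean(r_q) / np.mean(r_p)
```

Evaluating `mse_ratio` over a grid of sample sizes `n` traces out the learning curve the slide asks about.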
As a Side Remark
• Can we solve semi-supervised learning by importance weighting?
• [Earlier references to Sokolovska and Kawakita]
[Further] Questions, Remarks, etc.
• What problems can be modelled as covariate shift?
• What if P(S=1|X,Y) cannot be simplified?
• Bickel et al. take Sugiyama et al. a step further, and discrepancy minimization makes yet another step
• The weighted version can deteriorate even if the “true” weights are used
• Correction by weighting may have hardly any influence when nonparametric hypotheses are considered
• When to use weighting in the first place?
References
- Ben-David, Blitzer, Crammer, Kulesza, Pereira, Vaughan, “A theory of learning from different domains,” Machine Learning, 2010
- Ben-David, Lu, Pál, “Impossibility theorems for domain adaptation,” AISTATS, 2010
- Ben-David, Urner, “On the hardness of domain adaptation and the utility of unlabeled target samples,” ALT, 2012
- Bickel, Brückner, Scheffer, “Discriminative learning under covariate shift,” JMLR, 2009
- Cortes, Mohri, “Domain adaptation and sample bias correction theory and algorithm for regression,” Theoretical Computer Science, 2014
- Daumé III, “Frustratingly easy domain adaptation,” ACL, 2009
- Dinh, Duin, Piqueras-Salazar, Loog, “FIDOS: A generalized Fisher based feature extraction method for domain shift,” Pattern Recognition, 2013
- Gama, Žliobaitė, Bifet, Pechenizkiy, Bouchachia, “A survey on concept drift adaptation,” ACM Computing Surveys, 2014
- Jiang, “A literature survey on domain adaptation of statistical classifiers,” 2008
- Loog, “Nearest neighbor-based importance weighting,” MLSP, 2012
- Lu, Behbood, Hao, Zuo, Xue, Zhang, “Transfer learning using computational intelligence: A survey,” Knowledge-Based Systems, 2015
- Mansour, Mohri, Rostamizadeh, “Domain adaptation: Learning bounds and algorithms,” COLT, 2009
- Margolis, “A literature review of domain adaptation with unlabeled data,” University of Washington, TR 35, 2010
- Pan, Tsang, Kwok, Yang, “Domain adaptation via transfer component analysis,” IEEE TNN, 2011
- Pan, Yang, “A survey on transfer learning,” IEEE TKDE, 2010
- Quiñonero-Candela, Sugiyama, Schwaighofer, Lawrence, “Dataset shift in machine learning,” The MIT Press, 2009
- Shimodaira, “Improving predictive inference under covariate shift by weighting the log-likelihood function,” J. Stat. Plan. Inference, 2000
- Sugiyama, Krauledat, Müller, “Covariate shift adaptation by importance weighted cross validation,” JMLR, 2007
- Torrey, Shavlik, “Transfer learning,” Handbook of Research on ML Applications and Trends, 2009