Importance-Weighted Cross-Validation for Covariate Shift

Masashi Sugiyama (1), Benjamin Blankertz (2), Matthias Krauledat (2,3), Guido Dornhege (2), Klaus-Robert Müller (3,2)

(1) Tokyo Institute of Technology, Tokyo, Japan
(2) Fraunhofer FIRST.IDA, Berlin, Germany
(3) Technical University Berlin, Berlin, Germany
Common Assumption in Supervised Learning
- Goal: from given training samples, predict the outputs of unseen test samples.
- To do so, we always assume that training and test samples are drawn from the same distribution.
- Is this assumption really true?
Not Always True!
- Fewer women in face datasets than in reality (e.g., the Yale Face Database B).
- More criticisms in survey sampling than in reality.
- We tend to collect easy-to-gather samples for training.
- The sample generation mechanism varies over time (e.g., brain activity data).
Covariate Shift
- However, there is no chance of generalization if training and test samples have nothing in common.
- Covariate shift:
  - The input distribution changes.
  - The functional relation remains unchanged.
Examples of Covariate Shift
- (Weak) extrapolation: predict output values outside the training region.
[Figure: training samples vs. test samples]
Examples (cont.)
- Possible applications:
  - Non-stationarity compensation in brain-computer interfaces
  - Online system adaptation in robot motor control
  - Correcting sample selection bias in survey sampling
  - Active learning (experimental design): Sugiyama (JMLR 2006)
Covariate Shift
- To illustrate the effect of covariate shift, let's focus on linear extrapolation.
[Figure: training samples, test samples, true function, learned function]
Ordinary Least Squares (OLS)
- If the model is correct: OLS minimizes the bias asymptotically.
- If the model is misspecified: OLS does not minimize the bias even asymptotically.
- We do not have the correct model in practice, so we need to reduce the bias!
Law of Large Numbers
- The sample average converges to the population mean: (1/n) Σ_{i=1..n} f(x_i) → E_{x~p(x)}[f(x)] as n → ∞.
- We want to estimate the expectation over test input points (x ~ p_test) using only training input points (x_i ~ p_train).
Key Trick: Importance-Weighted Average
- Importance: the ratio of test and training input densities, w(x) = p_test(x) / p_train(x).
- Importance-weighted average: (1/n) Σ_{i=1..n} w(x_i) f(x_i) → E_{x~p_test(x)}[f(x)] (cf. importance sampling).
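As a minimal illustration (not from the talk), the sketch below draws training inputs from a known Gaussian p_train, computes the importance w(x) = p_test(x)/p_train(x) from two assumed-known densities, and checks that the importance-weighted average of f(x) tracks the expectation under p_test while the plain average does not. The function f and the toy densities are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), used as the (assumed known) input densities."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu_tr, s_tr = 0.0, 1.0   # training inputs ~ N(0, 1)
mu_te, s_te = 1.0, 0.5   # test inputs     ~ N(1, 0.5^2)

f = np.sin               # any function whose expectation under p_test we want

x_tr = rng.normal(mu_tr, s_tr, size=100_000)
w = gauss_pdf(x_tr, mu_te, s_te) / gauss_pdf(x_tr, mu_tr, s_tr)  # importance weights

plain_avg = f(x_tr).mean()      # converges to E_train[f(x)]  (wrong target)
iw_avg = (w * f(x_tr)).mean()   # converges to E_test[f(x)]   (right target)

reference = f(rng.normal(mu_te, s_te, size=100_000)).mean()     # direct Monte Carlo check
print(f"plain: {plain_avg:.3f}  weighted: {iw_avg:.3f}  reference: {reference:.3f}")
```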
Importance-Weighted LS (IWLS)
- The importance w(x) = p_test(x)/p_train(x) is assumed known and strictly positive.
- IWLS minimizes the importance-weighted squared error: θ̂ = argmin_θ Σ_{i=1..n} w(x_i) (f_θ(x_i) − y_i)².
- Even for misspecified models, IWLS minimizes the bias asymptotically.
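A rough sketch of IWLS for a linear-in-parameters model, assuming the importance weights are given: each squared residual is weighted by w(x_i), which reduces to ordinary least squares when all weights are one. The function name and signature are ours, not the authors'.

```python
import numpy as np

def iwls_fit(X, y, w):
    """Importance-weighted least squares for a linear model y ≈ X @ theta.

    X : (n, d) design matrix (include a column of ones for an intercept)
    y : (n,)  outputs
    w : (n,)  importance weights w(x_i) = p_test(x_i) / p_train(x_i)

    Minimizes sum_i w_i * (X_i @ theta - y_i)^2, which is equivalent to
    ordinary least squares on rows rescaled by sqrt(w_i).
    """
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta

# Ordinary least squares is the special case of uniform weights:
#   theta_ols = iwls_fit(X, y, np.ones(len(y)))
```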
Importance-Weighted LS (cont.)
- However, the variance of IWLS is larger than that of OLS (cf. BLUE).
- We want to reduce the variance.
- We reduce the variance by adding a small bias to IWLS (e.g., by changing the weights or by regularization).
Adaptive IWLS (Shimodaira, 2000)
- Weight each training sample by w(x_i)^λ with a flattening parameter λ ∈ [0, 1]:
  - λ = 0 (OLS): large bias, small variance.
  - λ = 1 (IWLS): small bias, large variance.
  - Intermediate λ: trades off bias and variance.
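Under the flattening interpretation above (weights raised to the power λ), adaptive IWLS is a one-line wrapper around the hypothetical iwls_fit helper from the previous sketch:

```python
import numpy as np

def adaptive_iwls_fit(X, y, w, lam):
    """Adaptive IWLS: weight each training sample by w_i ** lam.

    lam = 0 recovers OLS  (large bias, small variance);
    lam = 1 recovers IWLS (small bias, large variance);
    intermediate lam trades the two off. lam is chosen by model selection.
    """
    return iwls_fit(X, y, w ** lam)  # iwls_fit as sketched above
```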
Model Selection
- We want to determine λ so that the generalization error (bias + variance) is minimized.
- However, the generalization error is inaccessible.
- We use a generalization error estimator instead.
Cross-Validation
- A standard method for generalization error estimation:
  - Divide the training samples into k groups.
  - Train a learning machine with k − 1 groups.
  - Validate the trained machine on the remaining group.
  - Repeat this for all combinations and output the mean validation error.
[Figure: Group 1, Group 2, ..., Group k−1, Group k; training vs. validation split]
CV under Covariate Shift
- CV is almost unbiased without covariate shift.
- However, it is heavily biased under covariate shift.
[Figure: true generalization error vs. ordinary cross-validation score]
Goal of This Talk
- We propose a better generalization error estimator under covariate shift!
Importance-Weighted CV (IWCV)
- When validating the classifier in the CV process, we also importance-weight the validation error.
[Figure: Set 1, Set 2, ..., Set k−1, Set k; training vs. testing split]
- IWCV gives almost unbiased estimates of the generalization error even under covariate shift.
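A minimal k-fold IWCV sketch under several assumptions: the importance weights of all training samples are known, `fit` trains any learner and returns a predictor, and `loss` is an arbitrary pointwise loss (squared loss for regression, 0/1 loss for classification). This only illustrates the weighting idea; it is not the authors' implementation.

```python
import numpy as np

def iwcv_score(X, y, w, fit, loss, k=10, seed=0):
    """k-fold importance-weighted cross-validation.

    X, y : training inputs and outputs
    w    : importance weights w(x_i) = p_test(x_i) / p_train(x_i)
    fit  : function (X_tr, y_tr, w_tr) -> predict, with predict(X) -> y_hat
    loss : pointwise loss, e.g. lambda y_hat, y: (y_hat - y) ** 2

    Returns the importance-weighted mean validation loss, an (almost)
    unbiased estimate of the generalization error under the test distribution.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        va = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit(X[tr], y[tr], w[tr])
        # Ordinary CV would average the raw loss; IWCV importance-weights it.
        scores.append(np.mean(w[va] * loss(predict(X[va]), y[va])))
    return float(np.mean(scores))
```

Model selection then simply keeps the candidate (e.g., the flattening parameter λ) with the smallest IWCV score; a usage sketch is given below.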
Example of IWCV

Obtained generalization error, mean (std.):
  Ordinary CV: 0.356 (0.086)
  IWCV:        0.077 (0.020)

- IWCV is nicely unbiased.
- Model selection by IWCV outperforms CV!
[Figure: true generalization error, ordinary CV score, and IWCV score]
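For concreteness, model selection by IWCV on toy data might look as follows, reusing the hypothetical helpers gauss_pdf, adaptive_iwls_fit, and iwcv_score from the earlier sketches; the toy densities, target function, and candidate grid are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D (weak) extrapolation data, purely illustrative: the test input
# distribution sits to the right of the training one.
x = rng.normal(1.0, 1.0, size=200)
X = np.c_[np.ones_like(x), x]                          # linear model with intercept
y = np.sinc(x) + rng.normal(scale=0.1, size=x.size)
w = gauss_pdf(x, 2.0, 0.5) / gauss_pdf(x, 1.0, 1.0)    # assumed-known densities

sq_loss = lambda y_hat, y_true: (y_hat - y_true) ** 2

def make_learner(lam):
    """Wrap adaptive IWLS with flattening lam in the form iwcv_score expects."""
    def fit(X_tr, y_tr, w_tr):
        theta = adaptive_iwls_fit(X_tr, y_tr, w_tr, lam)
        return lambda X_new: X_new @ theta
    return fit

candidates = np.linspace(0.0, 1.0, 11)
scores = [iwcv_score(X, y, w, make_learner(lam), sq_loss) for lam in candidates]
lam_best = candidates[int(np.argmin(scores))]
print("flattening parameter chosen by IWCV:", lam_best)
```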
Relation to Existing Methods
- IWAIC (Shimodaira, JSPI 2000)
- IWSIC (Sugiyama & Müller, Stat. & Deci. 2005)

                     IWAIC        IWSIC          IWCV
Unbiasedness         Asymptotic   Finite sample  Asymptotic & finite sample
Loss                 Smooth       Squared        Arbitrary
Model                Regular      Linear         Arbitrary
Parameter learning   Smooth       Linear         Arbitrary
Computation          Fast         Fast           Slow

IWCV is the first method that is applicable to classification with covariate shift!
Application: Brain-Computer Interface (BCI)
- Brain activity in different mental states is transformed into control signals.
Non-Stationarity in EEG Features
- Different mental conditions (attention, sleepiness, etc.) between training and test phases may change the EEG signals.
[Figures: features extracted from brain activity during training and test phases; bandpower differences between training and test phases]
Adaptive Importance-Weighted Linear Discriminant Analysis (AIWLDA)
- Standard classification method in BCI: LDA (after appropriate feature extraction).
- We use its adaptive importance-weighted variant, AIWLDA, with flattening parameter λ:
  - λ = 0: ordinary LDA (standard method).
  - λ = 1: IWLDA (consistent).
  - λ is tuned by the proposed IWCV.
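One plausible way to realize AIWLDA is to let each sample contribute to its class mean and to the pooled within-class covariance with weight w(x_i)^λ, as sketched below for binary labels. This weighting scheme and the equal-prior threshold are our assumptions, not necessarily how the authors implemented AIWLDA.

```python
import numpy as np

def aiwlda_fit(X, y, w, lam=1.0):
    """Adaptive importance-weighted LDA for binary labels y in {0, 1}.

    Each sample contributes to its class mean and to the pooled within-class
    covariance with weight w_i ** lam; lam = 0 gives ordinary LDA.
    """
    v = w ** lam
    means, cov = [], np.zeros((X.shape[1], X.shape[1]))
    for c in (0, 1):
        Xc, vc = X[y == c], v[y == c]
        mu = (vc[:, None] * Xc).sum(axis=0) / vc.sum()
        means.append(mu)
        D = Xc - mu
        cov += (vc[:, None] * D).T @ D
    cov /= v.sum()
    beta = np.linalg.solve(cov, means[1] - means[0])     # LDA projection direction
    b = -0.5 * beta @ (means[0] + means[1])              # threshold (equal class priors assumed)
    return lambda X_new: (X_new @ beta + b > 0).astype(int)
```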
BCI Results

Subject  Trial  Ordinary LDA  AIWLDA + 10-fold IWCV  KL divergence (train → test inputs)
   1       1       9.3 %            10.0 %                 0.76
   1       2       8.8 %             8.8 %                 1.11
   1       3       4.3 %             4.3 %                 0.69
   2       1      40.0 %            40.0 %                 0.97
   2       2      39.3 %            38.7 %                 1.05
   2       3      25.5 %            25.5 %                 0.43
   3       1      36.9 %            34.4 %                 2.63
   3       2      21.3 %            19.3 %                 2.88
   3       3      22.5 %            17.5 %                 1.25
   4       1      21.3 %            21.3 %                 9.23
   4       2       2.4 %             2.4 %                 5.58
   4       3       6.4 %             6.4 %                 1.83
   5       1      21.3 %            21.3 %                 0.79
   5       2      15.3 %            14.0 %                 2.01

- The proposed method outperforms the existing one in 5 cases!
- When the KL divergence is large, IWCV is better.
- When the KL divergence is small, there is no difference.
- Non-stationarity in EEG could be successfully modeled by covariate shift!
Conclusions
- Covariate shift: the input distribution varies but the functional relation remains unchanged.
- The importance weight plays a central role in compensating for covariate shift.
- IW cross-validation: unbiased and general.
- IWCV improves the performance of BCI.
- Class-prior change: a variant of IWCV works.
- Latent distribution shift: Storkey & Sugiyama (to be presented at NIPS 2006).