Counterfactual Policy Evaluation in Reproducing Kernel Hilbert Spaces

Counterfactual Policy Evaluation in Reproducing Kernel Hilbert Spaces. Krikamol Muandet, Max Planck Institute for Intelligent Systems, Tübingen, Germany. Jeju, Korea, February 22, 2019.


  1. Counterfactual Policy Evaluation in Reproducing Kernel Hilbert Spaces. Krikamol Muandet, Max Planck Institute for Intelligent Systems, Tübingen, Germany. Jeju, Korea, February 22, 2019. Running title: Counterfactual Learning in RKHS (1 / 27).

  2. Acknowledgment: Motonobu Kanagawa (U of Tübingen), Sorawit Saengkyongam (UCL), Sanparith Marukatat (NECTEC).

  3. Outline: 1 Introduction, 2 Counterfactual Mean Embedding, 3 Policy Evaluation, 4 Discussion.

  4. Section 1: Introduction.

  5. Introduction: Motivation. Recommendation, autonomous car, healthcare. Goal: identify the best (causal) policy.

  7. Introduction: Motivation. Personalization.

  8. Introduction: Motivation. Healthcare.

  9. Introduction: Problem Setup. A Causal Policy.

     X: Context, T: Treatment, Y: Outcome, π: Policy. (The terms “context” and “covariate” may be used interchangeably.)

     Ex: X = {age, gender}, T = pills, Y = cholesterol level.

     Diagram nodes: Context X, Policy π, Treatment T, Outcome Y.

     - A context x ∼ ρ.
     - A treatment t ∼ π(t | x) for (x, t) ∈ X × T.
     - An outcome y ∼ η(y | x, t) for (x, t, y) ∈ X × T × Y.
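
The three sampling steps of the problem setup (x ∼ ρ, then t ∼ π(t | x), then y ∼ η(y | x, t)) can be sketched in Python. This is a minimal illustration, not the talk's model: the concrete choices for ρ, π, and η, all coefficients, and every function name below are assumptions made up for the example.

```python
import math
import random

def sample_context(rng):
    """Draw a context x ~ rho (assumed: age uniform on [30, 70], gender a fair coin)."""
    return {"age": rng.uniform(30.0, 70.0), "gender": rng.choice(["f", "m"])}

def policy_prob(x):
    """Assumed pi(t = 1 | x): older patients are more likely to get pills."""
    return 1.0 / (1.0 + math.exp(-(x["age"] - 50.0) / 10.0))

def sample_treatment(rng, x):
    """Draw a treatment t ~ pi(t | x), with t = 1 meaning pills."""
    return 1 if rng.random() < policy_prob(x) else 0

def sample_outcome(rng, x, t):
    """Draw an outcome y ~ eta(y | x, t) (assumed: linear mean plus Gaussian noise)."""
    gender_shift = 5.0 if x["gender"] == "m" else 0.0
    mean = 150.0 + 1.0 * x["age"] + gender_shift - 20.0 * t
    return rng.gauss(mean, 5.0)

def sample_dataset(n, seed=0):
    """Generate n triples (x, t, y) by chaining the three sampling steps."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = sample_context(rng)
        t = sample_treatment(rng, x)
        y = sample_outcome(rng, x, t)
        data.append((x, t, y))
    return data

data = sample_dataset(5)
for x, t, y in data:
    print(x, t, round(y, 1))
```

Because the treatment is drawn from a policy that depends on the context, treated and untreated units in such a dataset differ systematically in age, which is the situation the later slides on observational data address.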

  14. Introduction: Problem Setup. How to Identify Good Policies.

      Randomized Exp. (A/B Test):
      - ✓ Gold standard in science
      - × Expensive, time-consuming, or unethical

      Observational Studies:
      - ✓ No randomization
      - ✓ Cheaper, safer, and more ethical
      - × Selection bias
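
The selection-bias drawback of observational studies can be made concrete with a small deterministic example. This is only a sketch: the units, their potential outcomes, both assignment vectors, and the function name `naive_effect` are hypothetical and were chosen so the arithmetic is easy to follow.

```python
def naive_effect(units, assignment):
    """Difference in mean observed outcome between treated and control units."""
    treated = [y1 for (_, _y0, y1), t in zip(units, assignment) if t == 1]
    control = [y0 for (_, y0, _y1), t in zip(units, assignment) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical units (group, Y0, Y1); every unit has true effect Y1 - Y0 = -5.
units = [("young", 10, 5)] * 4 + [("old", 20, 15)] * 4

# Observational assignment: older units (higher baseline) are treated more often.
observational = [1, 0, 0, 0, 1, 1, 1, 0]

# Balanced assignment of the kind a randomized experiment produces on average.
randomized = [1, 1, 0, 0, 1, 1, 0, 0]

print("observational:", naive_effect(units, observational))
print("randomized:   ", naive_effect(units, randomized))
```

Under the observational assignment the treated group is dominated by units with a high baseline, so the naive difference in means is 0.0 even though every unit's effect is -5; under the balanced assignment the same estimator returns -5.0.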

  16. Section 2: Counterfactual Mean Embedding.

  17. Counterfactual Mean Embedding: Potential Outcome Framework.

      Standard framework in social science, econometrics, and healthcare.

      Treatment T ∈ {0, 1} and outcome Y₀, Y₁ ∈ R.
      ◮ T ∈ {placebo, injection}
      ◮ Y₀ = cholesterol level if T = placebo
      ◮ Y₁ = cholesterol level if T = injection.

      | Unit | Y₁ | Y₀ | Y₁ − Y₀ |
      |------|----|----|---------|
      | A    | 15 | 20 | -5      |
      | B    | 10 | 12 | -2      |
      | C    | 5  | 11 | -6      |
      | D    | 12 | 19 | -7      |

      Individual treatment effect: ITE(i) := Y₁(i) − Y₀(i)

      Fundamental Problem of Causal Inference (FPCI) (Rubin 2005)

  20. Counterfactual Mean Embedding: Potential Outcome Framework. Fundamental Problem of Causal Inference (FPCI) (Rubin 2005): each unit reveals only one of its two potential outcomes, so ITE(i) := Y₁(i) − Y₀(i) cannot be computed from the observed data.

      | Unit | Y₁ | Y₀ | Y₁ − Y₀ |
      |------|----|----|---------|
      | A    | 15 | -  | ?       |
      | B    | -  | 12 | ?       |
      | C    | 5  | -  | ?       |
      | D    | -  | 19 | ?       |
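
The potential-outcome table and the fundamental problem can be replayed in a few lines of Python. This is a sketch under one stated assumption: the first value column is read as Y₁ and the second as Y₀, which is the reading under which Y₁ − Y₀ reproduces the listed differences. The treatment assignment and all names below are likewise illustrative.

```python
# Potential outcomes per unit as (Y1, Y0). Assumed column reading: the first
# value is Y1 and the second is Y0, which reproduces the listed differences.
potential_outcomes = {
    "A": (15, 20),
    "B": (10, 12),
    "C": (5, 11),
    "D": (12, 19),
}

def ite(y1, y0):
    """Individual treatment effect ITE(i) := Y1(i) - Y0(i)."""
    return y1 - y0

# Full table: computable only if both potential outcomes were known.
full_ite = {unit: ite(y1, y0) for unit, (y1, y0) in potential_outcomes.items()}

# Assumed assignment matching the observed entries of the second table:
# units A and C reveal their first column, units B and D their second.
treatment = {"A": 1, "B": 0, "C": 1, "D": 0}

def observe(unit):
    """Return (Y1, Y0) with the counterfactual entry replaced by None."""
    y1, y0 = potential_outcomes[unit]
    return (y1, None) if treatment[unit] == 1 else (None, y0)

def observed_ite(unit):
    """ITE from observed data: None, because one outcome is always missing."""
    y1, y0 = observe(unit)
    if y1 is None or y0 is None:
        return None
    return ite(y1, y0)

for unit in potential_outcomes:
    print(unit, observe(unit), observed_ite(unit), full_ite[unit])
```

Every call to `observed_ite` returns `None`, which is the programmatic counterpart of the question marks in the observed table.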

  21. Counterfactual Mean Embedding: Potential Outcome Framework. Rubin’s Causal Model (Rubin 2005).

      - Causal effect is defined w.r.t. the counterfactual outcomes.
        ◮ What would the value of Y₁ have been had the subject received the injection?
      - Covariates (X) associated with each unit are available.
      - Confounders (Z) affecting both T and Y simultaneously may exist.
      - A propensity score: P(T = 1 | X = x).
      - We observe a dataset {(x₁, t₁, y₁), …, (xₙ, tₙ, yₙ)}, where (xᵢ, tᵢ, yᵢ) = (covariate, received treatment, outcome).
      - The treatment assignment mechanism is not known.
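
Since the treatment assignment mechanism is not known, the propensity score has to be estimated from the observed triples (xᵢ, tᵢ, yᵢ). The sketch below uses the simplest possible estimator, the treated fraction within each value of a discrete covariate. It is an illustration only: the estimator choice, the toy dataset, and the name `empirical_propensity` are assumptions, not the method of the talk.

```python
from collections import defaultdict

def empirical_propensity(dataset):
    """Estimate P(T = 1 | X = x) by the treated fraction within each covariate value."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t, _y in dataset:
        total[x] += 1
        treated[x] += t
    return {x: treated[x] / total[x] for x in total}

# Hypothetical observations (x_i, t_i, y_i): covariate, received treatment, outcome.
dataset = [
    ("young", 0, 12.0),
    ("young", 0, 11.0),
    ("young", 1, 9.0),
    ("young", 0, 13.0),
    ("old", 1, 15.0),
    ("old", 1, 16.0),
    ("old", 0, 20.0),
    ("old", 1, 14.0),
]

propensity = empirical_propensity(dataset)
print(propensity)
```

In this toy data the estimate is 0.25 for "young" and 0.75 for "old", so treatment depends on the covariate. Counting only works for discrete covariates with enough samples per value; continuous covariates would need a fitted model instead.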
