Counterfactual Policy Evaluation in Reproducing Kernel Hilbert Spaces
Krikamol Muandet
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Jeju, Korea, February 22, 2019

Acknowledgment: Motonobu Kanagawa (U of Tübingen), Sorawit Saengkyongam (UCL), Sanparith Marukatat (NECTEC)

Outline
1 Introduction
2 Counterfactual Mean Embedding
3 Policy Evaluation
4 Discussion

1 Introduction

Motivation
Recommendation, Autonomous Car, Healthcare
Goal: Identify the best (causal) policy.

Motivation: Personalization
Motivation: Healthcare

Problem Setup: A Causal Policy
X: Context, T: Treatment, Y: Outcome, π: Policy
Ex: X = {age, gender}, T = pills, Y = cholesterol level.
◮ A context x ∼ ρ.
◮ A treatment t ∼ π(t | x) for (x, t) ∈ X × T.
◮ An outcome y ∼ η(y | x, t) for (x, t, y) ∈ X × T × Y.
[Diagram: Context X, Policy π, Treatment T, Outcome Y]
Note: The terms “context” and “covariate” may be used interchangeably.

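The three sampling steps of the problem setup (x ∼ ρ, t ∼ π(t | x), y ∼ η(y | x, t)) can be sketched as a small simulation. This is only an illustration: the concrete distributions, the numeric constants, and the function names `sample_context`, `policy`, `outcome`, and `sample_logs` are all assumptions made for the example and do not come from the talk.

```python
import random

def sample_context(rng):
    """Draw a context x ~ rho: (age, gender). The distribution is made up."""
    age = rng.randint(30, 80)
    gender = rng.choice(["F", "M"])
    return age, gender

def policy(x, rng):
    """Draw a treatment t ~ pi(t | x): 1 or 2 pills, with 2 more likely for older patients."""
    age, _gender = x
    p_two = 0.8 if age >= 60 else 0.2
    return 2 if rng.random() < p_two else 1

def outcome(x, t, rng):
    """Draw an outcome y ~ eta(y | x, t): a cholesterol level (hypothetical model)."""
    age, _gender = x
    mean = 180.0 + 0.5 * age - 10.0 * t
    return rng.gauss(mean, 5.0)

def sample_logs(n, seed=0):
    """Generate n logged triples (x, t, y) by chaining the three sampling steps."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        x = sample_context(rng)
        t = policy(x, rng)
        y = outcome(x, t, rng)
        logs.append((x, t, y))
    return logs

for x, t, y in sample_logs(5):
    print(x, t, round(y, 1))
```

Swapping in a different `policy` while keeping `sample_context` and `outcome` fixed changes only how treatments are assigned; comparing the outcomes that result from different policies is the question policy evaluation asks.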
How to Identify Good Policies
Randomized Exp. (A/B Test)
✓ Gold standard in science
× Expensive, time-consuming, or unethical
Observational Studies
✓ No randomization
✓ Cheaper, safer, and more ethical
× Selection bias

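A small simulation can illustrate why selection bias matters in observational studies. In this sketch the data-generating model, the constant `TRUE_EFFECT`, and the function `naive_difference` are hypothetical: older patients have higher cholesterol and, in the observational setting, are also more likely to be treated.

```python
import random

TRUE_EFFECT = -10.0  # assumed effect of the treatment on cholesterol

def naive_difference(n, randomized, seed=0):
    """Difference of mean outcomes, treated minus control, on simulated data."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        old = rng.random() < 0.5  # context: older patient or not
        if randomized:
            p_treat = 0.5  # A/B test: assignment ignores the context
        else:
            p_treat = 0.9 if old else 0.1  # observational: older patients get treated more
        t = rng.random() < p_treat
        y = 200.0 + (30.0 if old else 0.0) + (TRUE_EFFECT if t else 0.0) + rng.gauss(0.0, 5.0)
        (treated if t else control).append(y)
    return sum(treated) / len(treated) - sum(control) / len(control)

rct = naive_difference(20000, randomized=True)
obs = naive_difference(20000, randomized=False)
print("randomized estimate:", round(rct, 2))
print("observational estimate:", round(obs, 2))
```

Under these assumptions the randomized estimate concentrates around the true effect of -10, while the naive observational estimate concentrates around 30 × (0.9 − 0.1) − 10 = 14, so it gets even the sign wrong.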
2 Counterfactual Mean Embedding

Potential Outcome Framework
Standard framework in social science, econometrics, and healthcare.
Treatment T ∈ {0, 1} and outcome Y0, Y1 ∈ R.
◮ T ∈ {placebo, injection}
◮ Y0 = cholesterol level if T = placebo
◮ Y1 = cholesterol level if T = injection.
Individual treatment effect: ITE(i) := Y1(i) − Y0(i)

Unit   Y1   Y0   Y1 − Y0
A      15   20   -5
B      10   12   -2
C       5   11   -6
D      12   19   -7

Fundamental Problem of Causal Inference (FPCI) (Rubin 2005)

Unit   Y1   Y0   Y1 − Y0
A      15   -    ?
B      -    12   ?
C       5   -    ?
D      -    19   ?

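The definition ITE(i) := Y1(i) − Y0(i) can be checked against the four units on the slide. The sketch below assumes that the first value column is Y1 and the second is Y0, since that is the assignment under which the listed differences equal Y1 − Y0; the names `potential`, `received`, and `observed` are illustrative.

```python
# Potential outcomes for units A-D from the slide, stored as unit -> (Y1, Y0).
# Assumption: first value column is Y1, second is Y0.
potential = {"A": (15, 20), "B": (10, 12), "C": (5, 11), "D": (12, 19)}

# Individual treatment effects: ITE(i) = Y1(i) - Y0(i).
ites = {unit: y1 - y0 for unit, (y1, y0) in potential.items()}
print(ites)  # {'A': -5, 'B': -2, 'C': -6, 'D': -7}

# Average of the individual effects over the four units.
ate = sum(ites.values()) / len(ites)
print(ate)  # -5.0

# FPCI: each unit reveals only the outcome of the treatment it received,
# so no ITE can be computed from the observed data alone.
received = {"A": 1, "B": 0, "C": 1, "D": 0}
observed = {
    unit: potential[unit][0] if received[unit] == 1 else potential[unit][1]
    for unit in potential
}
print(observed)  # {'A': 15, 'B': 12, 'C': 5, 'D': 19}
```

With both potential outcomes in hand the effects are simple subtractions; once only `observed` is available, every ITE has one missing term, which is the situation the second table depicts.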
Potential Outcome Framework: Rubin’s Causal Model (Rubin 2005)
Causal effect is defined w.r.t. the counterfactual outcomes.
◮ What would the value of Y1 have been had the subject received the injection?
Covariates (X) associated with each unit are available.
Confounders (Z) affecting both T and Y simultaneously may exist.
A propensity score: P(T = 1 | X = x).
We observe a dataset {(x_1, t_1, y_1), …, (x_n, t_n, y_n)}, where (x_i, t_i, y_i) = (covariate, received treatment, outcome).
The treatment assignment mechanism is not known.
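Because the treatment assignment mechanism is not known, the propensity score has to be estimated from the observed triples (x_i, t_i, y_i). As a minimal sketch, assuming a discrete covariate and the standard definition P(T = 1 | X = x), it can be estimated by empirical frequencies. The dataset and the function name `estimate_propensity` are hypothetical.

```python
from collections import defaultdict

def estimate_propensity(data):
    """Empirical estimate of P(T = 1 | X = x) for a discrete covariate x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number of units]
    for x, t, _y in data:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}

# Hypothetical dataset of (covariate, received treatment, outcome) triples.
data = [
    ("young", 0, 10.0), ("young", 0, 11.0), ("young", 1, 6.0), ("young", 0, 12.0),
    ("old", 1, 15.0), ("old", 1, 14.0), ("old", 0, 20.0), ("old", 1, 16.0),
]

propensity = estimate_propensity(data)
print(propensity)  # {'young': 0.25, 'old': 0.75}
```

For continuous or high-dimensional covariates, frequency counting no longer works and a regression model would be needed in its place.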