Data-efficient causal effect estimation Adith Swaminathan adswamin@microsoft.com Joint work with Maggie Makar (MIT) and Emre Kıcıman (MSR AI) Brown TRIPODS 1.16.2019
Outline
1. Improve ML applications using Causal Reasoning
2. Use ML tools to perform Causal Inference
1. Causal Reasoning -> ML
"Use logs collected from interactive systems to evaluate and train new interaction policies"
The data we collect from our interactive systems confounds ML models. Simple, pragmatic fixes can address this confounding!
Example: Search [https://arxiv.org/abs/1608.04468 ; WSDM'17]
Model the propensity of clicks on documents to de-bias the training set of learning-to-rank models.
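To make the idea concrete, here is a minimal sketch of propensity-weighted click data. It assumes a simple position-based examination model; the 1/rank propensity form and all names below are illustrative, not the paper's exact model.

```python
import numpy as np

# Hypothetical propensity model: probability the user examined a result
# decays with its rank (a position-based click model; 1/rank is an
# assumption here, not the paper's fitted model).
def examination_propensity(rank, eta=1.0):
    return (1.0 / rank) ** eta

# Click log entries: (query_id, doc_id, rank_shown, clicked)
log = [(0, "d1", 1, 1), (0, "d7", 3, 0), (1, "d2", 2, 1)]

# IPW-weighted training set: each click is up-weighted by the inverse of
# the probability that its rank was examined at all, which corrects the
# position bias baked into the logged clicks.
weighted_examples = [
    (q, d, click / examination_propensity(rank))
    for (q, d, rank, click) in log
]
print(weighted_examples)
```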
Pointers to recent results
• "IPW fixes collaborative filtering for recommendations" [Schnabel et al., ICML'16]
• "Similar IPW-like ideas massively improve learning-to-rank for search" [Joachims et al., WSDM'17 Best Paper]
• "Important to reason about variance of IPW for counterfactual learning" [Swaminathan & Joachims, ICML'15]
• "Self-normalized estimators are better to use in these applications" [Swaminathan & Joachims, NIPS'15]
• "We can do much better than IPW for structured treatments (slates)" [Swaminathan et al., NIPS'17]
• "These techniques complement deep learning" [Joachims et al., ICLR'18]
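As a small illustration of the IPS vs. self-normalized IPS distinction cited above, here is a toy off-policy evaluation sketch on synthetic data; the always-treat target policy and the data-generating process are my choices for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Logged bandit feedback: binary actions sampled by a logging policy with
# known propensities p0; rewards observed only for the logged action.
p0 = rng.uniform(0.1, 0.9, size=n)            # logging propensities
a = rng.random(n) < p0                        # logged actions
r = rng.normal(loc=a.astype(float), scale=1)  # observed rewards

# Target policy: always take action 1, so the importance weight is
# 1/p0 when the logged action was 1, and 0 otherwise.
w = np.where(a, 1.0 / p0, 0.0)

ips = np.mean(w * r)               # vanilla IPS estimate (unbiased, high variance)
snips = np.sum(w * r) / np.sum(w)  # self-normalized IPS: lower variance at the
                                   # cost of a small bias (Swaminathan & Joachims, NIPS'15)
print(ips, snips)
```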
Outline
1. Improve ML applications using Causal Reasoning
2. Use ML tools to perform Causal Inference
2. ML -> Causal Reasoning
"Data-efficient treatment effect estimation" [AAAI'19]
Representation learning + Causal inference = Bias-variance trade-off?
Problem Setting
Will my patient's blood pressure increase if I put her on medication A?
Challenges:
- A question of causal nature
- Limited data at test time
Individual Treatment Effect (ITE)
• Estimate the causal effect of an intervention: if the treatment $t$ changes, how does the outcome $Y$ change?
• Target for estimation: $Y(1) - Y(0)$
• Target is unobserved: the fundamental problem of causal inference (each individual reveals only one of $Y(1)$ and $Y(0)$)

$$\text{ITE:}\quad \tau(x) = \mathbb{E}[Y(1) \mid x] - \mathbb{E}[Y(0) \mid x], \qquad Y(1) \sim \Pr(Y(1) \mid x),\quad Y(0) \sim \Pr(Y(0) \mid x)$$
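A tiny simulation makes the fundamental problem concrete: both potential outcomes exist inside the simulator, but the observed data reveal only one per individual. This is purely illustrative code of my own construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.normal(size=n)              # covariates
y0 = x + rng.normal(size=n)         # potential outcome under control
y1 = x + 2.0 + rng.normal(size=n)   # potential outcome under treatment
t = rng.integers(0, 2, size=n)      # assigned treatment

ite = y1 - y0                       # true individual effect: never observable
y_obs = np.where(t == 1, y1, y0)    # the data only ever contain one of y0, y1
print(ite)
print(y_obs)
```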
ITE estimation from observational data requires two functions:
• Adjustment for confounding (handles confounders)
• Estimation of heterogeneity (handles effect modifiers)
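One standard way to instantiate these two functions is a T-learner: fit per-arm outcome models on the confounders, then read heterogeneity off their difference. This is a generic sketch, not the slides' specific method; gradient boosting merely stands in for BART/GRF.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n, d = 2000, 5
X = rng.normal(size=(n, d))
# X[:, 0] confounds both treatment assignment and outcome;
# X[:, 1] is an effect modifier.
t = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = X[:, 0] + t * (1 + X[:, 1]) + rng.normal(size=n)

# Adjustment for confounding: separate outcome models per treatment arm.
mu1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
mu0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])

# Estimation of heterogeneity: the model difference as a function of x.
tau_hat = mu1.predict(X) - mu0.predict(X)
print(tau_hat[:5])  # should roughly track 1 + X[:5, 1]
```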
Confounders vs. Effect modifiers
[Figures: average treatment effect under confounding vs. effect modification]
Data-efficient ITE estimation
• ITE discovery (training time): adjustment for confounding
• ITE prediction (test time): estimation of heterogeneity
Insight: leverage the difference between the task at training time (ITE discovery) and the task at test time (ITE prediction) to reduce the data collection burden at test time.
Why Trees?
Trees identify the most important axes of heterogeneity
Trees can be traversed until the ability to query features is exhausted
Different individuals → different queries
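The traversal idea in the last two slides can be sketched with scikit-learn's tree internals. The stopping rule below (pay one "query" per newly measured feature, fall back to the current node's mean once the budget is spent) is my illustrative reading, not the paper's exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def predict_with_budget(tree, x, budget):
    """Walk a fitted tree, querying at most `budget` distinct features."""
    t = tree.tree_
    node, queried = 0, set()
    while t.children_left[node] != -1:          # stop at a leaf
        f = t.feature[node]
        if f not in queried:
            if len(queried) >= budget:
                break                           # cannot afford another query
            queried.add(f)                      # "measure" feature f for this person
        node = (t.children_left[node] if x[f] <= t.threshold[node]
                else t.children_right[node])
    return t.value[node][0, 0], sorted(queried)

# Toy usage: distill synthetic "true" ITEs into a shallow tree, then
# traverse it for one individual under a budget of two features.
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 6))
tau = 1 + 2 * X[:, 1] - X[:, 3]
tree = DecisionTreeRegressor(max_depth=4).fit(X, tau)
print(predict_with_budget(tree, X[0], budget=2))
```

Note how different individuals trigger different root-to-leaf paths, so the set of queried features is personalized rather than fixed in advance.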
Algorithm: DEITEE (Data-Efficient Individual Treatment Effect Estimator)
• ITE discovery: base model
• ITE prediction: DEITEE model
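A minimal end-to-end sketch of the discovery-then-distillation pipeline, assuming a T-learner base model in place of BART/GRF and a shallow CART as the DEITEE model; all details here are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n, d = 2000, 10
X = rng.normal(size=(n, d))
t = rng.integers(0, 2, size=n)
y = X[:, 0] + t * (1 + 2 * X[:, 1]) + rng.normal(size=n)

# ITE discovery: a flexible base model that uses all d features to
# adjust for confounding (stand-in for BART/GRF).
mu1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
mu0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])
tau_base = mu1.predict(X) - mu0.predict(X)

# ITE prediction: distill the base model's estimates into a shallow tree,
# so test-time prediction only needs the few features on a root-to-leaf path.
deitee = DecisionTreeRegressor(max_depth=3).fit(X, tau_base)
used = set(deitee.tree_.feature[deitee.tree_.feature >= 0])
print("features the distilled tree can query:", used)
```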
Experiments: Synthetic
• Data: ACIC'17 simulated data ("semi-synthetic"); N=5k, d=58
• Base models: BART and GRF
• Benchmarks: train BART/GRF with feature regularization
• Evaluation: (1) accuracy relative to true ITE; (2) number of features queried
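The two evaluation criteria can be scored in a few lines, assuming you hold the simulator's true ITEs, a method's estimates, and the per-individual sets of features it queried; this is a sketch that mirrors the slide's metrics, with all names my own.

```python
import numpy as np

def evaluate(tau_true, tau_hat, queried_sets):
    """Score a method on the slide's two criteria."""
    mae = np.mean(np.abs(np.asarray(tau_true) - np.asarray(tau_hat)))  # (1) accuracy vs. true ITE
    mean_queries = np.mean([len(q) for q in queried_sets])             # (2) avg. features queried
    return mae, mean_queries
```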
DEITEE: Features queried
DEITEE doesn't sacrifice accuracy
Experiment on real data
What is the effect of a mother's habits on her newborn's health?
Data: 1989 MA singleton births (CDC); N=90k, d=77

Mother's habit     Mean Absolute Error relative to proxy ITE     Mean number of
(treatment)        BART          DEITEE-BART                     features queried
Alcohol            580.20        580.20                          15.42
Smoking            587.62        587.62                          16.2

[Figure: example DEITEE tree with splits such as Alcohol?, HS education, Age, Prenatal care, Health risks?, Married?]
Conclusions
• DEITEE reduces the number of features required to estimate individual causal effects
  ❖ Leverages the difference between ITE discovery and ITE prediction
• Ongoing: careful analysis of distillation error; guarantees on effect-modifier discovery
• Needed: a good, robust method for model selection
Thanks!
adswamin@microsoft.com