Treatment effect estimation with missing attributes
Julie Josse, École Polytechnique, INRIA; Visiting Researcher, Google Brain
Mathematical Methods of Modern Statistics, June 2020
Collaborators
Methods: Imke Mayer (PhD X, EHESS), Jean-Philippe Vert (Google Brain), Stefan Wager (Stanford)
Assistance Publique Hôpitaux de Paris
Covid data
• 4780 patients (patients with at least one PCR-documented SARS-CoV-2 RNA from a nasopharyngeal sample)
• 119 continuous and categorical variables: heterogeneous data
• 34 hospitals: multilevel data

Hospital  Treatment  Age  Sex  Weight  DDI   BP   dead28  ...
Beaujon   HCQ        54   m    85      NM    180  yes
Pitie     AZ         76   m    NR      NR    131  no
Beaujon   HCQ+AZ     63   m    80      270   145  yes
Pitie     HCQ        80   f    NR      NR    107  no
HEGP      none       66   m    98      5890  118  no
...

⇒ Estimate the causal effect of administering the treatment "Hydroxychloroquine" (HCQ) on the outcome 28-day mortality.
Observational data: non-random assignment

Treatment  Survived       Deceased      Pr(survived | treatment)  Pr(deceased | treatment)
HCQ        497 (11.4%)    111 (2.6%)    0.817                     0.183
HCQ+AZI    158 (3.6%)     54 (1.2%)     0.745                     0.255
none       2699 (62.1%)   830 (19.1%)   0.765                     0.235

Mortality rate: 23% overall, 18% for HCQ, 24% for the non-treated. Does the treatment help?

[Figure: distribution of Age in the HCQ and non-treated ("Nothing") arms, with mean and median marked.]

Severe patients (with a higher risk of death) are less likely to be treated. If the control group does not look like the treatment group, the difference in response may be confounded by differences between the groups.
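As a quick arithmetic check, the conditional probabilities and the 23% overall rate can be recomputed from the counts in the table. The sketch below also prints the crude HCQ vs. none difference; the variable names are illustrative, and this naive contrast is exactly the confounded comparison the slide warns about, not a causal effect.

```python
# Counts (survived, deceased) per treatment arm, copied from the table above.
counts = {
    "HCQ": (497, 111),
    "HCQ+AZI": (158, 54),
    "none": (2699, 830),
}


def mortality(survived, deceased):
    """Empirical death rate: deceased / (survived + deceased)."""
    return deceased / (survived + deceased)


# Crude mortality per arm, and overall by pooling the three arms of the table.
rates = {arm: mortality(s, d) for arm, (s, d) in counts.items()}
overall = mortality(sum(s for s, _ in counts.values()),
                    sum(d for _, d in counts.values()))

# Naive HCQ vs. none contrast: NOT a causal effect, severity confounds it.
naive_diff = rates["HCQ"] - rates["none"]

for arm, r in rates.items():
    print(f"{arm:8s} {r:.3f}")
print(f"overall  {overall:.3f}")
print(f"naive HCQ - none: {naive_diff:+.3f}")
```

The negative naive difference is what suggests "treatment helps"; the age plot shows why it cannot be taken at face value.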
Potential outcome framework (Neyman, 1923; Rubin, 1974)

Causal effect
• n iid samples $(X_i, W_i, Y_i(1), Y_i(0)) \in \mathbb{R}^d \times \{0, 1\} \times \mathbb{R} \times \mathbb{R}$
• Individual causal effect of the treatment: $\Delta_i \triangleq Y_i(1) - Y_i(0)$

Missing data problem: $\Delta_i$ is never observed (only one outcome is observed per individual).

Covariates        Treatment  Outcome(s)
X1    X2   X3     W          Y(0)      Y(1)
1.1   20   F      1          ?         Survived
-6    45   F      0          Dead      ?
0     15   M      1          ?         Survived
...
-2    52   M      0          Survived  ?

Average treatment effect (ATE): $\tau \triangleq \mathbb{E}[\Delta_i] = \mathbb{E}[Y_i(1) - Y_i(0)]$

The ATE is the difference between the average outcome had everyone been treated and the average outcome had nobody been treated.
ATE = 0.05: the mortality rate would be 5 percentage points higher if everyone were treated than if nobody were treated. So, on average, the treatment increases the risk of dying.
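Only in a simulation do we get to see both columns Y(0) and Y(1), which makes it a convenient way to see what $\tau$ measures. The sketch below assumes a hypothetical data-generating process (one binary "severity" covariate, death probabilities chosen so that the true ATE is 0.05); none of these numbers come from the Covid data.

```python
import numpy as np

# Hypothetical data-generating process, chosen only for illustration:
# a binary covariate x ("severe" patient) and binary outcomes (1 = dead).
rng = np.random.default_rng(0)
n = 100_000
x = rng.binomial(1, 0.3, size=n)
y0 = rng.binomial(1, 0.10 + 0.40 * x)  # outcome if untreated
y1 = rng.binomial(1, 0.15 + 0.40 * x)  # outcome if treated: +5 points of risk

# In a simulation both potential outcomes are available, so Delta_i and the
# ATE can be computed directly; with real data one of the two is always missing.
delta = y1 - y0
ate_oracle = delta.mean()
print(f"oracle ATE: {ate_oracle:.3f}")  # should be close to the true ATE of 0.05
```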
Assumption for ATE identifiability in observational data

Unconfoundedness (selection on observables)
$\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp W_i \mid X_i$
Treatment assignment $W_i$ is random conditionally on the covariates $X_i$: measure enough covariates to capture the dependence between $W_i$ and the outcomes.
• Observed outcome: $Y_i = W_i Y_i(1) + (1 - W_i) Y_i(0)$

[Figure: unconfoundedness as a graphical model with nodes X, {Y(0), Y(1)}, W and Y.]

Unobserved confounders, i.e. unmeasured variables correlated with both the outcome and the treatment, make it impossible to separate correlation from causality. The ATE is not identifiable without this assumption: it is not a sample size problem!
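A minimal sketch of why unconfoundedness matters, assuming a hypothetical design that mimics the Covid slide (severe patients die more and are treated less). The observed outcome is built with $Y_i = W_i Y_i(1) + (1 - W_i) Y_i(0)$; the naive treated-minus-control difference is then compared with a stratified estimate that contrasts treated and controls within each value of X and averages over $P(X = x)$, which is valid here because treatment depends on X only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical DGP (assumption): severe patients (x = 1) die more often and
# are LESS likely to be treated; the true ATE is 0.05.
x = rng.binomial(1, 0.3, size=n)
w = rng.binomial(1, np.where(x == 1, 0.1, 0.4))  # treatment depends on X only
y0 = rng.binomial(1, 0.10 + 0.40 * x)
y1 = rng.binomial(1, 0.15 + 0.40 * x)

y = w * y1 + (1 - w) * y0  # observed outcome: one potential outcome per patient

# Naive comparison of treated vs. controls: confounded by x.
naive = y[w == 1].mean() - y[w == 0].mean()

# Under unconfoundedness: compare within strata of x, weight by P(X = x).
adjusted = sum(
    (x == v).mean() * (y[(w == 1) & (x == v)].mean() - y[(w == 0) & (x == v)].mean())
    for v in (0, 1)
)
print(f"naive: {naive:+.3f}   stratified: {adjusted:+.3f}   truth: +0.050")
```

With this design the naive difference comes out negative ("treatment helps") even though the treatment actually increases mortality.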
Propensity score - overlap assumption
Propensity score: probability of treatment given the observed covariates,
$e(x) \triangleq \mathbb{P}(W_i = 1 \mid X_i = x), \quad \forall x \in \mathcal{X}$.
We assume overlap, i.e. $\eta < e(x) < 1 - \eta$ for all $x \in \mathcal{X}$ and some $\eta > 0$.

[Figure. Left: non-smokers, never treated. Right: smokers, all treated.]

If the probability of being treated for smokers is $e(x) = 1$, how can we estimate the outcome of smokers had they not been treated, $Y(0)$? How could we extrapolate when treatment and covariates are completely confounded?
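For a discrete covariate, overlap can be checked directly by estimating $e(x)$ as the treated fraction in each stratum. The sketch below is a hypothetical helper (`check_overlap`, with an arbitrary threshold `eta=0.05`) applied to a toy version of the smoker example, where both strata violate overlap.

```python
import numpy as np


def check_overlap(x, w, eta=0.05):
    """For a discrete covariate x, estimate e(x) = P(W = 1 | X = x) by the
    treated fraction in each stratum and flag whether eta < e(x) < 1 - eta."""
    report = {}
    for v in np.unique(x):
        e_hat = float(w[x == v].mean())
        report[int(v)] = (e_hat, eta < e_hat < 1 - eta)
    return report


# Toy version of the smoker example: non-smokers (x = 0) never treated,
# smokers (x = 1) always treated.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = np.array([0, 0, 0, 0, 1, 1, 1, 1])
report = check_overlap(x, w)
for v, (e_hat, ok) in report.items():
    print(f"x = {v}: e_hat = {e_hat:.2f}, overlap {'holds' if ok else 'violated'}")
```

With continuous covariates one would instead inspect the distribution of fitted propensity scores in each arm; the stratum-frequency estimate only makes sense for discrete X.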
Inverse-propensity weighting estimation of ATE
Average treatment effect (ATE): $\tau \triangleq \mathbb{E}[\Delta_i] = \mathbb{E}[Y_i(1) - Y_i(0)]$
Propensity score: $e(x) \triangleq \mathbb{P}(W_i = 1 \mid X_i = x)$
IPW estimator (Horvitz-Thompson, survey sampling):
$$\hat\tau_{IPW} \triangleq \frac{1}{n} \sum_{i=1}^{n} \left( \frac{W_i Y_i}{\hat e(X_i)} - \frac{(1 - W_i) Y_i}{1 - \hat e(X_i)} \right)$$
⇒ Balances the differences between the two groups.
⇒ Consistent estimator of $\tau$ as long as $\hat e(\cdot)$ is consistent.

[Figure (credit: S. Athey): treated observations have higher X's on average; reweighting control observations with high X's adjusts for the difference.]
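The IPW formula translates directly into a few lines of NumPy. This sketch assumes a hypothetical confounded design (true ATE 0.05) and estimates $\hat e(x)$ by the treated fraction within each stratum of a binary covariate, which is consistent because X is discrete; with richer covariates any consistent propensity model could be plugged in instead.

```python
import numpy as np


def ipw_ate(y, w, e_hat):
    """IPW (Horvitz-Thompson) estimate of the ATE from estimated propensities."""
    return np.mean(w * y / e_hat - (1 - w) * y / (1 - e_hat))


# Hypothetical confounded design (illustration only): severe patients (x = 1)
# die more often and are treated less often; the true ATE is 0.05.
rng = np.random.default_rng(2)
n = 200_000
x = rng.binomial(1, 0.3, size=n)
w = rng.binomial(1, np.where(x == 1, 0.1, 0.4))
y = np.where(w == 1,
             rng.binomial(1, 0.15 + 0.40 * x),
             rng.binomial(1, 0.10 + 0.40 * x))

# Propensity estimated by the treated fraction within each stratum of x.
e_hat = np.empty(n)
for v in (0, 1):
    e_hat[x == v] = w[x == v].mean()

naive = y[w == 1].mean() - y[w == 0].mean()
tau_ipw = ipw_ate(y, w, e_hat)
print(f"naive: {naive:+.3f}   IPW: {tau_ipw:+.3f}   truth: +0.050")
```

Treated units with a small $\hat e(X_i)$ get large weights, which is why the overlap assumption also controls the variance of this estimator.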
Doubly robust ATE estimation
Model treatment on covariates: $e(x) \triangleq \mathbb{P}(W_i = 1 \mid X_i = x)$
Model outcome on covariates: $\mu_{(w)}(x) \triangleq \mathbb{E}[Y_i(w) \mid X_i = x]$

Augmented IPW, doubly robust (DR):
$$\hat\tau_{AIPW} \triangleq \frac{1}{n} \sum_{i=1}^{n} \left( \hat\mu_{(1)}(X_i) - \hat\mu_{(0)}(X_i) + W_i \frac{Y_i - \hat\mu_{(1)}(X_i)}{\hat e(X_i)} - (1 - W_i) \frac{Y_i - \hat\mu_{(0)}(X_i)}{1 - \hat e(X_i)} \right)$$
is consistent if either the $\hat\mu_{(w)}(x)$ are consistent or $\hat e(x)$ is consistent.

Any (machine learning) procedure, such as random forests or deep nets, can be used to estimate $\hat e(x)$ and $\hat\mu_{(w)}(x)$ without harming the interpretability of the causal effect estimate.

Properties: Double Machine Learning (Chernozhukov et al., 2018)
If $\hat e(x)$ and $\hat\mu_{(w)}(x)$ converge at rate $n^{1/4}$, then
$$\sqrt{n}\,(\hat\tau_{AIPW} - \tau) \xrightarrow[n \to \infty]{d} \mathcal{N}(0, V^*),$$
where $V^*$ is the semiparametric efficient variance.
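To illustrate double robustness, the sketch below implements the AIPW formula as a function of the fitted nuisances and deliberately uses a wrong propensity model (constant 0.5) together with correctly specified linear outcome models. The simulation is hypothetical (true ATE 0.5), and as a simplification the nuisances are fitted and evaluated on the same sample, without the cross-fitting used in Double Machine Learning; the standard error is the usual plug-in from the per-observation scores.

```python
import numpy as np


def aipw_ate(y, w, mu0_hat, mu1_hat, e_hat):
    """AIPW (doubly robust) ATE estimate and plug-in standard error,
    given fitted nuisance values mu0_hat, mu1_hat, e_hat at each X_i."""
    scores = (mu1_hat - mu0_hat
              + w * (y - mu1_hat) / e_hat
              - (1 - w) * (y - mu0_hat) / (1 - e_hat))
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(y))


# Hypothetical simulation: treatment more likely for large x, true ATE = 0.5.
rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
w = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 1.0 + 2.0 * x + 0.5 * w + rng.normal(size=n)

# Outcome models mu_(w): linear regression of Y on X, fitted in each arm.
design = np.column_stack([np.ones(n), x])
mu_hat = {}
for arm in (0, 1):
    coef, *_ = np.linalg.lstsq(design[w == arm], y[w == arm], rcond=None)
    mu_hat[arm] = design @ coef

# Deliberately wrong propensity model (constant 0.5): AIPW still recovers
# the ATE because the outcome models are correctly specified.
e_wrong = np.full(n, 0.5)
tau_aipw, se = aipw_ate(y, w, mu_hat[0], mu_hat[1], e_wrong)
naive = y[w == 1].mean() - y[w == 0].mean()
print(f"naive: {naive:.3f}   AIPW: {tau_aipw:.3f} (se {se:.3f})   truth: 0.500")
```

Swapping the roles (correct propensity, misspecified outcome models) would also keep the estimate consistent; misspecifying both would not.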
Missing values
[Figure: percentage of missing values (0 to 60%) per variable: days_since_first_case_datetime, age, gender, num_hospitals, on_corticoids, period, asthma, chemotherapy_radiotherapy, cancer, chronic_obstructive_pulmonary_disease, chronic_hepatic_disease, chronic_respiratory_failure, diabetes, dyslipidemia, heart_arrhythmia, hematological_malignancies, hypertension, ischemic_heart_disease, kidney_disease, obesity, smoker, CREAT_value, CRP_value, PNN_value, LYM_value, TP_value, GDS_PaCO2_value, GDS_PaO2_value, weigh_kg, GDS_SAT_value, LDH_value, DDI_value.]
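The per-variable percentages in such a plot are simply column means of a missingness indicator. As a small sketch, the hypothetical helper below applies this to the Age, Weight and DDI values of the five example patients on the Covid data slide, reading the codes NM and NR as missing (our assumption; the slides do not define them).

```python
import numpy as np


def missing_percentage(X, names):
    """Percentage of NaN entries per column, sorted from most to least missing."""
    pct = 100 * np.isnan(X).mean(axis=0)
    order = np.argsort(-pct, kind="stable")
    return [(names[j], float(pct[j])) for j in order]


# Age, Weight and DDI of the five example patients, with NM/NR coded as NaN.
names = ["age", "weigh_kg", "DDI_value"]
nan = np.nan
X = np.array([
    [54, 85, nan],
    [76, nan, nan],
    [63, 80, 270],
    [80, nan, nan],
    [66, 98, 5890],
], dtype=float)

for name, p in missing_percentage(X, names):
    print(f"{name:10s} {p:5.1f}%")
```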
"One of the ironies of Big Data is that missing data play an ever more significant role" (R. Samworth, 2019)
An n × p matrix, each entry is missing with probability 0.01.
Deleting rows with missing values?
• p = 5 ⇒ ≈ 95% of rows kept
• p = 300 ⇒ ≈ 5% of rows kept
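The two percentages follow from independence: a row is complete with probability $(1 - 0.01)^p$, i.e. about 0.951 for p = 5 and about 0.049 for p = 300. The sketch below computes this and adds a Monte Carlo check on a hypothetical matrix of 10,000 rows.

```python
import numpy as np


def fraction_complete_rows(p, prob_missing=0.01):
    """Expected fraction of rows with no missing entry when each of the p
    entries is missing independently with probability prob_missing."""
    return (1 - prob_missing) ** p


for p in (5, 300):
    print(f"p = {p:3d}: {fraction_complete_rows(p):.3f} of rows kept")

# Monte Carlo check on a hypothetical 10,000 x 300 missingness mask.
rng = np.random.default_rng(4)
mask = rng.random((10_000, 300)) < 0.01  # True = missing entry
sim = (~mask.any(axis=1)).mean()
print(f"simulated, p = 300: {sim:.3f}")
```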
Missing (informative) values in the covariates
A straightforward, but often biased, solution is complete-case analysis.

Covariates        Treatment  Outcome(s)
X1    X2   X3     W          Y(0)      Y(1)
NA    20   F      1          ?         Survived
-6    45   NA     0          Dead      ?
0     NA   M      1          ?         Survived
NA    32   F      1          Dead      ?
1     63   M      1          Dead      ?
-2    NA   M      0          Survived  ?

→ Often not a good idea! What are the alternatives?

Three families of methods, with different assumptions:
• Unconfoundedness with missingness + (no) missing values mechanisms
  Mayer, J., Wager, Sverdrup, Moyer, Gauss. AOAS 2020.
• Classical unconfoundedness + classical missing values mechanisms
• Latent unconfoundedness + classical missing values mechanisms
  Mayer, J., Raimundo, Vert. 2020.
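Complete-case analysis amounts to a row filter on the covariate matrix. Applied to the toy table, only one of the six rows survives. The sketch codes NA as NaN and uses an arbitrary numeric encoding for X3 (F = 0, M = 1), which is an assumption for illustration only.

```python
import numpy as np

# Covariates X1, X2, X3 of the toy table, NA coded as NaN and
# X3 encoded as F = 0, M = 1 (arbitrary numeric coding for illustration).
nan = np.nan
X = np.array([
    [nan, 20, 0],
    [-6, 45, nan],
    [0, nan, 1],
    [nan, 32, 0],
    [1, 63, 1],
    [-2, nan, 1],
], dtype=float)

# Complete-case analysis keeps only the rows without any missing covariate.
complete = ~np.isnan(X).any(axis=1)
print(f"complete cases: {int(complete.sum())} of {len(X)} rows")
print("row(s) kept (0-based):", np.flatnonzero(complete).tolist())
```

Beyond the loss of sample size, the kept rows are a biased subsample whenever missingness depends on the treatment, the outcome or the covariates themselves.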