Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding
ICML, 2019
Muhammad Osama, Dave Zachariah, Thomas B. Schön
Division of System and Control, Department of Information Technology, Uppsala University
muhammad.osama@it.uu.se
Causal inference problem
◮ $y \in \mathbb{R}$: Outcome of interest
◮ $z \in \mathbb{R}$: Exposure variable
◮ $D_n = \{ y_i, z_i, s_i \}_{i=1}^{n}$, where $s \in \mathbb{R}^d$ is spatial location
◮ Target quantity: Average effect of assigning $z = \tilde{z}$ on $y$ at location $s$
  $\tau = \frac{d}{d\tilde{z}} E[\, y(\tilde{z}) \mid s \,]$  (1)
◮ $c$: Unobserved confounding variables
Example: y = income, z = age, c = unemployment
Causal inference problem
[Figure: diagram with variables c, s, z, y and a plot with axis ranges -5 to 5 and -3 to 3]
Example: Here $\tau = 0$ yet $\mathrm{Cov}(z, y) \neq 0$.
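The example above can be reproduced with a small simulation. This is a minimal sketch under assumed data-generating choices (a one-dimensional location, a confounder c = sin(s), and the noise levels are all hypothetical, not taken from the slides): the exposure has no effect on the outcome, yet the two are strongly correlated because both follow the spatially varying confounder.

```python
import math
import random

random.seed(0)

n = 5000
tau = 0.0  # true causal effect of z on y, set to zero as in the example

z_vals, y_vals = [], []
for _ in range(n):
    s = random.uniform(-5.0, 5.0)               # spatial location (1-D, assumed)
    c = math.sin(s)                             # hypothetical unobserved confounder
    z = c + random.gauss(0.0, 0.3)              # exposure driven by the confounder
    y = tau * z + 2.0 * c + random.gauss(0.0, 0.3)  # outcome driven by c, not by z
    z_vals.append(z)
    y_vals.append(y)


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (t - mean_b) for x, t in zip(a, b)) / (len(a) - 1)


cov_zy = cov(z_vals, y_vals)
naive_slope = cov_zy / cov(z_vals, z_vals)  # slope of a regression of y on z alone

print("Cov(z, y):", round(cov_zy, 3))
print("naive slope of y on z:", round(naive_slope, 3), "(true tau is 0)")
```

A regression of y on z alone would report a clearly nonzero slope here, which is the bias the rest of the talk sets out to remove.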
Approach
◮ Assumptions:
  ◮ $E[\, y(\tilde{z}) \mid s \,] = E[\, y \mid z = \tilde{z}, s \,]$
  ◮ $E[\, y \mid z = \tilde{z}, s \,]$ is affine in $z$
  $y = \tau(s) z + \beta(s) + \epsilon$  (2)
◮ $\beta(s)$ is a nuisance function correlated with spatially varying exposure $z$.
[Figure: three maps over the spatial domain [0, 10] x [0, 10], color scale from -1 to 1: $\tau(s)$; $\hat{\tau}(s)$ via (2); $\hat{\tau}(s)$ proposed]
Errors-in-variables model
◮ Let $w = y - E[\, y \mid s \,]$ and $v = z - E[\, z \mid s \,]$ [1]
◮ (2) becomes
  $w = \tau(s) v + \epsilon$  (3)
◮ The effect $\tau(s)$ is directly identifiable from (3), which we parameterize as $\tau_\theta(s) \in \{ f(s) : f = \phi(s)^\top \theta \}$
◮ Residuals $w$ and $v$ are not observed but estimated so that
  $w = \underbrace{y - \hat{E}[y \mid s]}_{\hat{w}} + \underbrace{\hat{E}[y \mid s] - E[y \mid s]}_{\tilde{w}}$,
  $v = \underbrace{z - \hat{E}[z \mid s]}_{\hat{v}} + \underbrace{\hat{E}[z \mid s] - E[z \mid s]}_{\tilde{v}}$,
  where $\tilde{w}$ and $\tilde{v}$ denote errors
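The step from (2) to (3) follows by taking $E[\,\cdot \mid s\,]$ of (2), which gives $E[y \mid s] = \tau(s) E[z \mid s] + \beta(s)$, and subtracting, so the nuisance term $\beta(s)$ cancels. The following is a minimal sketch of that orthogonalization on synthetic data, not the authors' implementation. The effect function, the nuisance function, the binned-mean estimator of the conditional expectations, and the basis $\phi(s) = (1, s)$ are all assumptions chosen for illustration.

```python
import math
import random

random.seed(0)


def true_tau(s):
    return 0.2 + 0.1 * s  # hypothetical effect, linear in s so phi(s) = (1, s) fits it


def solve_ls_2(x1, x2, t):
    """Least squares of t on two features via the 2x2 normal equations."""
    a11 = sum(a * a for a in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(b * b for b in x2)
    b1 = sum(a * c for a, c in zip(x1, t))
    b2 = sum(b * c for b, c in zip(x2, t))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def residualize(values, locations, n_bins=100, s_max=10.0):
    """Subtract a binned-mean (piecewise-constant) estimate of E[value | s]."""
    idx = [min(int(s / s_max * n_bins), n_bins - 1) for s in locations]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i, val in zip(idx, values):
        sums[i] += val
        counts[i] += 1
    return [val - sums[i] / counts[i] for i, val in zip(idx, values)]


n = 20000
s, z, y = [], [], []
for _ in range(n):
    si = random.uniform(0.0, 10.0)
    zi = math.sin(si) + random.gauss(0.0, 1.0)  # exposure varies with location
    # Model (2) with an assumed nuisance function beta(s) = 3 sin(s)
    yi = true_tau(si) * zi + 3.0 * math.sin(si) + random.gauss(0.0, 0.5)
    s.append(si)
    z.append(zi)
    y.append(yi)

w_hat = residualize(y, s)  # estimate of w = y - E[y | s]
v_hat = residualize(z, s)  # estimate of v = z - E[z | s]

# Orthogonalized fit of (3): regress w_hat on v_hat * phi(s)
theta_orth = solve_ls_2(v_hat, [v * si for v, si in zip(v_hat, s)], w_hat)
# Naive fit: regress y on z * phi(s) and ignore beta(s)
theta_naive = solve_ls_2(z, [zi * si for zi, si in zip(z, s)], y)

print("true theta:          (0.2, 0.1)")
print("orthogonalized theta:", tuple(round(t, 3) for t in theta_orth))
print("naive theta:         ", tuple(round(t, 3) for t in theta_naive))
```

The binned means stand in for any regression estimate of the conditional expectations. Their estimation error is what the slide calls $\tilde{w}$ and $\tilde{v}$.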
Proposed robust method
◮ Then (3) becomes
  $\hat{w} = (\hat{v} \phi(s) + \delta(s))^\top \theta + \tilde{\epsilon}$
  where $\delta(s) = \tilde{v} \phi(s)$ is an unobserved random deviation
◮ Robust estimator with tolerance against worst-case deviation $\delta(s)$
  $\hat{\theta} = \arg\min_{\theta} \max_{\delta \in \Delta} E_n[\, |\hat{w} - (\hat{v} \phi(s) + \delta)^\top \theta|^2 \,]$  (4)
  where $\Delta = \{ \delta : E_n[|\delta_k|^2] \le n^{-1} E_n[|\hat{v} \phi_k(s)|^2], \ \forall k \}$
◮ (4) is a convex problem and can be solved using coordinate descent.
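One way to make (4) concrete is sketched below, under stated assumptions. For fixed $\theta$, write $r = \hat{w} - (\hat{v}\phi(s))^\top\theta$ and $c_k = \sqrt{n^{-1} E_n[|\hat{v}\phi_k(s)|^2]}$. The triangle inequality bounds the inner maximum by $(\sqrt{E_n[|r|^2]} + \sum_k c_k |\theta_k|)^2$, and the deviation $\delta_k = -\mathrm{sign}(\theta_k)\, c_k\, r / \sqrt{E_n[|r|^2]}$ attains it. This identity is a standard robust-regression argument and is not stated on the slides; the code checks it numerically. The coordinate-wise golden-section search is a generic stand-in for the coordinate descent mentioned above, and the synthetic data and the centered basis $\phi(s) = (1, s - 5)$ are hypothetical.

```python
import math
import random

random.seed(1)


def residuals(theta, X, w_hat):
    return [w - sum(x * t for x, t in zip(row, theta)) for row, w in zip(X, w_hat)]


def rms(values):
    """Square root of the empirical mean E_n of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def robust_objective(theta, X, w_hat, c):
    # Square root of the inner maximum in (4): rms(r) + sum_k c_k |theta_k|
    return rms(residuals(theta, X, w_hat)) + sum(ck * abs(t) for ck, t in zip(c, theta))


def attained_worst_case(theta, X, w_hat, c):
    # Plug in delta_k = -sign(theta_k) * c_k * r / rms(r), which lies on the boundary of Delta
    r = residuals(theta, X, w_hat)
    scale = rms(r)
    perturbed = []
    for ri in r:
        delta = [-math.copysign(ck, t) * ri / scale for ck, t in zip(c, theta)]
        perturbed.append(ri - sum(d * t for d, t in zip(delta, theta)))
    return rms(perturbed)


def golden_min(f, lo, hi, iters=40):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - ratio * (b - a), a + ratio * (b - a)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = f(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)


def fit_robust(X, w_hat, sweeps=10, bound=10.0):
    """Cyclic coordinate-wise minimization of the convex robust objective."""
    n, p = len(X), len(X[0])
    c = [rms([row[k] for row in X]) / math.sqrt(n) for k in range(p)]
    theta = [0.0] * p
    for _ in range(sweeps):
        for k in range(p):
            def along_k(t, k=k):
                trial = theta[:k] + [t] + theta[k + 1:]
                return robust_objective(trial, X, w_hat, c)
            theta[k] = golden_min(along_k, -bound, bound)
    return theta, c


# Synthetic residuals following (3) with tau(s) = 0.7 + 0.1 (s - 5), all values assumed
n = 1000
X, w_hat = [], []
for _ in range(n):
    s = random.uniform(0.0, 10.0)
    v = random.gauss(0.0, 1.0)                  # true exposure residual
    w = (0.7 + 0.1 * (s - 5.0)) * v + random.gauss(0.0, 0.5)
    v_hat = v + random.gauss(0.0, 0.1)          # imperfectly estimated residual
    X.append([v_hat, v_hat * (s - 5.0)])        # row holds v_hat * phi(s)
    w_hat.append(w)

theta_hat, c = fit_robust(X, w_hat)
print("theta_hat:", [round(t, 3) for t in theta_hat], "(true: [0.7, 0.1])")
```

Because the penalty weights $c_k$ shrink at rate $n^{-1/2}$, the robust fit stays close to plain least squares when the features are well estimated, while still guarding against the deviation $\delta(s)$.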
Real data
◮ $y$: number of crimes, $z$: number of poor families, across states $s \in \{ 1, \dots, 50 \}$
[Figure: effect estimate of poverty on crime. (a) Estimate $\hat{\tau}(s)$, color scale from 0.20 to 0.40; (b) Significance at 5% level (significant / insignificant)]
◮ Results consistent with previous findings [2]
Conclusion
◮ We propose an orthogonalization-based strategy for estimating heterogeneous effects from spatial data in the presence of spatially varying confounding variables
◮ Our proposed method is robust to errors-in-variables
◮ Visit poster #80 at Pacific Ballroom, 6:30 pm to 9 pm
References
[1] Chernozhukov et al., Double machine learning for treatment and causal parameters, cemmap working paper, Centre for Microdata Methods and Practice, 2016.
[2] Ellis et al., Crime, delinquency, and social status: A reconsideration, Journal of Offender Rehabilitation, 2001.