Estimating average causal effects under general interference between units


  1. Estimating average causal effects under general interference between units. Peter M. Aronow and Cyrus Samii, Yale University and New York University. May 23, 2012.

  2. Randomized experiments often involve treatments that may induce “interference between units.” Interference: the outcome for unit i depends on the treatment assigned to unit j. If we administer a treatment to unit j, what are the effects on unit i? Recent work in nonparametric inference focuses on hypothesis testing or estimation in hierarchical (i.e., multilevel) interference settings. We develop a theory of estimation under general forms of interference.

  3. We provide a nonparametric, design-based (cf. Neyman 1923) method for estimating average causal effects, including, but not limited to: the direct effect of assigning a unit to treatment; indirect effects of, e.g., a unit’s peer being assigned to treatment; and more complex effects (e.g., the effect of having a majority of proximal peers treated). In so doing, we highlight that equal probability of treatment assignment does not imply equal probability of indirect exposure to treatment (e.g., proximity to treated units). We develop our main results drawing on classical sampling theory, though model-assisted refinements are possible.

  4. Method summary: Design information gives the probability distribution of the treatment assignment $Z$, with $\mathrm{supp}(Z) = \Omega$. Specify an exposure model that converts assigned treatment vectors $z \in \Omega$ into exposures based on unit attributes (e.g., network degree): $f(Z, \theta_i) \equiv D_i$. This implies the exact probabilities of exposure, $\pi_i(d_k) = \sum_{z \in \Omega} p_z \, I(f(z, \theta_i) = d_k)$. Average causal effects are the average difference between the potential outcomes under exposure $d_k$ and those under $d_l$. Estimate average causal effects accounting for the varying probabilities of exposure (via some variant of inverse probability weighting).
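As a rough sketch of this pipeline (my own code, not from the paper or slides), the helper below enumerates a design's support Ω and tallies π_i(d_k) for an arbitrary exposure mapping; the function name, argument names, and data layout are all assumptions, and unit attributes θ_i are folded into the unit index.

```python
import numpy as np

def exposure_probabilities(omega, p_z, exposure_fn, n_units, levels):
    """Exact exposure probabilities via enumeration of the design:
    pi_i(d_k) = sum over z in Omega of p_z * I(f(z, i) = d_k).

    omega       : list of assignment vectors z (each of length n_units, entries 0/1)
    p_z         : list of design probabilities, one per z, summing to 1
    exposure_fn : f(z, i) -> exposure level of unit i under assignment z
    levels      : the possible exposure levels d_k
    """
    pi = {d: np.zeros(n_units) for d in levels}
    for z, p in zip(omega, p_z):
        for i in range(n_units):
            pi[exposure_fn(z, i)][i] += p
    return pi
```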

  5. Roadmap: Simple running example. Some technical details. Application. Anticipating some concerns.

  6. Simple running example. Consider a randomized experiment performed on a finite population of four units in a simple, fixed network:

  7. [Figure: four units arranged in a line network, 1 – 2 – 3 – 4]

  8. One of these units is assigned to receive an advertisement and the other three are assigned to control, each with equal probability. We want to estimate the effects of advertising on opinion. There are four possible randomizations $z$:

  9. [Figure: randomization 1, in which unit 1 is treated and units 2–4 are in control]

  10. [Figure: randomization 2, in which unit 2 is treated]

  11. [Figure: randomization 3, in which unit 3 is treated]

  12. [Figure: randomization 4, in which unit 4 is treated]

  13. So we have exact knowledge of the randomization scheme. But what of the exposure model? This requires researcher discretion. How do we model exposure to a treatment? Here is one example.

  14. Direct exposure means that you have been treated; indirect exposure means that a peer has been treated:

$$D_i = \begin{cases} \text{Di(rect)} & : \; z_i = 1 \\ \text{In(direct)} & : \; z_{i \pm 1} = 1 \\ \text{Co(ntrol)} & : \; z_i = z_{i \pm 1} = 0. \end{cases}$$

There is nothing particularly special about this model, except for its parsimony. Arbitrarily complex exposure models are possible. Let’s visualize this.
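Before the visuals, here is a minimal encoding of this exposure mapping for the four-unit line network pictured above (my own sketch; units are 0-indexed and the neighbor sets are read off the figure, not code from the slides):

```python
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # the pictured line network

def exposure(z, i):
    """Exposure of unit i under assignment vector z: Direct if i is treated,
    Indirect if an untreated i has a treated neighbor, Control otherwise."""
    if z[i] == 1:
        return "Direct"
    if any(z[j] == 1 for j in NEIGHBORS[i]):
        return "Indirect"
    return "Control"

# The four single-unit treatments reproduce the exposure table shown later (slide 19).
for r in range(4):
    z = [1 if i == r else 0 for i in range(4)]
    print(f"Rand. {r + 1}:", [exposure(z, i) for i in range(4)])
```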

  15. [Figure: exposures under randomization 1: unit 1 Direct, unit 2 Indirect, units 3 and 4 Control]

  16. [Figure: exposures under randomization 2: unit 2 Direct, units 1 and 3 Indirect, unit 4 Control]

  17. [Figure: exposures under randomization 3: unit 3 Direct, units 2 and 4 Indirect, unit 1 Control]

  18. [Figure: exposures under randomization 4: unit 4 Direct, unit 3 Indirect, units 1 and 2 Control]

  19. Summarizing, the design and the induced exposures:

      Design Z_i (unit #)              Exposure D_i (unit #)
                 1  2  3  4                       1   2   3   4
      Rand. 1    1  0  0  0           Rand. 1    Di  In  Co  Co
      Rand. 2    0  1  0  0     →     Rand. 2    In  Di  In  Co
      Rand. 3    0  0  1  0           Rand. 3    Co  In  Di  In
      Rand. 4    0  0  0  1           Rand. 4    Co  Co  In  Di

  20. From the exposure table above, we can figure out the exact probabilities that each of the four units would be in each of the exposure conditions:

      Probabilities π_i(d_k)
                  Unit 1  Unit 2  Unit 3  Unit 4
      Direct      0.25    0.25    0.25    0.25
      Indirect    0.25    0.50    0.50    0.25
      Control     0.50    0.25    0.25    0.50
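These probabilities are just frequencies over the four equiprobable assignments; a quick check (my own sketch), starting directly from the exposure table of slide 19:

```python
import numpy as np

EXPOSURES = np.array([  # D_i under each randomization (slide 19); rows = randomizations
    ["Direct", "Indirect", "Control", "Control"],
    ["Indirect", "Direct", "Indirect", "Control"],
    ["Control", "Indirect", "Direct", "Indirect"],
    ["Control", "Control", "Indirect", "Direct"],
])

# pi_i(d_k): each randomization has probability 0.25, so the exposure probability
# is the column-wise frequency of each level.
for level in ["Direct", "Indirect", "Control"]:
    print(level, (EXPOSURES == level).mean(axis=0))
```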

  21. Let’s make up some potential outcomes associated with each exposure:

      Potential outcomes Y_i(d_k)
                  Unit 1  Unit 2  Unit 3  Unit 4  Mean
      Direct      5       10      10      3       7
      Indirect    0       3       3       2       2
      Control     1       3       6       2       3

Average causal effect: $\tau(d_k, d_l) = \frac{1}{N} \sum_{i=1}^{N} \left[ Y_i(d_k) - Y_i(d_l) \right]$. E.g., $\tau(\text{Direct}, \text{Control}) = \frac{1}{N} \sum_{i=1}^{N} \left[ Y_i(\text{Direct}) - Y_i(\text{Control}) \right] = 4$.
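A quick check of the estimand on these made-up potential outcomes (my own sketch):

```python
import numpy as np

# Potential outcomes from the table above: one row per exposure, one column per unit.
Y = {
    "Direct":   np.array([5, 10, 10, 3]),
    "Indirect": np.array([0,  3,  3, 2]),
    "Control":  np.array([1,  3,  6, 2]),
}

def tau(d_k, d_l):
    """Average causal effect tau(d_k, d_l) = mean over i of [Y_i(d_k) - Y_i(d_l)]."""
    return np.mean(Y[d_k] - Y[d_l])

print(tau("Direct", "Control"))    # 4.0
print(tau("Indirect", "Control"))  # -1.0
```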

  22. The unequal-probability design provides a natural and design-unbiased estimator. Assuming $\pi_i(d_k) > 0$ and $\pi_i(d_l) > 0$ for all $i$, the Horvitz-Thompson (HT) estimator is

$$\hat{\tau}_{HT}(d_k, d_l) = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{I(D_i = d_k)}{\pi_i(d_k)}\, Y_i(d_k) - \frac{I(D_i = d_l)}{\pi_i(d_l)}\, Y_i(d_l) \right].$$

Unbiasedness follows from $E[I(D_i = d_k)] = \pi_i(d_k)$. Note: when $\pi_i(d_k) = 0$ or $\pi_i(d_l) = 0$ for some $i$, $\tau(d_k, d_l)$ can be estimated only over the units with positive probability of receiving both exposures.
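A direct transcription of this estimator for a single realized assignment (my own sketch; the argument names and data layout are mine):

```python
import numpy as np

def horvitz_thompson(D, Y_obs, pi, d_k, d_l):
    """HT estimate of tau(d_k, d_l) from one realized experiment.

    D     : length-N array of realized exposures D_i
    Y_obs : length-N array of observed outcomes Y_i(D_i)
    pi    : dict mapping each exposure level to the length-N array of pi_i(level)
    """
    D = np.asarray(D)
    Y_obs = np.asarray(Y_obs, dtype=float)
    N = len(D)
    total_k = np.sum(np.where(D == d_k, Y_obs / np.asarray(pi[d_k]), 0.0))
    total_l = np.sum(np.where(D == d_l, Y_obs / np.asarray(pi[d_l]), 0.0))
    return (total_k - total_l) / N

# Example: randomization 1 of the running example.
pi = {"Direct": [0.25, 0.25, 0.25, 0.25], "Indirect": [0.25, 0.5, 0.5, 0.25],
      "Control": [0.5, 0.25, 0.25, 0.5]}
D = ["Direct", "Indirect", "Control", "Control"]
Y_obs = [5, 3, 6, 2]
print(horvitz_thompson(D, Y_obs, pi, "Direct", "Control"))  # -2.0
```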

  23. Applying estimators to this setup:

                  Diff. in Means         OLS w/ cov. adj.       HT estimator
                  τ(Di,Co)  τ(In,Co)     τ(Di,Co)  τ(In,Co)     τ(Di,Co)  τ(In,Co)
      Rand. 1      1.00     -1.00         3.00     -3.00        -2.00     -5.50
      Rand. 2      8.00     -0.50         5.00     -2.00         9.00      0.50
      Rand. 3      9.00      1.50         8.00      1.00         9.50      3.00
      Rand. 4      1.00      1.00         2.00     -5.44        -0.50     -2.00
      E[.]         4.75      0.25         4.50     -1.00         4.00     -1.00
      Bias         0.75      1.25         0.50      0.00         0.00      0.00

Other approaches are biased and inconsistent (i.e., this is not just a small-sample problem). The bias can go any number of ways depending on the nature of the confounding and the effect heterogeneity. A further crucial point concerns the variance of the HT estimator: we cannot rely on standard methods to compute standard errors or confidence intervals.
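The difference-in-means and HT columns of this table can be replicated by looping over the four equiprobable assignments (my own sketch, reusing the example's exposures, probabilities, and potential outcomes):

```python
import numpy as np

UNITS = range(4)
EXPOSURES = [  # D_i under each of the four randomizations (slide 19)
    ["Direct", "Indirect", "Control", "Control"],
    ["Indirect", "Direct", "Indirect", "Control"],
    ["Control", "Indirect", "Direct", "Indirect"],
    ["Control", "Control", "Indirect", "Direct"],
]
PI = {"Direct": [0.25, 0.25, 0.25, 0.25],
      "Indirect": [0.25, 0.50, 0.50, 0.25],
      "Control": [0.50, 0.25, 0.25, 0.50]}
Y = {"Direct": [5, 10, 10, 3], "Indirect": [0, 3, 3, 2], "Control": [1, 3, 6, 2]}

def ht(D, d_k, d_l):
    y_obs = [Y[D[i]][i] for i in UNITS]  # each unit reveals Y_i(D_i)
    tot_k = sum(y_obs[i] / PI[d_k][i] for i in UNITS if D[i] == d_k)
    tot_l = sum(y_obs[i] / PI[d_l][i] for i in UNITS if D[i] == d_l)
    return (tot_k - tot_l) / len(UNITS)

def diff_in_means(D, d_k, d_l):
    y_obs = [Y[D[i]][i] for i in UNITS]
    mean = lambda d: np.mean([y_obs[i] for i in UNITS if D[i] == d])
    return mean(d_k) - mean(d_l)

for est in (diff_in_means, ht):
    vals = [(est(D, "Direct", "Control"), est(D, "Indirect", "Control"))
            for D in EXPOSURES]
    # HT averages to [4, -1], the true effects; the naive contrast does not.
    print(est.__name__, np.mean(vals, axis=0))
```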

  24. Exact variance:

$$\mathrm{Var}[\hat{\tau}_{HT}(d_k, d_l)] = \frac{1}{N^2} \Big( \mathrm{Var}[\widehat{Y}^{HT}_T(d_k)] + \mathrm{Var}[\widehat{Y}^{HT}_T(d_l)] - 2\,\mathrm{Cov}[\widehat{Y}^{HT}_T(d_k), \widehat{Y}^{HT}_T(d_l)] \Big),$$

where

$$\mathrm{Var}[\widehat{Y}^{HT}_T(d_k)] = \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{Cov}[I(D_i = d_k), I(D_j = d_k)] \, \frac{Y_i(d_k)\, Y_j(d_k)}{\pi_i(d_k)\, \pi_j(d_k)},$$

$$\mathrm{Cov}[\widehat{Y}^{HT}_T(d_k), \widehat{Y}^{HT}_T(d_l)] = \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{Cov}[I(D_i = d_k), I(D_j = d_l)] \, \frac{Y_i(d_k)\, Y_j(d_l)}{\pi_i(d_k)\, \pi_j(d_l)}.$$
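Because all potential outcomes are known in this constructed example, the exact variance can be evaluated by plugging design-based indicator covariances into the formula above; a sketch (my own code, reusing the example's tables):

```python
from itertools import product

EXPOSURES = [  # D_i under each equiprobable randomization (slide 19)
    ["Direct", "Indirect", "Control", "Control"],
    ["Indirect", "Direct", "Indirect", "Control"],
    ["Control", "Indirect", "Direct", "Indirect"],
    ["Control", "Control", "Indirect", "Direct"],
]
P_Z = [0.25] * 4
Y = {"Direct": [5, 10, 10, 3], "Indirect": [0, 3, 3, 2], "Control": [1, 3, 6, 2]}
N = 4

def pi(i, d):
    """Marginal exposure probability pi_i(d), from the design."""
    return sum(p for D, p in zip(EXPOSURES, P_Z) if D[i] == d)

def pi_joint(i, d_a, j, d_b):
    """Joint exposure probability pi_ij(d_a, d_b), from the design."""
    return sum(p for D, p in zip(EXPOSURES, P_Z) if D[i] == d_a and D[j] == d_b)

def cov_totals(d_a, d_b):
    """Cov of the HT estimators of the totals of Y(d_a) and Y(d_b)."""
    return sum((pi_joint(i, d_a, j, d_b) - pi(i, d_a) * pi(j, d_b))
               * Y[d_a][i] * Y[d_b][j] / (pi(i, d_a) * pi(j, d_b))
               for i, j in product(range(N), repeat=2))

def exact_var_tau(d_k, d_l):
    return (cov_totals(d_k, d_k) + cov_totals(d_l, d_l)
            - 2 * cov_totals(d_k, d_l)) / N**2

print(exact_var_tau("Direct", "Control"))  # 27.875, the spread of the HT column above
```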

  25. Conservative variance estimator. Via Young’s inequality (cf. Aronow and Samii 2012), given $\pi_{ij}(d_k, d_l) > 0$ for all $i \neq j$,

$$\begin{aligned}
\widehat{\mathrm{Var}}[\hat{\tau}_{HT}(d_k, d_l)] = \frac{1}{N^2} \Bigg[
& \sum_{i \in U} I(D_i = d_k)\,[1 - \pi_i(d_k)] \left( \frac{Y_i(d_k)}{\pi_i(d_k)} \right)^2 \\
& + \sum_{i \in U} \sum_{j \in U \setminus i} I(D_i = d_k)\, I(D_j = d_k)\, \frac{\pi_{ij}(d_k) - \pi_i(d_k)\,\pi_j(d_k)}{\pi_{ij}(d_k)} \, \frac{Y_i(d_k)\, Y_j(d_k)}{\pi_i(d_k)\, \pi_j(d_k)} \\
& + \sum_{i \in U} I(D_i = d_l)\,[1 - \pi_i(d_l)] \left( \frac{Y_i(d_l)}{\pi_i(d_l)} \right)^2 \\
& + \sum_{i \in U} \sum_{j \in U \setminus i} I(D_i = d_l)\, I(D_j = d_l)\, \frac{\pi_{ij}(d_l) - \pi_i(d_l)\,\pi_j(d_l)}{\pi_{ij}(d_l)} \, \frac{Y_i(d_l)\, Y_j(d_l)}{\pi_i(d_l)\, \pi_j(d_l)} \\
& - 2 \sum_{i \in U} \sum_{j \in U \setminus i} \frac{I(D_i = d_k)\, I(D_j = d_l)}{\pi_{ij}(d_k, d_l)} \, \big[ \pi_{ij}(d_k, d_l) - \pi_i(d_k)\,\pi_j(d_l) \big] \, \frac{Y_i(d_k)\, Y_j(d_l)}{\pi_i(d_k)\, \pi_j(d_l)} \\
& + 2 \sum_{i \in U} \left[ \frac{I(D_i = d_k)\, Y_i(d_k)^2}{2\, \pi_i(d_k)} + \frac{I(D_i = d_l)\, Y_i(d_l)^2}{2\, \pi_i(d_l)} \right]
\Bigg].
\end{aligned}$$

This estimator is unbiased under the sharp null hypothesis of no effect, given $\pi_{ij}(d_k, d_l) > 0$. A (more) conservative variance estimator is available when $\exists\, i, j, k, l$ such that $\pi_{ij}(d_k, d_l) = 0$.

  26. Asymptotics and intervals: We adopt Brewer’s (1979) large-sample scaling, analogous to obtaining estimates by aggregating results from repeated experimentation on a fixed finite population. Consistency and asymptotic normality of $\hat{\tau}_{HT}(d_k, d_l)$ follow from the WLLN and the classical CLT, respectively. By the WLLN,

$$N\, \widehat{\mathrm{Var}}[\hat{\tau}_{HT}(d_k, d_l)] \;\xrightarrow{p}\; N\, \mathrm{Var}[\hat{\tau}_{HT}(d_k, d_l)] + c_1,$$

where $c_1 \geq 0$. Then

$$\big( \hat{\tau}_{HT}(d_k, d_l) - \tau(d_k, d_l) \big) \Big/ \sqrt{\widehat{\mathrm{Var}}[\hat{\tau}_{HT}(d_k, d_l)]} \;\xrightarrow{d}\; N(0,\, 1 - c_2),$$

where $0 \leq c_2 < 1$. Intervals constructed as $\hat{\tau}_{HT}(d_k, d_l) \pm z_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}[\hat{\tau}_{HT}(d_k, d_l)]}$ will asymptotically cover $\tau(d_k, d_l)$ at least $100(1-\alpha)\%$ of the time. We have also proven consistency of the point and variance estimators under a generalized m-dependence setup; restrictions on clustering are key.
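The interval construction itself is routine once a (conservative) variance estimate is in hand; a minimal helper (my own sketch, with the critical value fixed at the 95% level):

```python
import math

def wald_interval(tau_hat, var_hat, z=1.96):
    """Normal-approximation interval tau_hat +/- z_{1-alpha/2} * sqrt(var_hat).
    With a conservative variance estimator, asymptotic coverage is at least nominal."""
    half_width = z * math.sqrt(var_hat)
    return tau_hat - half_width, tau_hat + half_width
```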

  27. The paper proposes refinements for covariate adjustment, weight stabilization, and variance approximation under a constant-effect assumption. Further refinements include modeling outcomes based on the determinants of the exposure probabilities, using the HT results to determine an appropriate variance approximation. Regardless of the method used, the implied inverse-probability weights are fundamental to the consistency of any estimator of average causal effects. Under proper specification, this weighting can be reproduced in the limit by regression estimators (in particular, by interaction with centered fixed effects for all unique values of the probability of exposure).
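As one concrete illustration of weight stabilization (my own sketch; the paper's refinements may differ in detail), a Hájek-style ratio estimator normalizes the inverse-probability weights within each exposure condition instead of dividing HT totals by N:

```python
import numpy as np

def hajek_contrast(D, Y_obs, pi, d_k, d_l):
    """Stabilized contrast of exposure d_k vs. d_l: difference of weighted means,
    with inverse-probability weights normalized within each exposure condition."""
    D = np.asarray(D)
    Y_obs = np.asarray(Y_obs, dtype=float)

    def weighted_mean(d):
        idx = D == d
        w = 1.0 / np.asarray(pi[d])[idx]
        return np.sum(w * Y_obs[idx]) / np.sum(w)

    return weighted_mean(d_k) - weighted_mean(d_l)
```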

  28. Let’s consider a richer example. The goal is to estimate the direct and indirect effects of a treatment offered to a randomly selected set of individuals on a complex, undirected network (e.g., an anti-prejudice curriculum in schools; Paluck and Shepherd 2012).

  29. [Figure: the study network]

  30. Suppose complete random assignment of $M = 0.2N$ units to treatment. The design implies that $Z$ has uniform probability over $\Omega$, which collects the $\binom{N}{M}$ possible indicator vectors, where $z$ is a realization of $Z$, e.g., $z = (z_1, z_2, z_3, \ldots, z_{N-1}, z_N)' = (0, 1, 0, \ldots, 1, 0)'$.
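For a design like this, Ω is far too large to enumerate directly, so in practice the exposure probabilities are either derived analytically or approximated by simulating many draws of Z from the design. The sketch below takes the simulation route; it is entirely my own, and the random placeholder network and the Direct/Indirect/Control mapping are stand-ins for the application's actual network and exposure model.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
M = int(0.2 * N)

# Stand-in network: a sparse random symmetric adjacency matrix (the application
# would use the study's actual undirected network instead).
A = np.triu(rng.random((N, N)) < 0.05, k=1).astype(int)
A = A + A.T

def exposure(z):
    """Direct if treated; Indirect if untreated with at least one treated network
    neighbor; Control otherwise."""
    peer_treated = A @ z > 0
    return np.where(z == 1, "Direct", np.where(peer_treated, "Indirect", "Control"))

def draw_assignment():
    """Complete random assignment: exactly M of the N units treated."""
    z = np.zeros(N, dtype=int)
    z[rng.choice(N, size=M, replace=False)] = 1
    return z

# Approximate pi_i(d_k) by Monte Carlo over draws from the design.
R = 10_000
counts = {d: np.zeros(N) for d in ["Direct", "Indirect", "Control"]}
for _ in range(R):
    d = exposure(draw_assignment())
    for level in counts:
        counts[level] += d == level
pi_hat = {level: c / R for level, c in counts.items()}
print({level: p[:5].round(3) for level, p in pi_hat.items()})
```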
