

  1. Single World Intervention Graphs (SWIGs): Unifying the Counterfactual and Graphical Approaches to Causality
     Thomas Richardson, Department of Statistics, University of Washington
     Joint work with James Robins (Harvard School of Public Health)
     Therme Vals Causal Workshop, 5 Aug 2013

  2. Outline
     Brief review of counterfactuals
     A new unification of graphs and counterfactuals via node-splitting
       ◮ Factorization and Modularity
       ◮ Contrast with Twin Network approach
       ◮ Some Examples and Extensions
       ◮ Sequentially Randomized Experiments / Time Dependent Confounding
       ◮ Dynamic Regimes
     Experimental Testability and Independence of Errors in NPSEMs
     Thomas Richardson Therme Vals Workshop Slide 1

  3. Counterfactuals aka Potential Outcomes

  4. The potential outcomes framework: philosophy
     Hume (1748), An Enquiry Concerning Human Understanding:
     We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second, ... where, if the first object had not been, the second never had existed.

  5. The potential outcomes framework: crop trials
     Jerzy Neyman (1923): To compare $v$ varieties [on $m$ plots] we will consider numbers:
     $$\begin{matrix} U_{11} & \cdots & U_{1m} \\ \vdots & & \vdots \\ U_{v1} & \cdots & U_{vm} \end{matrix}$$
     Here $U_{ij}$ is the crop yield that would be observed if variety $i$ were planted in plot $j$.
     Physical constraints only allow one variety to be planted in a given plot in any given growing season.
     Popularized by Rubin (1974); sometimes called the ‘Rubin causal model’.

  6. Potential outcomes with binary treatment
     For binary treatment $X$ and response $Y$, we define two potential outcome variables:
     $Y(x=0)$: the value of $Y$ that would be observed for a given unit if assigned $X = 0$;
     $Y(x=1)$: the value of $Y$ that would be observed for a given unit if assigned $X = 1$.
     Will also write these as $Y(x_0)$ and $Y(x_1)$.
     Implicit here is the assumption that these outcomes are well-defined. Specifically:
       ◮ Only one version of treatment $X = x$
       ◮ No interference between units (SUTVA).

  7. Potential Outcomes
     Unit   Potential Outcomes     Observed
            Y(x=0)    Y(x=1)       X    Y
     1      0         1
     2      0         1
     3      0         0
     4      1         1
     5      1         0

  8. Drug Response ‘Types’
     In the simplest case, where $Y$ is a binary outcome, we have the following 4 types:
     Y(x0)   Y(x1)   Name
     0       0       Never Recover
     0       1       Helped
     1       0       Hurt
     1       1       Always Recover

  9. Assignment to Treatments
     Unit   Potential Outcomes     Observed
            Y(x=0)    Y(x=1)       X    Y
     1      0         1            1
     2      0         1            0
     3      0         0            1
     4      1         1            1
     5      1         0            0

  10. Observed Outcomes from Potential Outcomes
      Unit   Potential Outcomes     Observed
             Y(x=0)    Y(x=1)       X    Y
      1      0         1            1    1
      2      0         1            0    0
      3      0         0            1    0
      4      1         1            1    1
      5      1         0            0    1
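
The Observed Y column can be reproduced mechanically: each unit reveals only the potential outcome for the arm it was actually assigned. The following is a minimal Python sketch with the five units hard-coded from the table; the names `units` and `observed_outcome` are chosen here for illustration and do not come from the slides.

```python
# Five units from the table: (Y(x=0), Y(x=1), assigned X).
units = [
    (0, 1, 1),
    (0, 1, 0),
    (0, 0, 1),
    (1, 1, 1),
    (1, 0, 0),
]


def observed_outcome(y0, y1, x):
    """Return the potential outcome for the arm that was actually assigned."""
    return y1 if x == 1 else y0


observed_y = [observed_outcome(y0, y1, x) for (y0, y1, x) in units]
print(observed_y)  # [1, 0, 0, 1, 1]
```

The printed list matches the Observed Y column row by row.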

  11. Potential Outcomes and Missing Data
      Unit   Potential Outcomes     Observed
             Y(x=0)    Y(x=1)       X    Y
      1      ?         1            1    1
      2      0         ?            0    0
      3      ?         0            1    0
      4      ?         1            1    1
      5      1         ?            0    1

  12. Average Causal Effect (ACE) of X on Y
      $$\mathrm{ACE}(X \to Y) \equiv E[Y(x_1) - Y(x_0)] = p(\text{Helped}) - p(\text{Hurt}) \in [-1, 1]$$
      Thus $\mathrm{ACE}(X \to Y)$ is the difference in % recovery if everyone were treated ($X = 1$) vs. if no one were treated ($X = 0$).
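
As a worked instance of this definition, the sketch below computes the ACE for the five-unit example from the earlier tables, treating those five units as the whole population (an assumption made here for illustration). It uses both potential outcomes for every unit, which is never possible with real data, so it illustrates the definition rather than an estimator.

```python
from fractions import Fraction

# (Y(x=0), Y(x=1)) for the five units in the earlier tables.
potential_outcomes = [(0, 1), (0, 1), (0, 0), (1, 1), (1, 0)]
n = len(potential_outcomes)

# ACE as the mean of unit-level effects Y(x=1) - Y(x=0).
ace = Fraction(sum(y1 - y0 for y0, y1 in potential_outcomes), n)

# The same quantity via response types: p(Helped) - p(Hurt).
p_helped = Fraction(sum(1 for po in potential_outcomes if po == (0, 1)), n)
p_hurt = Fraction(sum(1 for po in potential_outcomes if po == (1, 0)), n)

print(ace, p_helped - p_hurt)  # 1/5 1/5
```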

  13. Identification of the ACE under randomization
      If $X$ is assigned randomly then
      $$X \perp\!\!\!\perp Y(x_0) \quad\text{and}\quad X \perp\!\!\!\perp Y(x_1) \tag{1}$$
      hence
      $$\begin{aligned} E[Y(x_1) - Y(x_0)] &= E[Y(x_1)] - E[Y(x_0)] \\ &= E[Y(x_1) \mid X = 1] - E[Y(x_0) \mid X = 0] \\ &= E[Y \mid X = 1] - E[Y \mid X = 0]. \end{aligned}$$
      Thus if (1) holds then $\mathrm{ACE}(X \to Y)$ is identified from $P(X, Y)$.
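
The derivation can be checked exactly on a small example. The sketch below assumes a hypothetical population of response types (the proportions match the five-unit tables) and a randomization probability of 1/2; both are illustrative choices, not values from the slides. It builds the observable joint $P(X, Y)$ with $X$ drawn independently of the response type, then compares the observable contrast with the true ACE.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of response types (Y(x=0), Y(x=1)) with probabilities.
type_probs = {
    (0, 1): Fraction(2, 5),
    (0, 0): Fraction(1, 5),
    (1, 1): Fraction(1, 5),
    (1, 0): Fraction(1, 5),
}
p_treat = Fraction(1, 2)  # assumed randomization probability P(X = 1)

# Joint P(X, Y) when X is drawn independently of the response type.
joint = {(x, y): Fraction(0) for x, y in product((0, 1), repeat=2)}
for (y0, y1), p in type_probs.items():
    for x, px in ((0, 1 - p_treat), (1, p_treat)):
        y = y1 if x == 1 else y0
        joint[(x, y)] += p * px


def mean_y_given(x):
    """E[Y | X = x], computed from the observable joint distribution only."""
    return joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])


true_ace = sum(p * (y1 - y0) for (y0, y1), p in type_probs.items())
identified = mean_y_given(1) - mean_y_given(0)
print(true_ace, identified)  # 1/5 1/5
```

Changing `p_treat` to any value strictly between 0 and 1 leaves `identified` unchanged, since the argument only uses independence of $X$ from the potential outcomes.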

  14. Inference for the ACE without randomization
      Suppose that we do not know that $X \perp\!\!\!\perp Y(x_0)$ and $X \perp\!\!\!\perp Y(x_1)$. What can be inferred?
               X = 0       X = 1
               (Placebo)   (Drug)
      Y = 0    200         600
      Y = 1    800         400
      What is:
      The largest number of people who could be Helped? 400 + 200
      The smallest number of people who could be Hurt? 0
      ⇒ Max value of ACE: (200 + 400)/2000 − 0 = 0.3
      Similar logic:
      ⇒ Min value of ACE: 0 − (600 + 800)/2000 = −0.7

  15. Inference for the ACE without randomization
      Suppose that we do not know that $X \perp\!\!\!\perp Y(x_0)$ and $X \perp\!\!\!\perp Y(x_1)$.
      General case:
      $$-\bigl(P(x=0, y=1) + P(x=1, y=0)\bigr) \le \mathrm{ACE}(X \to Y) \le P(x=0, y=0) + P(x=1, y=1)$$
      ⇒ Bounds will always cross zero.
      ⇒ $X \perp\!\!\!\perp Y(x_0)$ and $X \perp\!\!\!\perp Y(x_1)$ essential for drawing non-trivial causal inferences.
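
The general bounds can be evaluated directly on the placebo/drug table from the previous slide. This is a minimal sketch; the dictionary layout and variable names are illustrative choices made here.

```python
from fractions import Fraction

# Counts from the placebo/drug table, keyed by (x, y).
counts = {(0, 0): 200, (0, 1): 800, (1, 0): 600, (1, 1): 400}
n = sum(counts.values())
p = {cell: Fraction(c, n) for cell, c in counts.items()}

# Lower bound: -(P(x=0, y=1) + P(x=1, y=0)).
ace_lower = -(p[(0, 1)] + p[(1, 0)])
# Upper bound: P(x=0, y=0) + P(x=1, y=1).
ace_upper = p[(0, 0)] + p[(1, 1)]

print(float(ace_lower), float(ace_upper))  # -0.7 0.3
print(ace_upper - ace_lower)  # 1
```

The width of the interval is the sum of all four cell probabilities, which is 1 for any table. An interval of width 1 inside [−1, 1] must contain zero, which is why the bounds always cross zero.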

  16. Summary of Counterfactual Approach
      In our observed data, for each unit one outcome will be ‘actual’; the others will be ‘counterfactual’.
      The potential outcome framework allows Causation to be ‘reduced’ to Missing Data ⇒ Conceptual progress!
      The ACE is identified if $X \perp\!\!\!\perp Y(x_i)$ for all values $x_i$.
      Randomization of treatment assignment implies $X \perp\!\!\!\perp Y(x_i)$.
      Ideas are central to Fisher’s Exact Test; also many parts of experimental design.
      The framework is the basis of many practical causal data analyses published in Biostatistics, Econometrics and Epidemiology.

  17. Relating Counterfactuals and Structural Equations
      Potential outcomes can be seen as a different notation for Non-Parametric Structural Equation Models (NPSEMs):
      Example: $X \to Y$.
      NPSEM formulation: $Y = f(X, \epsilon_Y)$
      Potential outcome formulation: $Y(x) = f(x, \epsilon_Y)$
      Two important caveats:
        ◮ NPSEMs typically assume all variables are subject to well-defined interventions (not so with potential outcomes).
        ◮ Pearl associates NPSEMs with Independent Errors (NPSEM-IEs) with DAGs (more on this later).

  18. Relating Counterfactuals and ‘do’ notation
      Expressions in terms of ‘do’ can be expressed in terms of counterfactuals:
      $$P(Y(x) = y) \equiv P(Y = y \mid do(X = x))$$
      but counterfactual notation is more general.
      Ex. Distribution of outcomes that would arise among those who took treatment ($X = 1$) had they, counter to fact, not received treatment:
      $$P(Y(x=0) = y \mid X = 1)$$
      If treatment is randomized, so $X \perp\!\!\!\perp Y(x=0)$, then this equals $P(Y(x=0) = y)$, but in an observational study these may be different.
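
The gap between the two quantities shows up even in the five-unit example from the earlier tables. The sketch below treats those five units as the whole population (an illustrative assumption) and compares the distribution of $Y(x=0)$ among the treated with its distribution overall.

```python
from fractions import Fraction

# (Y(x=0), assigned X) for the five units in the earlier tables.
units = [(0, 1), (0, 0), (0, 1), (1, 1), (1, 0)]

# P(Y(x=0) = 1 | X = 1): untreated outcome among those who were treated.
treated_y0 = [y0 for y0, x in units if x == 1]
p_y0_given_treated = Fraction(sum(treated_y0), len(treated_y0))

# P(Y(x=0) = 1): untreated outcome in the whole population.
p_y0_overall = Fraction(sum(y0 for y0, _ in units), len(units))

print(p_y0_given_treated, p_y0_overall)  # 1/3 2/5
```

In this small example the two probabilities differ, so the particular assignment in the table does not satisfy $X \perp\!\!\!\perp Y(x=0)$ exactly.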

  19. Graphs

  20. Recap: Graphical Approach to Causality
      [Figure: two graphs on X and Y, labeled ‘No Confounding’ and ‘Confounding’; the second includes an unobserved variable H.]
      Graph intended to represent direct causal relations.
      Convention that confounding variables (e.g. $H$) are always included on the graph.
      Approach originates in the path diagrams introduced by Sewall Wright in the 1920s.
      If $X \to Y$ then $X$ is said to be a parent of $Y$; $Y$ is a child of $X$.

  21. Edges are directed, but are they causal?
      [Figure: two graphs on X and Y, each labeled ‘No Confounding’, one for each factorization below.]
      $P(X, Y) = P(X)\,P(Y \mid X)$ and $P(X, Y) = P(Y)\,P(X \mid Y)$
      Neither factorization places any restriction on $P(X, Y)$.
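
A quick numerical check of the last statement: for an arbitrary joint distribution (the numbers below are made up for illustration), both factorizations reproduce it exactly, so neither direction of the edge can be ruled out from $P(X, Y)$ alone.

```python
from fractions import Fraction

# Arbitrary (hypothetical) joint distribution over binary X and Y.
joint = {
    (0, 0): Fraction(1, 10),
    (0, 1): Fraction(4, 10),
    (1, 0): Fraction(3, 10),
    (1, 1): Fraction(2, 10),
}

# Marginals P(X) and P(Y).
p_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}

# Factorization P(X) P(Y | X).
via_x = {(x, y): p_x[x] * (joint[(x, y)] / p_x[x]) for (x, y) in joint}
# Factorization P(Y) P(X | Y).
via_y = {(x, y): p_y[y] * (joint[(x, y)] / p_y[y]) for (x, y) in joint}

print(via_x == joint, via_y == joint)  # True True
```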

  22. Linking the two approaches
      [Figure: two graphs on X and Y; the second includes an unobserved variable H.]
      Without H: $X \perp\!\!\!\perp Y(x_0)$ & $X \perp\!\!\!\perp Y(x_1)$
      With unobserved H: $X \not\!\perp\!\!\!\perp Y(x_0)$ & $X \not\!\perp\!\!\!\perp Y(x_1)$
      Elephant in the room: The variables $Y(x_0)$ and $Y(x_1)$ do not appear on these graphs!!

  23. Node splitting: Setting X to 0
      $$P(X = \tilde{x}, Y = \tilde{y}) = P(X = \tilde{x})\,P(Y = \tilde{y} \mid X = \tilde{x})$$
      [Figure: graph on X and Y ⇒ split graph on X, x = 0 and Y(x = 0).]
      Can now ‘read’ the independence: $X \perp\!\!\!\perp Y(x=0)$.
      Also associate a new factorization:
      $$P(X = \tilde{x}, Y(x=0) = \tilde{y}) = P(X = \tilde{x})\,P(Y(x=0) = \tilde{y})$$
      where:
      $$P(Y(x=0) = \tilde{y}) = P(Y = \tilde{y} \mid X = 0).$$
      This last equation links a term in the original factorization to the new factorization. We term this the ‘modularity assumption’.

  24. Node splitting: Setting X to 1
      $$P(X = \tilde{x}, Y = \tilde{y}) = P(X = \tilde{x})\,P(Y = \tilde{y} \mid X = \tilde{x})$$
      [Figure: graph on X and Y ⇒ split graph on X, x = 1 and Y(x = 1).]
      Can now ‘read’ the independence: $X \perp\!\!\!\perp Y(x=1)$.
      Also associate a new factorization:
      $$P(X = \tilde{x}, Y(x=1) = \tilde{y}) = P(X = \tilde{x})\,P(Y(x=1) = \tilde{y})$$
      where:
      $$P(Y(x=1) = \tilde{y}) = P(Y = \tilde{y} \mid X = 1).$$
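
The node-splitting step can be sketched in code. The following is a simplified illustration, not the authors' implementation: all function names are hypothetical, it handles a single intervention, and it checks only unconditional independence. It relies on the fact that two nodes in a DAG are d-connected given the empty set exactly when they share a common ancestor (counting a node as its own ancestor), and it treats the fixed node as blocking, so it is excluded from the common ancestors.

```python
def ancestors(parents, node):
    """Return the node together with all of its ancestors."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def split_node(parents, treatment, value):
    """Build a single-intervention split graph from a DAG {child: [parents]}.

    The random half keeps the name `treatment`; the fixed half is a new
    node such as 'x=0' that takes over the outgoing edges. Descendants of
    the treatment are relabelled as potential outcomes, e.g. 'Y(x=0)'.
    """
    fixed = f"{treatment.lower()}={value}"
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    descendants = {n for n in nodes
                   if n != treatment and treatment in ancestors(parents, n)}
    rename = {n: f"{n}({fixed})" for n in descendants}
    swig = {}
    for child in nodes:
        new_parents = [fixed if p == treatment else rename.get(p, p)
                       for p in parents.get(child, ())]
        swig[rename.get(child, child)] = new_parents
    swig[fixed] = []
    return swig, fixed


def marginally_d_separated(graph, a, b, fixed):
    """True if a and b share no random common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return not (common - {fixed})


no_confounding = {"Y": ["X"]}
confounded = {"X": ["H"], "Y": ["X", "H"]}

swig, fixed = split_node(no_confounding, "X", 0)
print(marginally_d_separated(swig, "X", "Y(x=0)", fixed))  # True

swig_h, fixed_h = split_node(confounded, "X", 0)
print(marginally_d_separated(swig_h, "X", "Y(x=0)", fixed_h))  # False
```

The second graph assumes the usual confounding structure with H a parent of both X and Y. With that assumption the split graph no longer yields the independence, because H remains a common ancestor of X and Y(x=0).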

  25. Crucial point: $Y(x=0)$ and $Y(x=1)$ are never on the same graph.
      Although we have:
      $$X \perp\!\!\!\perp Y(x=0) \quad\text{and}\quad X \perp\!\!\!\perp Y(x=1)$$
      we do not have
      $$X \perp\!\!\!\perp \bigl(Y(x=0), Y(x=1)\bigr).$$
      Had we tried to construct a single graph containing both $Y(x=0)$ and $Y(x=1)$ this would have been impossible. (Why?)
      ⇒ Single-World Intervention Graphs (SWIGs).
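
The gap between the two marginal independences and the joint one can be seen in a small counterexample. The distribution below is hypothetical and not taken from the slides: $Y(x=0)$ and $Y(x=1)$ are independent fair coins and $X$ is their exclusive-or. It shows only that the joint statement does not follow logically from the two marginal ones; it is not offered as the slide's intended answer to "Why?".

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint law over (X, Y(x=0), Y(x=1)):
# Y(x=0), Y(x=1) are independent fair coins and X = Y(x=0) XOR Y(x=1).
joint = {}
for y0, y1 in product((0, 1), repeat=2):
    joint[(y0 ^ y1, y0, y1)] = Fraction(1, 4)


def prob(event):
    """Probability of the set of (x, y0, y1) outcomes satisfying `event`."""
    return sum(p for outcome, p in joint.items() if event(*outcome))


def independent_of_x(index):
    """Check X independent of Y(x=index): joint equals product of marginals."""
    for x, y in product((0, 1), repeat=2):
        p_joint = prob(lambda a, y0, y1: a == x and (y0, y1)[index] == y)
        p_x = prob(lambda a, y0, y1: a == x)
        p_y = prob(lambda a, y0, y1: (y0, y1)[index] == y)
        if p_joint != p_x * p_y:
            return False
    return True


# Joint independence would need P(X=1, Y0=0, Y1=0) = P(X=1) P(Y0=0, Y1=0).
lhs = prob(lambda a, y0, y1: (a, y0, y1) == (1, 0, 0))
rhs = (prob(lambda a, y0, y1: a == 1)
       * prob(lambda a, y0, y1: (y0, y1) == (0, 0)))

print(independent_of_x(0), independent_of_x(1), lhs == rhs)  # True True False
```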
