Single World Intervention Graphs (SWIGs): Unifying the Counterfactual and Graphical Approaches to Causality
Thomas Richardson, Department of Statistics, University of Washington
Joint work with James Robins (Harvard School of Public Health)
Therme Vals Causal Workshop, 5 Aug 2013

Outline
Brief review of counterfactuals
A new unification of graphs and counterfactuals via node-splitting
◮ Factorization and Modularity
◮ Contrast with Twin Network approach
◮ Some Examples and Extensions
◮ Sequentially Randomized Experiments / Time Dependent Confounding
◮ Dynamic Regimes
Experimental Testability and Independence of Errors in NPSEMs
Thomas Richardson Therme Vals Workshop Slide 1

Counterfactuals aka Potential Outcomes

The potential outcomes framework: philosophy
Hume (1748), An Enquiry Concerning Human Understanding:
We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second, ... where, if the first object had not been, the second never had existed.

The potential outcomes framework: crop trials
Jerzy Neyman (1923): To compare v varieties [on m plots] we will consider numbers:
  U_11  ...  U_1m
   ...        ...
  U_v1  ...  U_vm
Here U_ij is the crop yield that would be observed if variety i were planted in plot j.
Physical constraints only allow one variety to be planted in a given plot in any given growing season.
Popularized by Rubin (1974); sometimes called the ‘Rubin causal model’.

Potential outcomes with binary treatment
For binary treatment X and response Y, we define two potential outcome variables:
◮ Y(x = 0): the value of Y that would be observed for a given unit if assigned X = 0;
◮ Y(x = 1): the value of Y that would be observed for a given unit if assigned X = 1.
Will also write these as Y(x_0) and Y(x_1).
Implicit here is the assumption that these outcomes are well-defined. Specifically:
◮ Only one version of treatment X = x
◮ No interference between units (SUTVA).

Potential Outcomes
Unit | Potential Outcomes | Observed
     | Y(x=0)   Y(x=1)    | X    Y
1    | 0        1         |
2    | 0        1         |
3    | 0        0         |
4    | 1        1         |
5    | 1        0         |

Drug Response ‘Types’
In the simplest case where Y is a binary outcome we have the following 4 types:
Y(x_0)   Y(x_1)   Name
0        0        Never Recover
0        1        Helped
1        0        Hurt
1        1        Always Recover

Assignment to Treatments
Unit | Potential Outcomes | Observed
     | Y(x=0)   Y(x=1)    | X    Y
1    | 0        1         | 1
2    | 0        1         | 0
3    | 0        0         | 1
4    | 1        1         | 1
5    | 1        0         | 0

Observed Outcomes from Potential Outcomes
Unit | Potential Outcomes | Observed
     | Y(x=0)   Y(x=1)    | X    Y
1    | 0        1         | 1    1
2    | 0        1         | 0    0
3    | 0        0         | 1    0
4    | 1        1         | 1    1
5    | 1        0         | 0    1
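The rule that fills in the Observed Y column can be sketched in a few lines of Python. This is only an illustration of the table: the observed outcome is the potential outcome under the treatment actually assigned. The names `units` and `observed_outcome` are chosen here and do not come from the slides.

```python
# (Y(x=0), Y(x=1), X) for the five units in the table.
units = {
    1: {"y0": 0, "y1": 1, "x": 1},
    2: {"y0": 0, "y1": 1, "x": 0},
    3: {"y0": 0, "y1": 0, "x": 1},
    4: {"y0": 1, "y1": 1, "x": 1},
    5: {"y0": 1, "y1": 0, "x": 0},
}

def observed_outcome(y0, y1, x):
    """Return the potential outcome that matches the assigned treatment x."""
    return y1 if x == 1 else y0

# Observed Y for each unit, keyed by unit number.
observed = {unit: observed_outcome(**row) for unit, row in units.items()}
print(observed)  # {1: 1, 2: 0, 3: 0, 4: 1, 5: 1}
```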

Potential Outcomes and Missing Data
Unit | Potential Outcomes | Observed
     | Y(x=0)   Y(x=1)    | X    Y
1    | ?        1         | 1    1
2    | 0        ?         | 0    0
3    | ?        0         | 1    0
4    | ?        1         | 1    1
5    | 1        ?         | 0    1

Average Causal Effect (ACE) of X on Y
ACE(X → Y) ≡ E[Y(x_1) − Y(x_0)] = p(Helped) − p(Hurt) ∈ [−1, 1]
Thus ACE(X → Y) is the difference in the proportion recovering if everyone were treated (X = 1) vs. if no one were treated (X = 0).
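As a small check of the identity E[Y(x_1) − Y(x_0)] = p(Helped) − p(Hurt), the sketch below evaluates both sides on the five illustrative units from the earlier potential-outcomes table. It treats those five units as the whole population, which is an assumption made here purely for illustration.

```python
from fractions import Fraction

# (Y(x=0), Y(x=1)) for the five units in the earlier table.
potential = [(0, 1), (0, 1), (0, 0), (1, 1), (1, 0)]
n = len(potential)

# Left-hand side: average of the unit-level differences Y(x_1) - Y(x_0).
ace = Fraction(sum(y1 - y0 for y0, y1 in potential), n)

# Right-hand side: proportion of 'Helped' types minus proportion of 'Hurt' types.
p_helped = Fraction(sum(1 for pair in potential if pair == (0, 1)), n)
p_hurt = Fraction(sum(1 for pair in potential if pair == (1, 0)), n)

print(ace, p_helped - p_hurt)  # 1/5 1/5
```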

Identification of the ACE under randomization
If X is assigned randomly then
  X ⊥⊥ Y(x_0) and X ⊥⊥ Y(x_1)    (1)
hence
  E[Y(x_1) − Y(x_0)] = E[Y(x_1)] − E[Y(x_0)]
                     = E[Y(x_1) | X = 1] − E[Y(x_0) | X = 0]    (by (1))
                     = E[Y | X = 1] − E[Y | X = 0]              (since Y = Y(x) when X = x).
Thus if (1) holds then ACE(X → Y) is identified from P(X, Y).
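The derivation can be checked exactly on a toy population. In the sketch below, the distribution over response types and both assignment mechanisms are hypothetical numbers chosen for illustration. The observed contrast E[Y | X = 1] − E[Y | X = 0] is compared with the ACE, once when assignment ignores the type (randomization) and once when it depends on the type.

```python
from fractions import Fraction as F

# Hypothetical distribution over response types (Y(x_0), Y(x_1)); not from the slides.
p_type = {(0, 0): F(3, 10), (0, 1): F(4, 10), (1, 0): F(1, 10), (1, 1): F(2, 10)}

def observed_contrast(p_treat):
    """E[Y | X=1] - E[Y | X=0], where p_treat[t] = P(X=1 | type t) and Y = Y(X)."""
    p_x1 = sum(p * p_treat[t] for t, p in p_type.items())
    p_x0 = 1 - p_x1
    e_y_x1 = sum(p * p_treat[t] * t[1] for t, p in p_type.items()) / p_x1
    e_y_x0 = sum(p * (1 - p_treat[t]) * t[0] for t, p in p_type.items()) / p_x0
    return e_y_x1 - e_y_x0

# ACE = E[Y(x_1) - Y(x_0)], computed directly from the type distribution.
ace = sum(p * (t[1] - t[0]) for t, p in p_type.items())

# Randomized: treatment probability is the same for every type, so (1) holds.
randomized = {t: F(1, 2) for t in p_type}
# Confounded (hypothetical): types with Y(x_1) = 1 are far more likely to be treated.
confounded = {(0, 0): F(1, 10), (0, 1): F(9, 10), (1, 0): F(1, 10), (1, 1): F(9, 10)}

print(ace, observed_contrast(randomized))  # 3/10 3/10
print(observed_contrast(confounded) == ace)  # False
```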

Inference for the ACE without randomization
Suppose that we do not know that X ⊥⊥ Y(x_0) and X ⊥⊥ Y(x_1). What can be inferred?
         X = 0 (Placebo)   X = 1 (Drug)
Y = 0    200               600
Y = 1    800               400
What is:
◮ The largest number of people who could be Helped? 400 + 200
◮ The smallest number of people who could be Hurt? 0
⇒ Max value of ACE: (200 + 400)/2000 − 0 = 0.3
Similar logic:
⇒ Min value of ACE: 0 − (600 + 800)/2000 = −0.7

Inference for the ACE without randomization
Suppose that we do not know that X ⊥⊥ Y(x_0) and X ⊥⊥ Y(x_1).
General case:
  −(P(x = 0, y = 1) + P(x = 1, y = 0)) ≤ ACE(X → Y) ≤ P(x = 0, y = 0) + P(x = 1, y = 1)
⇒ Bounds will always cross zero.
⇒ X ⊥⊥ Y(x_0) and X ⊥⊥ Y(x_1) essential for drawing non-trivial causal inferences.
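A minimal sketch of the general bounds, applied to the placebo/drug counts from the preceding example (200, 600, 800, 400). The variable names are chosen here. Because the four cell probabilities sum to one, the interval always has width 1, which is why it always contains zero.

```python
from fractions import Fraction as F

# Observed counts keyed by (x, y), taken from the placebo/drug table.
counts = {(0, 0): 200, (0, 1): 800, (1, 0): 600, (1, 1): 400}
n = sum(counts.values())
p = {cell: F(count, n) for cell, count in counts.items()}

# Lower bound: everyone who could be 'Hurt' is, and no one is 'Helped'.
lower = -(p[(0, 1)] + p[(1, 0)])
# Upper bound: everyone who could be 'Helped' is, and no one is 'Hurt'.
upper = p[(0, 0)] + p[(1, 1)]

print(float(lower), float(upper))  # -0.7 0.3
print(upper - lower)  # 1
```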

Summary of Counterfactual Approach
◮ In our observed data, for each unit one outcome will be ‘actual’; the others will be ‘counterfactual’.
◮ The potential outcome framework allows Causation to be ‘reduced’ to Missing Data ⇒ Conceptual progress!
◮ The ACE is identified if X ⊥⊥ Y(x_i) for all values x_i.
◮ Randomization of treatment assignment implies X ⊥⊥ Y(x_i).
◮ These ideas are central to Fisher’s Exact Test; also many parts of experimental design.
◮ The framework is the basis of many practical causal data analyses published in Biostatistics, Econometrics and Epidemiology.

Relating Counterfactuals and Structural Equations
Potential outcomes can be seen as a different notation for Non-Parametric Structural Equation Models (NPSEMs):
Example: X → Y.
NPSEM formulation: Y = f(X, ε_Y)
Potential outcome formulation: Y(x) = f(x, ε_Y)
Two important caveats:
◮ NPSEMs typically assume that all variables are subject to well-defined interventions (not so with potential outcomes).
◮ Pearl associates NPSEMs with Independent Errors (NPSEM-IEs) with DAGs (more on this later).
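The correspondence between the two formulations can be sketched as follows. The structural function `f` below is hypothetical (any function of x and ε_Y would serve); the point is only that the same `f` and the same error term give both the observed Y = f(X, ε_Y) and the potential outcomes Y(x) = f(x, ε_Y).

```python
import random

def f(x, eps_y):
    """Hypothetical structural function for Y, chosen only for illustration."""
    threshold = 0.6 if x == 1 else 0.3
    return int(eps_y < threshold)

rng = random.Random(0)
units = []
for _ in range(1000):
    eps_y = rng.random()        # unit-level error term eps_Y
    x = rng.choice([0, 1])      # treatment actually received
    y = f(x, eps_y)             # NPSEM formulation: Y = f(X, eps_Y)
    # Potential outcome formulation: Y(x) = f(x, eps_Y) for each value x.
    y_pot = {x_set: f(x_set, eps_y) for x_set in (0, 1)}
    units.append((x, y, y_pot))

# The observed outcome always equals the potential outcome for the received treatment.
print(all(y == y_pot[x] for x, y, y_pot in units))  # True
```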

Relating Counterfactuals and ‘do’ notation
Expressions in terms of ‘do’ can be expressed in terms of counterfactuals:
  P(Y(x) = y) ≡ P(Y = y | do(X = x))
but counterfactual notation is more general.
Ex. Distribution of outcomes that would arise among those who took treatment (X = 1) had they, counter to fact, not received treatment:
  P(Y(x = 0) = y | X = 1)
If treatment is randomized, so X ⊥⊥ Y(x = 0), then this equals P(Y(x = 0) = y), but in an observational study these may be different.
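To make the distinction concrete, the sketch below computes both quantities for y = 1 on the five illustrative units from the earlier potential-outcomes tables, treating them as the whole population (an assumption made here). It relies on knowing both potential outcomes for every unit, which is never the case with real data.

```python
from fractions import Fraction

# (Y(x=0), Y(x=1), X) for the five units in the earlier tables.
units = [(0, 1, 1), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 0, 0)]

# P(Y(x=0) = 1 | X = 1): untreated outcome among those who were in fact treated.
treated = [u for u in units if u[2] == 1]
p_y0_given_treated = Fraction(sum(y0 for y0, _, _ in treated), len(treated))

# P(Y(x=0) = 1): untreated outcome in the whole population.
p_y0 = Fraction(sum(y0 for y0, _, _ in units), len(units))

print(p_y0_given_treated, p_y0)  # 1/3 2/5
```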

Graphs

Recap: Graphical Approach to Causality
[Figure: two graphs. No Confounding: X → Y. Confounding: X → Y with an unobserved H such that H → X and H → Y.]
◮ Graph intended to represent direct causal relations.
◮ Convention that confounding variables (e.g. H) are always included on the graph.
◮ Approach originates in the path diagrams introduced by Sewall Wright in the 1920s.
◮ If X → Y then X is said to be a parent of Y; Y is a child of X.

Edges are directed, but are they causal?
[Figure: two graphs, both labelled ‘No Confounding’. Left: X → Y. Right: X ← Y.]
  Left:  P(X, Y) = P(X) P(Y | X)
  Right: P(X, Y) = P(Y) P(X | Y)
Neither factorization places any restriction on P(X, Y).

Linking the two approaches
[Figure: two graphs. Left: X → Y with no confounding. Right: X → Y with an unobserved H such that H → X and H → Y.]
  Left:  X ⊥⊥ Y(x_0) & X ⊥⊥ Y(x_1)
  Right: X ⊥̸⊥ Y(x_0) & X ⊥̸⊥ Y(x_1)
Elephant in the room: The variables Y(x_0) and Y(x_1) do not appear on these graphs!!

Node splitting: Setting X to 0
  P(X = x̃, Y = ỹ) = P(X = x̃) P(Y = ỹ | X = x̃)
[Figure: the graph X → Y is transformed (⇒) by splitting node X into a random part X and a fixed part x = 0; the edge now runs from x = 0 to Y(x = 0).]
Can now ‘read’ the independence: X ⊥⊥ Y(x = 0).
Also associate a new factorization:
  P(X = x̃, Y(x = 0) = ỹ) = P(X = x̃) P(Y(x = 0) = ỹ)
where:
  P(Y(x = 0) = ỹ) = P(Y = ỹ | X = 0).
This last equation links a term in the original factorization to the new factorization. We term this the ‘modularity assumption’.
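The new factorization can be sketched numerically. The joint distribution `p_xy` below is hypothetical, and the function names are chosen here. Under the modularity assumption, the distribution over (X, Y(x = 0)) is built from the observed P(X) and the single conditional P(Y | X = 0).

```python
from fractions import Fraction as F

# Hypothetical observed joint P(X = x, Y = y) for the graph X -> Y; keys are (x, y).
p_xy = {(0, 0): F(1, 10), (0, 1): F(4, 10), (1, 0): F(3, 10), (1, 1): F(2, 10)}

def p_x(x):
    """Marginal P(X = x)."""
    return sum(p for (xv, _), p in p_xy.items() if xv == x)

def p_y_given_x(y, x):
    """Conditional P(Y = y | X = x) from the original factorization."""
    return p_xy[(x, y)] / p_x(x)

def swig_joint(x_set):
    """P(X = x, Y(x_set) = y) = P(X = x) * P(Y = y | X = x_set), assuming modularity."""
    return {(x, y): p_x(x) * p_y_given_x(y, x_set) for x in (0, 1) for y in (0, 1)}

joint0 = swig_joint(0)

# Marginal of Y(x = 0) recovered from the new joint equals P(Y = 1 | X = 0).
p_y0_is_1 = joint0[(0, 1)] + joint0[(1, 1)]
print(p_y0_is_1, p_y_given_x(1, 0))  # 4/5 4/5
```

In this construction X and Y(x = 0) are independent by design, which mirrors the independence read off the split graph.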

Node splitting: Setting X to 1
  P(X = x̃, Y = ỹ) = P(X = x̃) P(Y = ỹ | X = x̃)
[Figure: the graph X → Y is transformed (⇒) by splitting node X into a random part X and a fixed part x = 1; the edge now runs from x = 1 to Y(x = 1).]
Can now ‘read’ the independence: X ⊥⊥ Y(x = 1).
Also associate a new factorization:
  P(X = x̃, Y(x = 1) = ỹ) = P(X = x̃) P(Y(x = 1) = ỹ)
where:
  P(Y(x = 1) = ỹ) = P(Y = ỹ | X = 1).

Crucial point: Y(x = 0) and Y(x = 1) are never on the same graph.
Although we have:
  X ⊥⊥ Y(x = 0) and X ⊥⊥ Y(x = 1)
we do not have
  X ⊥⊥ Y(x = 0), Y(x = 1)
Had we tried to construct a single graph containing both Y(x = 0) and Y(x = 1) this would have been impossible. (Why?)
⇒ Single-World Intervention Graphs (SWIGs).
