Efficient Least Squares for Estimating Total Causal Effects

Richard Guo, Emilija Perković. Pacific Causal Inference Conference, 2020. Department of Statistics, University of Washington, Seattle.


  1. Efficient Least Squares for Estimating Total Causal Effects. Richard Guo, Emilija Perković. Pacific Causal Inference Conference, 2020. Department of Statistics, University of Washington, Seattle.

  10. Highlights
      • We consider estimating a total causal effect from observational data.
      • We assume:
          • Linearity: data is generated from a linear structural equation model.
          • Causal sufficiency: no unobserved confounding, no selection bias.
          • The causal DAG is known up to a Markov equivalence class with additional background knowledge.
      • We present a least squares estimator that is:
          • Complete: applicable whenever the effect is identified,
          • Efficient: relative to a large class of estimators, which is the first of its kind in the literature ...

  14. Causal DAG, linear SEM
      [Figure: causal DAG on the vertices S, A, Z, W, Y, T]
      Suppose D is the underlying causal DAG. D is unknown.
      Suppose data is generated by a linear structural equation model (SEM)
      $$X_v = \sum_{u : u \to v} \gamma_{uv} X_u + \epsilon_v, \qquad \mathbb{E}\,\epsilon_v = 0, \qquad 0 < \operatorname{var} \epsilon_v < \infty.$$
      Under causal sufficiency, the errors are mutually independent (no i ↔ j in the path diagram).
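A minimal sketch of sampling from such a linear SEM, assuming NumPy. The edges A → Z, Z → W, A → W and W → Y are the ones that appear in the deck's total-effect formula; the edges S → A and S → Y, the omission of T, all coefficient values, and the helper names `coefficient_matrix` and `simulate` are hypothetical choices made only for illustration, since the original figure did not survive.

```python
import numpy as np

# Vertex order. T is omitted because its edges are not recoverable from the text.
names = ["S", "A", "Z", "W", "Y"]
idx = {v: i for i, v in enumerate(names)}

# Hypothetical coefficients gamma_uv for each edge u -> v.
# The two edges out of S are an assumption, not taken from the slides.
edges = {
    ("S", "A"): 0.8, ("S", "Y"): 0.5,
    ("A", "Z"): 1.0, ("Z", "W"): -0.7, ("A", "W"): 0.4, ("W", "Y"): 1.5,
}

def coefficient_matrix(edges, idx):
    """Return Gamma with Gamma[u, v] = gamma_uv if u -> v, and 0 otherwise."""
    gamma = np.zeros((len(idx), len(idx)))
    for (u, v), coef in edges.items():
        gamma[idx[u], idx[v]] = coef
    return gamma

def simulate(gamma, n, rng):
    """Draw n samples from X_v = sum_{u: u -> v} gamma_uv X_u + eps_v."""
    p = gamma.shape[0]
    # Errors are independent with mean zero and finite variance;
    # they need not be Gaussian, so uniform errors are used here.
    eps = rng.uniform(-1.0, 1.0, size=(n, p))
    # In row form, X = X Gamma + eps, hence X = eps (I - Gamma)^{-1}.
    return eps @ np.linalg.inv(np.eye(p) - gamma)

Gamma = coefficient_matrix(edges, idx)
X = simulate(Gamma, n=1000, rng=np.random.default_rng(0))
```

Each row of `X` is one observation, and `X - X @ Gamma` recovers the simulated errors.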

  18. Total effect
      Suppose we want to estimate the total (causal) effect of A on Y.
      [Figure: causal DAG on the vertices S, A, Z, W, Y, T]
      ☞ The total effect $\tau_{AY}$ is defined as the slope of $x_a \mapsto \mathbb{E}[X_Y \mid \mathrm{do}(X_A = x_a)]$, given by a sum-product of Wright (1934):
      $$\tau_{AY} = \frac{\partial}{\partial x_a} \mathbb{E}[X_Y \mid \mathrm{do}(X_A = x_a)] = (\gamma_{AZ}\gamma_{ZW} + \gamma_{AW})\,\gamma_{WY}.$$
      Here we consider a point intervention (|A| = 1) for simplicity. For a joint intervention (|A| > 1), the total effect can be similarly defined.
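The sum-product can be checked numerically. The sketch below, assuming NumPy, uses the fact that for an acyclic coefficient matrix Γ the (u, v) entry of (I − Γ)^{-1} = I + Γ + Γ² + ... sums the products of edge coefficients over all directed paths from u to v. The coefficient values and the two edges out of S are hypothetical; only the four edges in Wright's formula come from the slide.

```python
import numpy as np

names = ["S", "A", "Z", "W", "Y"]
idx = {v: i for i, v in enumerate(names)}

# Hypothetical coefficients; the edges S -> A and S -> Y are assumed for illustration.
g = {
    ("A", "Z"): 1.0, ("Z", "W"): -0.7, ("A", "W"): 0.4, ("W", "Y"): 1.5,
    ("S", "A"): 0.8, ("S", "Y"): 0.5,
}

Gamma = np.zeros((len(names), len(names)))
for (u, v), coef in g.items():
    Gamma[idx[u], idx[v]] = coef

# Entry (u, v) is the sum over directed paths u -> ... -> v of the
# product of edge coefficients along the path.
path_sums = np.linalg.inv(np.eye(len(names)) - Gamma)
tau_AY = path_sums[idx["A"], idx["Y"]]

# Wright's sum-product written out by hand for the two paths
# A -> Z -> W -> Y and A -> W -> Y.
wright = (g["A", "Z"] * g["Z", "W"] + g["A", "W"]) * g["W", "Y"]
```

Both `tau_AY` and `wright` evaluate to -0.45 up to floating-point rounding. The edges out of S do not enter, because no directed path from A to Y passes through S.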

  22. Markov equivalence, CPDAG
      Without making further assumptions, the causal DAG D can only be identified from the observed distribution up to a Markov equivalence class.
      The Markov equivalence class of D is uniquely represented by a CPDAG (essential graph) C.
      [Figure: CPDAG C on the vertices S, A, Z, W, Y, T]
      ☞ Knowing only C is often insufficient to identify the total effect.

  25. Identifiability from a partially directed graph
      Theorem (Perković, 2020). The total effect $\tau_{AY}$ is identified from a maximally oriented partially directed acyclic graph G if and only if there is no proper, possibly causal path from A to Y in G that starts with an undirected edge.
      [Figure: graph on the vertices S, A, Z, W, Y, T]
      ☞ In the unidentified case, see also the IDA algorithms (Maathuis, Kalisch, and Bühlmann, 2009; Nandy, Maathuis, and Richardson, 2017), which enumerate the possible total effects.

  31. Background knowledge, MPDAG
      However, we often have additional knowledge that can help towards identification.
      ☞ Suppose we know that S temporally precedes A.
      [Figure: graph on the vertices S, A, Z, W, Y, T, with the newly implied orientations shown in green]
      The green orientations are further implied by the rules of Meek (1995).
      ☞ In this example, $\tau_{AY}$ is identified from the resulting maximally oriented partially directed acyclic graph (MPDAG) G.

  33. Adjustment estimator
      Our task is to estimate $\tau_{AY}$ from n i.i.d. observational samples generated by a linear SEM associated with the causal DAG D, given that D ∈ [G] for an MPDAG G and that $\tau_{AY}$ is identifiable from G.
      [Figure: MPDAG G on the vertices S, A, Z, W, Y, T]
      ☞ Adjustment estimator: $\hat{\tau}^{\mathrm{adj}}_{AY}$ is the least squares coefficient of A from Y ∼ A + S.
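A rough illustration of the adjustment estimator on simulated data, assuming NumPy. The data-generating graph is hypothetical: S is assumed to be a common cause of A and Y (edges S → A and S → Y), T is left out, and all coefficients and the Gaussian errors are arbitrary choices. Under these assumptions {S} is a valid adjustment set, so the coefficient of A in the regression Y ∼ A + S should be close to the true total effect, while the unadjusted regression Y ∼ A should not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical linear SEM; the edges out of S are an assumption.
S = rng.normal(size=n)
A = 0.8 * S + rng.normal(size=n)
Z = 1.0 * A + rng.normal(size=n)
W = -0.7 * Z + 0.4 * A + rng.normal(size=n)
Y = 1.5 * W + 0.5 * S + rng.normal(size=n)

# True total effect from the sum-product over A -> Z -> W -> Y and A -> W -> Y.
tau_true = (1.0 * -0.7 + 0.4) * 1.5

def ols_coef_of_first(y, *regressors):
    """Least squares fit with intercept; return the coefficient of the first regressor."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

tau_adj = ols_coef_of_first(Y, A, S)   # Y ~ A + S
tau_naive = ols_coef_of_first(Y, A)    # Y ~ A, confounded by S

print(f"true {tau_true:.3f}, adjusted {tau_adj:.3f}, unadjusted {tau_naive:.3f}")
```

The helper `ols_coef_of_first` is a name introduced here for convenience; any ordinary least squares routine would serve the same purpose.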

  39. Adjustment estimator
      The adjustment Y ∼ A + S can be justified by looking at the elements of [G].
      [Figure: three DAGs in [G], each on the vertices S, A, Z, W, Y, T]
      An adjustment estimator
      • may not exist when |A| > 1.
      • may not be unique.
      The most efficient adjustment estimator was recently characterized by Henckel, Perković, and Maathuis (2019) and Witte et al. (2020).
