Efficient Least Squares for Estimating Total Causal Effects
Richard Guo, Emilija Perković
Department of Statistics, University of Washington, Seattle
Pacific Causal Inference Conference, 2020
Highlights

• We consider estimating a total causal effect from observational data.
• We assume
  • Linearity: data is generated from a linear structural equation model.
  • Causal sufficiency: no unobserved confounding, no selection bias.
  • The causal DAG is known up to a Markov equivalence class with additional background knowledge.
• We present a least squares estimator that is
  • Complete: applicable whenever the effect is identified,
  • Efficient: relative to a large class of estimators, which is the first of its kind in the literature.
Causal DAG, linear SEM

[Figure: the causal DAG D on nodes S, A, Z, W, Y, T]

Suppose D is the underlying causal DAG. D is unknown.

Suppose the data are generated by a linear structural equation model (SEM)
$$X_v = \sum_{u \colon u \to v} \gamma_{uv} X_u + \epsilon_v, \qquad \mathbb{E}\,\epsilon_v = 0, \quad 0 < \operatorname{var} \epsilon_v < \infty.$$
Under causal sufficiency, the errors are mutually independent (no $i \leftrightarrow j$ in the path diagram).
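For concreteness, below is a minimal sketch of drawing observational data from such a linear SEM. The DAG (edges A → Z, Z → W, A → W, W → Y), the coefficient values, and the Gaussian errors are illustrative assumptions, not taken from the slides; the model itself only requires mean-zero, finite-variance errors. All structural equations are solved at once via the coefficient matrix $\Gamma$, using $X = \epsilon (I - \Gamma)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical coefficient matrix: Gamma[u, v] = gamma_{uv} for each edge u -> v.
# Column order of X below: A, Z, W, Y.
Gamma = np.zeros((4, 4))
Gamma[0, 1] = 0.8   # A -> Z
Gamma[1, 2] = 0.5   # Z -> W
Gamma[0, 2] = 0.3   # A -> W
Gamma[2, 3] = 1.2   # W -> Y

# Mutually independent errors (causal sufficiency); Gaussianity is an extra assumption.
eps = rng.normal(size=(n, 4))

# X = X Gamma + eps  <=>  X = eps (I - Gamma)^{-1},
# i.e. each column obeys X_v = sum_{u : u -> v} gamma_{uv} X_u + eps_v.
X = eps @ np.linalg.inv(np.eye(4) - Gamma)
```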
Total effect

Suppose we want to estimate the total (causal) effect of A on Y.

[Figure: the causal DAG D on nodes S, A, Z, W, Y, T]

☞ The total effect $\tau_{AY}$ is defined as the slope of $x_a \mapsto \mathbb{E}[X_Y \mid \operatorname{do}(X_A = x_a)]$, given by a sum-product of Wright (1934):
$$\tau_{AY} = \frac{\partial}{\partial x_a}\, \mathbb{E}[X_Y \mid \operatorname{do}(X_A = x_a)] = (\gamma_{AZ}\gamma_{ZW} + \gamma_{AW})\, \gamma_{WY}.$$
Here we consider a point intervention ($|A| = 1$) for simplicity. For a joint intervention ($|A| > 1$), the total effect can be defined similarly.
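As a numerical sanity check (not part of the slides), the sketch below compares the slope of $x_a \mapsto \mathbb{E}[X_Y \mid \operatorname{do}(X_A = x_a)]$ with Wright's path product, implementing $\operatorname{do}(X_A = x_a)$ by replacing A's structural equation with a constant. The coefficients are the same illustrative values assumed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Illustrative coefficients for the edges A -> Z, Z -> W, A -> W, W -> Y.
g_AZ, g_ZW, g_AW, g_WY = 0.8, 0.5, 0.3, 1.2

def mean_Y_do(x_a):
    """Estimate E[X_Y | do(X_A = x_a)]: A's equation is replaced by a constant."""
    A = np.full(n, x_a)
    Z = g_AZ * A + rng.normal(size=n)
    W = g_ZW * Z + g_AW * A + rng.normal(size=n)
    Y = g_WY * W + rng.normal(size=n)
    return Y.mean()

# Interventional slope, estimated from two interventions.
slope = mean_Y_do(1.0) - mean_Y_do(0.0)

# Wright's sum-product over the directed paths A -> Z -> W -> Y and A -> W -> Y.
tau_wright = (g_AZ * g_ZW + g_AW) * g_WY

print(f"interventional slope ≈ {slope:.3f}")   # close to tau_wright
print(f"Wright path product  = {tau_wright:.3f}")
```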
Markov equivalence, CPDAG

Without making further assumptions, the causal DAG D can only be identified from the observed distribution up to a Markov equivalence class.

The Markov equivalence class of D is uniquely represented by a CPDAG/essential graph C.

[Figure: the CPDAG C on nodes S, A, Z, W, Y, T]

☞ Knowing only C is often insufficient to identify the total effect.
Identifiability from a partially directed graph

Theorem (Perković, 2020). The total effect $\tau_{AY}$ is identified from a maximally oriented partially directed acyclic graph G if and only if there is no proper, possibly causal path from A to Y in G that starts with an undirected edge.

[Figure: a partially directed graph on nodes S, A, Z, W, Y, T]

☞ For the unidentified case, see also the IDA algorithms (Maathuis, Kalisch, and Bühlmann, 2009; Nandy, Maathuis, and Richardson, 2017), which enumerate the possible total effects.
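As a rough illustration only (not the authors' algorithm), the criterion can be checked for a single treatment A by searching for a path from A to Y whose first edge is undirected and whose remaining edges are all undirected or directed away from A. The graph encoding and the helper tau_identified are hypothetical, and the path definitions are deliberately simplified; with $|A| = 1$ the "proper" requirement is automatic because paths do not revisit A.

```python
# Hypothetical MPDAG encoding: `directed` holds pairs (u, v) for u -> v,
# `undirected` holds frozensets {u, v} for u - v.

def tau_identified(A, Y, directed, undirected):
    """Return True if no possibly causal path from A to Y starts with an undirected edge."""

    def next_steps(u):
        # Moves allowed on a possibly causal path: u -> w or u - w.
        for a, b in directed:
            if a == u:
                yield b
        for e in undirected:
            if u in e:
                yield next(iter(e - {u}))

    def reaches(u, visited):
        if u == Y:
            return True
        return any(w not in visited and reaches(w, visited | {w})
                   for w in next_steps(u))

    # Look for a path A - v ... Y whose first edge is undirected.
    for e in undirected:
        if A in e:
            v = next(iter(e - {A}))
            if reaches(v, {A, v}):
                return False   # such a path exists: tau_AY is not identified
    return True


# Toy example (hypothetical graph A - B -> Y): the effect is not identified.
print(tau_identified("A", "Y",
                     directed={("B", "Y")},
                     undirected={frozenset({"A", "B"})}))   # False
```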
Background knowledge, MPDAG

However, often we have additional knowledge that can help towards identification.

☞ Suppose we know that S temporally precedes A.

[Figure: the graph on nodes S, A, Z, W, Y, T with S → A added and further orientations shown in green]

The green orientations are further implied by the rules of Meek (1995).

☞ In this example, $\tau_{AY}$ is identified from the resulting maximally oriented partially directed acyclic graph (MPDAG) G.
Adjustment estimator

Our task is to estimate $\tau_{AY}$ from $n$ i.i.d. observational samples generated by a linear SEM associated with the causal DAG D, given that D ∈ [G] for an MPDAG G and that $\tau_{AY}$ is identifiable from G.

[Figure: MPDAG G on nodes S, A, Z, W, Y, T]

☞ Adjustment estimator: $\hat\tau^{\mathrm{adj}}_{AY}$ is the least squares coefficient of A in the regression Y ∼ A + S.
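A minimal sketch of the adjustment estimator, assuming (purely for illustration) a data-generating DAG with S → A, S → Y, A → Z, Z → W, A → W and W → Y, and made-up coefficients; the true DAG behind the slides' figures is not specified here. The coefficient of A in the least squares fit of Y on A and S recovers $\tau_{AY}$, while the fit of Y on A alone is confounded by S.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Illustrative coefficients for a hypothetical DAG in [G]:
# S -> A, S -> Y, A -> Z, Z -> W, A -> W, W -> Y.
g_SA, g_SY = 0.7, 0.9
g_AZ, g_ZW, g_AW, g_WY = 0.8, 0.5, 0.3, 1.2

S = rng.normal(size=n)
A = g_SA * S + rng.normal(size=n)
Z = g_AZ * A + rng.normal(size=n)
W = g_ZW * Z + g_AW * A + rng.normal(size=n)
Y = g_WY * W + g_SY * S + rng.normal(size=n)

tau_true = (g_AZ * g_ZW + g_AW) * g_WY

# Adjustment estimator: coefficient of A in the least squares fit Y ~ A + S.
design = np.column_stack([np.ones(n), A, S])
tau_adj = np.linalg.lstsq(design, Y, rcond=None)[0][1]

# Regressing Y on A alone is biased by the confounder S.
tau_naive = np.linalg.lstsq(np.column_stack([np.ones(n), A]), Y, rcond=None)[0][1]

print(f"true tau   = {tau_true:.3f}")
print(f"adjusted   = {tau_adj:.3f}")    # close to tau_true
print(f"unadjusted = {tau_naive:.3f}")  # biased away from tau_true
```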
Adjustment estimator

Adjustment Y ∼ A + S can be justified by looking at the elements of [G].

[Figure: three DAGs in [G], each on nodes S, A, Z, W, Y, T]

The adjustment estimator
• may not exist when |A| > 1.
• may not be unique.
• The most efficient adjustment estimator was recently characterized by Henckel, Perković, and Maathuis (2019) and Witte et al. (2020).