Non-parametric Causal Models
Robin J. Evans (Oxford) and Thomas S. Richardson (University of Washington)
UAI Tutorial, 12th July 2015
Structure
Part One: Causal DAGs with latent variables
Part Two: Statistical models arising from DAGs with latents
Outline for Part One
- Intervention distributions
- The general identification problem
- Tian's ID Algorithm
- Fixing: generalizing marginalizing and conditioning
- Non-parametric constraints, a.k.a. Verma constraints
Intervention distributions (I)
Given a causal DAG G with distribution

    p(V) = ∏_{v ∈ V} p(v | pa(v)),

we wish to compute an intervention distribution via the truncated factorization:

    p(V \ X | do(X = x)) = ∏_{v ∈ V \ X} p(v | pa(v)).
Example
[DAG: L → X, X → M, L → Y, M → Y; after intervening on X, the edge L → X is removed.]

    p(X, L, M, Y) = p(L) p(X | L) p(M | X) p(Y | L, M)
    p(L, M, Y | do(X = x̃)) = p(L) p(M | x̃) p(Y | L, M)
Intervention distributions (II)
Given a causal DAG G with distribution

    p(V) = ∏_{v ∈ V} p(v | pa(v)),

we wish to compute an intervention distribution via the truncated factorization:

    p(V \ X | do(X = x)) = ∏_{v ∈ V \ X} p(v | pa(v)).

Hence if we are interested in Y ⊂ V \ X then we simply marginalize:

    p(Y | do(X = x)) = ∑_{w ∈ V \ (X ∪ Y)} ∏_{v ∈ V \ X} p(v | pa(v)).

This is the 'g-computation' formula of Robins (1986).
Note: p(Y | do(X = x)) is a sum over a product of terms p(v | pa(v)).
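The g-computation formula above can be evaluated directly on a small discrete example. The sketch below uses the running DAG (L → X, X → M, L → Y, M → Y) with binary variables; the numerical tables are illustrative assumptions, not values from the slides.

```python
from itertools import product

# Conditional probability tables for the DAG L -> X, X -> M, L -> Y, M -> Y.
# All variables are binary; the numbers are made-up for illustration.
pL = {0: 0.6, 1: 0.4}                              # p(L = l)
pX_L = {(0, 0): 0.7, (1, 0): 0.3,                  # p(X = x | L = l), keyed (x, l)
        (0, 1): 0.2, (1, 1): 0.8}
pM_X = {(0, 0): 0.9, (1, 0): 0.1,                  # p(M = m | X = x), keyed (m, x)
        (0, 1): 0.3, (1, 1): 0.7}
pY_LM = {(1, l, m): 0.2 + 0.5 * l + 0.2 * m        # p(Y = 1 | L = l, M = m)
         for l in (0, 1) for m in (0, 1)}
pY_LM.update({(0, l, m): 1 - pY_LM[(1, l, m)]
              for l in (0, 1) for m in (0, 1)})

def p_y_do_x(y, x):
    """p(Y = y | do(X = x)) via the truncated factorization:
    sum over l, m of p(l) p(m | x) p(y | l, m) -- the term p(x | l)
    is dropped because X is intervened on (g-computation)."""
    return sum(pL[l] * pM_X[(m, x)] * pY_LM[(y, l, m)]
               for l, m in product((0, 1), repeat=2))
```

Because the intervened factorization is itself a joint distribution over V \ X, the resulting p(Y | do(X = x)) sums to one over y.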
Example
[DAG: L → X, X → M, L → Y, M → Y; after intervening on X, the edge L → X is removed.]

    p(X, L, M, Y) = p(L) p(X | L) p(M | X) p(Y | L, M)
    p(L, M, Y | do(X = x̃)) = p(L) p(M | x̃) p(Y | L, M)
    p(Y | do(X = x̃)) = ∑_{l,m} p(L = l) p(M = m | x̃) p(Y | L = l, M = m)

Note that p(Y | do(X = x̃)) ≠ p(Y | X = x̃).
Example: no effect of M on Y
[DAG: L → X, X → M, L → Y, with no edge M → Y; after intervening on X, the edge L → X is removed.]

    p(X, L, M, Y) = p(L) p(X | L) p(M | X) p(Y | L)
    p(L, M, Y | do(X = x̃)) = p(L) p(M | x̃) p(Y | L)
    p(Y | do(X = x̃)) = ∑_{l,m} p(L = l) p(M = m | x̃) p(Y | L = l)
                      = ∑_l p(L = l) p(Y | L = l)
                      = p(Y)
                      ≠ p(Y | X = x̃), since X and Y are not independent.

'Correlation is not Causation.'
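This 'no causal effect but still correlated' situation can be checked numerically. A minimal sketch with made-up binary tables (the numbers are assumptions): intervening on X leaves p(Y) unchanged, yet conditioning on X moves it, because L confounds X and Y.

```python
from itertools import product

# DAG: L -> X, X -> M, L -> Y (no edge M -> Y).  Binary variables,
# illustrative tables.
pL = {0: 0.5, 1: 0.5}
pX_L = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9}   # p(x | l)
pM_X = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.4, (1, 1): 0.6}   # p(m | x)
pY_L = {(1, 0): 0.2, (0, 0): 0.8, (1, 1): 0.7, (0, 1): 0.3}   # p(y | l)

def joint(l, x, m, y):
    return pL[l] * pX_L[(x, l)] * pM_X[(m, x)] * pY_L[(y, l)]

def p_y_do_x(y, x):
    # Truncated factorization: sum over l, m of p(l) p(m | x) p(y | l).
    return sum(pL[l] * pM_X[(m, x)] * pY_L[(y, l)]
               for l, m in product((0, 1), repeat=2))

def p_y_given_x(y, x):
    # Ordinary observational conditioning, which picks up confounding by L.
    num = sum(joint(l, x, m, y) for l, m in product((0, 1), repeat=2))
    den = sum(joint(l, x, m, yy) for l, m, yy in product((0, 1), repeat=3))
    return num / den

p_y_marginal = sum(joint(l, x, m, 1)
                   for l, x, m in product((0, 1), repeat=3))
```

Here p_y_do_x(1, 0) equals the marginal p(Y = 1), while p_y_given_x(1, 0) differs from it: intervention and conditioning disagree.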
Example with M unobserved
[DAG: L → X, X → M, L → Y, M → Y, with M unobserved; after intervening on X, the edge L → X is removed.]

    p(Y | do(X = x̃)) = ∑_{l,m} p(L = l) p(M = m | x̃) p(Y | L = l, M = m)
                      = ∑_{l,m} p(L = l) p(M = m | x̃, L = l) p(Y | L = l, M = m, X = x̃)
                      = ∑_{l,m} p(L = l) p(Y, M = m | L = l, X = x̃)
                      = ∑_l p(L = l) p(Y | L = l, X = x̃).

Here we have used that M ⊥⊥ L | X and Y ⊥⊥ X | L, M.
⇒ we can find p(Y | do(X = x̃)) even if M is not observed. This is an example of the 'back-door formula'.
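The back-door derivation above says that adjusting for L alone, ∑_l p(l) p(y | l, x), gives the same answer as the full g-computation over (L, M). A sketch verifying this agreement on made-up binary tables (the numbers are assumptions):

```python
from itertools import product

# DAG: L -> X, X -> M, L -> Y, M -> Y.  Binary variables, illustrative tables.
pL = {0: 0.6, 1: 0.4}
pX_L = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # p(x | l)
pM_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}   # p(m | x)
pY_LM = {(1, l, m): 0.2 + 0.5 * l + 0.2 * m
         for l in (0, 1) for m in (0, 1)}                     # p(y=1 | l, m)
pY_LM.update({(0, l, m): 1 - pY_LM[(1, l, m)]
              for l in (0, 1) for m in (0, 1)})

def joint(l, x, m, y):
    return pL[l] * pX_L[(x, l)] * pM_X[(m, x)] * pY_LM[(y, l, m)]

def g_formula(y, x):
    # Full g-computation: sum over l, m of p(l) p(m | x) p(y | l, m).
    return sum(pL[l] * pM_X[(m, x)] * pY_LM[(y, l, m)]
               for l, m in product((0, 1), repeat=2))

def back_door(y, x):
    # Back-door adjustment using only the observed margin over (L, X, Y):
    # sum over l of p(l) p(y | l, x), with M marginalized out.
    out = 0.0
    for l in (0, 1):
        num = sum(joint(l, x, m, y) for m in (0, 1))
        den = sum(joint(l, x, m, yy) for m, yy in product((0, 1), repeat=2))
        out += pL[l] * num / den
    return out
```

The two routes coincide, so M really is not needed once L is observed.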
Example with L unobserved
[DAG: L → X, X → M, L → Y, M → Y, with L unobserved; after intervening on X, the edge L → X is removed.]

    p(Y | do(X = x̃)) = ∑_m p(M = m | do(X = x̃)) p(Y | do(M = m))
                      = ∑_m p(M = m | X = x̃) p(Y | do(M = m))
                      = ∑_m p(M = m | X = x̃) [∑_{x*} p(X = x*) p(Y | M = m, X = x*)]

⇒ we can find p(Y | do(X = x̃)) even if L is not observed. This is an example of the 'front-door formula'.
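The front-door formula uses only the observed margin over (X, M, Y). A sketch checking, on made-up binary tables (the numbers are assumptions), that it recovers the same intervention distribution as g-computation performed with the latent L available:

```python
from itertools import product

# DAG: L -> X, X -> M, L -> Y, M -> Y, with L treated as unobserved.
pL = {0: 0.6, 1: 0.4}
pX_L = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # p(x | l)
pM_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}   # p(m | x)
pY_LM = {(1, l, m): 0.2 + 0.5 * l + 0.2 * m
         for l in (0, 1) for m in (0, 1)}                     # p(y=1 | l, m)
pY_LM.update({(0, l, m): 1 - pY_LM[(1, l, m)]
              for l in (0, 1) for m in (0, 1)})

def joint(l, x, m, y):
    return pL[l] * pX_L[(x, l)] * pM_X[(m, x)] * pY_LM[(y, l, m)]

def p_obs(**kw):
    """Observed margin over (X, M, Y): L is always summed out."""
    vals = {v: ((kw[v],) if v in kw else (0, 1)) for v in "lxmy"}
    return sum(joint(l, x, m, y)
               for l in vals["l"] for x in vals["x"]
               for m in vals["m"] for y in vals["y"])

def front_door(y, x):
    # sum_m p(m | x) * sum_{x*} p(x*) p(y | m, x*), all from p(X, M, Y).
    out = 0.0
    for m in (0, 1):
        p_m_given_x = p_obs(x=x, m=m) / p_obs(x=x)
        inner = sum(p_obs(x=xs) * p_obs(x=xs, m=m, y=y) / p_obs(x=xs, m=m)
                    for xs in (0, 1))
        out += p_m_given_x * inner
    return out

def g_formula(y, x):
    # Reference answer computed with the latent L, for comparison only.
    return sum(pL[l] * pM_X[(m, x)] * pY_LM[(y, l, m)]
               for l, m in product((0, 1), repeat=2))
```

Despite never touching L, front_door matches g_formula exactly.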
But with both L and M unobserved...
[DAG: L → X, X → M, L → Y, M → Y, with both L and M unobserved.]

...we are out of luck! Given p(X, Y), absent further assumptions we cannot distinguish:
[a model in which a latent L confounds X and Y, versus a model in which X affects Y through M.]
General Identification Question
Given: a latent DAG G(O ∪ H), where O are observed, H are hidden, and disjoint subsets X, Y ⊆ O.
Q: Is p(Y | do(X)) identified given p(O)?
A: Provide either an identifying formula that is a function of p(O), or report that p(Y | do(X)) is not identified.
Latent Projection
We can preserve conditional independences and causal coherence with latents using paths. Given a DAG G on vertices V = O ∪̇ H, define the latent projection as follows (Verma and Pearl, 1992):
- Whenever there is a directed path x → h₁ → · · · → h_k → y with every intermediate vertex h_i ∈ H, add x → y.
- Whenever there is a path x ← h₁ · · · h_k → y on which every intermediate vertex is a non-collider in H, add x ↔ y.
Then remove all latent variables H from the graph.
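The two projection rules can be implemented directly. The sketch below is my own encoding (vertices as strings, directed edges as (parent, child) pairs), using the equivalent characterization that x ↔ y is added exactly when some latent h reaches both x and y by directed paths whose intermediate vertices are all latent; the example graph in the test is hypothetical, not the one from the slides.

```python
def latent_project(vertices, edges, hidden):
    """Latent projection of a DAG (Verma and Pearl, 1992).

    vertices: set of vertex names; edges: set of (parent, child) pairs;
    hidden: subset of vertices to project out.
    Returns (directed, bidirected) edge sets over the observed vertices.
    """
    observed = [v for v in vertices if v not in hidden]

    def latent_reach(start):
        """Observed vertices reachable from `start` by directed paths
        whose intermediate vertices are all hidden."""
        out, stack, seen = set(), [start], {start}
        while stack:
            v = stack.pop()
            for (p, c) in edges:
                if p == v and c not in seen:
                    seen.add(c)
                    if c in hidden:
                        stack.append(c)   # keep walking through latents
                    else:
                        out.add(c)        # stop at the first observed vertex
        return out

    # Rule 1: a -> b if a directed latent-intermediate path a ...-> b exists.
    directed = {(a, b) for a in observed for b in latent_reach(a)}

    # Rule 2: a <-> b if some latent reaches both a and b through latents.
    bidirected = set()
    for h in hidden:
        reach = latent_reach(h)
        bidirected |= {(a, b) for a in reach for b in reach if a < b}
    return directed, bidirected
```

For instance, projecting out hidden {u, w} from the hypothetical DAG u → x, u → y, x → w, w → t yields the ADMG with x → t and x ↔ y.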
ADMGs
[Figure: a DAG on observed vertices x, z, t, y with latents u, w, and its latent projection.]

Latent projection leads to an acyclic directed mixed graph (ADMG). We can read off independences with d-/m-separation, and the projection preserves the causal structure; Verma and Pearl (1992).