Evaluating Causal Models by Comparing Interventional Distributions

Dan Garant and David Jensen
Knowledge Discovery Laboratory
College of Information and Computer Sciences
University of Massachusetts Amherst

Findings
• Existing approaches to evaluation are strictly structural and do not characterize the full causal inference pipeline
• Statistical distances can be used to evaluate interventional distribution quality
• Evaluation with statistical distance can lead to different conclusions about algorithmic performance

Overview
• Causal Graphical Models
• Current Approaches to Evaluation
• Evaluation with Statistical Distance
• Comparative Results

Causal Graphical Models
[Figure: DAG over nodes X, Y, W, Z with node annotations N(0, 1), N(0, 1), U(X − 1, X + 1), and N(X + 0.1Y, 1).]

Causal Graphical Models
[Figure: the same DAG over X, Y, W, Z with annotations N(0, 1), N(0, 1), N(X + 0.1Y, 1), and U(X − 1, X + 1), and with the value 10 marked on the graph.]

Use Cases
• Qualitative assessment of causal structure (does intervening on X influence Z?)
• Estimation of interventional distributions, e.g. P(Z | do(X = 10))
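As a rough illustration of the second use case, the sketch below samples from the example model and estimates the mean of Z under do(X = 10) by simulation. The edge structure (X → W, X → Z, Y → Z) and the assignment of each distribution to a node are assumptions read off the figure annotations; the function name `sample` and the sample size are illustrative only.

```python
import numpy as np

def sample(n, rng, do_x=None):
    """Draw n samples from the example model; do_x, if given, fixes X by intervention."""
    # Assumed structure (inferred from the slide annotations): X -> W, X -> Z, Y -> Z.
    x = rng.normal(0.0, 1.0, n) if do_x is None else np.full(n, float(do_x))
    y = rng.normal(0.0, 1.0, n)          # Y ~ N(0, 1)
    w = rng.uniform(x - 1.0, x + 1.0)    # W ~ U(X - 1, X + 1)
    z = rng.normal(x + 0.1 * y, 1.0)     # Z ~ N(X + 0.1 Y, 1)
    return {"X": x, "Y": y, "W": w, "Z": z}

rng = np.random.default_rng(0)
obs = sample(100_000, rng)               # observational regime
intv = sample(100_000, rng, do_x=10)     # interventional regime, do(X = 10)

z_obs_mean = obs["Z"].mean()             # close to 0 under the assumed model
z_do_mean = intv["Z"].mean()             # close to 10 under the assumed model
```

Under these assumptions, intervening on X shifts the whole distribution of Z, which answers the qualitative question and also yields a quantitative estimate of P(Z | do(X = 10)).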

Structure Learning
• PC (Spirtes et al. 2000): Uses conditional independence tests to derive constraints on possible structures
• GES (Chickering 2002): Performs local updates to maximize a global, likelihood-based score on structures
• MMHC (Tsamardinos et al. 2006): Combines constraint-based and score-based approaches

Chickering, D. M. (2002). Optimal structure identification with greedy search. Journal of Machine Learning Research, 3(Nov), 507-554.
Spirtes, P., Glymour, C. N., & Scheines, R. (2000). Causation, prediction, and search. MIT Press.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31-78.

Need for Quantitative Evaluation
• How well do these algorithms work in practice? Under what circumstances do they perform better or worse?
• Which algorithm should I use? Does performance depend on domain characteristics?

Structural Hamming Distance (SHD)
[Figure: four graphs over nodes X, Y, W, Z.
 True Graph
 Under-specification, SHD=1
 Over-specification, SHD=1
 Mis-orientation, SHD=1/2]
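A minimal sketch of how SHD can be computed for DAGs given as sets of directed edges. The example graph (X → W, X → Z, Y → Z) is an assumption about the figure, and `reversal_cost` is a hypothetical parameter covering the two common conventions for counting a reversed edge (as one error or as two).

```python
def shd(true_edges, est_edges, reversal_cost=1):
    """Structural Hamming distance between two DAGs given as sets of (parent, child) pairs."""
    true_edges, est_edges = set(true_edges), set(est_edges)
    # Compare undirected skeletons to find missing and extra adjacencies.
    true_skel = {frozenset(e) for e in true_edges}
    est_skel = {frozenset(e) for e in est_edges}
    missing = len(true_skel - est_skel)
    extra = len(est_skel - true_skel)
    # Shared adjacencies whose direction differs.
    reversed_edges = {(a, b) for (a, b) in est_edges
                      if (b, a) in true_edges and (a, b) not in true_edges}
    return missing + extra + reversal_cost * len(reversed_edges)

# Assumed true graph for illustration.
true_graph = {("X", "W"), ("X", "Z"), ("Y", "Z")}
under = {("X", "W"), ("Y", "Z")}                  # X -> Z omitted
over = true_graph | {("Y", "W")}                  # superfluous Y -> W
misoriented = {("X", "W"), ("X", "Z"), ("Z", "Y")}  # Y -> Z reversed

shd_under = shd(true_graph, under)
shd_over = shd(true_graph, over)
shd_mis_1 = shd(true_graph, misoriented, reversal_cost=1)
shd_mis_2 = shd(true_graph, misoriented, reversal_cost=2)
```

Note that SHD weights all three kinds of error purely by edge count, regardless of how much each one changes the distributions the model implies.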

Structural Intervention Distance (SID)
• Graph mis-specification is not fundamentally related to the quality of a causal model (Peters & Bühlmann 2015)
• Including superfluous edges does not necessarily bias a causal model
• Reversing or omitting edges can potentially induce bias in many interventional distributions
• Structural intervention distance: Count the number of mis-specified pairwise interventional distributions

Peters, J., & Bühlmann, P. (2015). Structural intervention distance for evaluating causal graphs. Neural Computation.

SHD vs SID
[Figure: four graphs over nodes X, Y, W, Z, annotated with the interventional distributions each one mis-specifies.
 True Graph
 Under-specification, SHD=1, SID=1: P(Z | do(X))
 Over-specification, SHD=1, SID=0
 Mis-orientation, SHD=1/2, SID=3: P(Y | do(X)), P(Z | do(Y)), P(Y | do(Z))]

Problems with Structural Distances
• Structural measures fail to characterize the full causal inference pipeline. To reach an interventional distribution, we also need to learn parameters and perform inference
• Some interventional distributions may be more biased than others
• In finite-sample settings, variance matters too. A biased model with low variance may be better than an unbiased model with high variance

Statistical Effects of Model Errors
[Figure: the true graph over X, Y, W, Z with annotations N(0, 1), N(0, 1), U(X − 1, X + 1), and N(X + 0.1Y, 1), alongside two under-specified graphs, each labeled "Under-specification, SHD=1, SID=2".]

Statistical Effects of Model Errors
[Figure: the true graph over X, Y, W, Z with annotations N(0, 1), N(0, 1), N(X + 0.1Y, 1), and U(X − 1, X + 1), alongside an over-specified graph labeled "Over-specification, SHD=2, SID=0".]

Interventional Distribution Quality
• Ultimately, we care about the quality of interventional distributions rather than only the quality of the graph structure
• To evaluate distributions, we need:
  • Parameterized models
  • Inference algorithms
  • A measure of distributional accuracy

Total Variation Distance
$$
\mathrm{TV}_{P,\hat{P},T=t}(O) = \frac{1}{2} \sum_{o \in \Omega(O)} \left| P(O = o \mid \mathrm{do}(T = t)) - \hat{P}(O = o \mid \mathrm{do}(T = t)) \right|
$$
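For a discrete outcome, the total variation distance reduces to half the sum of absolute differences between the two probability tables. The sketch below assumes each interventional distribution is supplied as a dictionary from outcome value to probability; the function name and the numbers are illustrative, not taken from the slides.

```python
def total_variation(p, p_hat):
    """Total variation distance between two discrete distributions given as {value: probability}."""
    support = set(p) | set(p_hat)  # values missing from one table have probability 0 there
    return 0.5 * sum(abs(p.get(o, 0.0) - p_hat.get(o, 0.0)) for o in support)

# Hypothetical true and estimated distributions of a binary outcome O under do(T = t).
p_true = {0: 0.7, 1: 0.3}
p_est = {0: 0.5, 1: 0.5}
tv_example = total_variation(p_true, p_est)  # about 0.2
```

The distance is 0 when the estimated interventional distribution matches the true one exactly and 1 when the two place their mass on disjoint outcomes.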

Enumerating Distributions
• To evaluate an entire DAG, we need to enumerate pairs of treatments and outcomes
$$
\mathrm{TV}_{\mathrm{DAG}}(G, \hat{G}) = \sum_{V \in \mathbf{V}(G),\; V' \in \mathbf{V}(G) \setminus \{V\}} \mathrm{TV}_{P_G, P_{\hat{G}}, V' = {v'}^{*}}(V)
$$
• Performing these inferences is expensive, but these are precisely the inferences that must be performed to use the model
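The enumeration over ordered (outcome, treatment) pairs can be sketched as follows. This assumes each model exposes a function returning a discrete interventional distribution as a dictionary, and that one fixed intervention value is chosen per treatment; the callables, the two-variable example, and its probabilities are all hypothetical.

```python
from itertools import permutations

def total_variation(p, p_hat):
    """Total variation distance between two discrete distributions given as {value: probability}."""
    support = set(p) | set(p_hat)
    return 0.5 * sum(abs(p.get(o, 0.0) - p_hat.get(o, 0.0)) for o in support)

def tv_dag(variables, true_dist, est_dist, treatment_value):
    """Sum TV over all ordered (outcome, treatment) pairs of distinct variables.

    true_dist and est_dist are callables (outcome, treatment, value) -> {value: probability}.
    """
    total = 0.0
    for outcome, treatment in permutations(variables, 2):
        t = treatment_value[treatment]
        total += total_variation(true_dist(outcome, treatment, t),
                                 est_dist(outcome, treatment, t))
    return total

# Hypothetical example: true graph A -> B, estimated graph with the edge reversed.
true_tables = {("B", "A"): {0: 0.1, 1: 0.9},    # P(B | do(A = 1)) in the true model
               ("A", "B"): {0: 0.5, 1: 0.5}}    # do(B) leaves A at its marginal
est_tables = {("B", "A"): {0: 0.4, 1: 0.6},     # reversed model: do(A) does not move B
              ("A", "B"): {0: 0.25, 1: 0.75}}   # reversed model: do(B = 1) moves A

score = tv_dag(
    ["A", "B"],
    lambda o, t, v: true_tables[(o, t)],
    lambda o, t, v: est_tables[(o, t)],
    {"A": 1, "B": 1},
)  # about 0.3 + 0.25 = 0.55
```

In practice the two callables would wrap an inference algorithm run on the parameterized true and learned models, which is where the cost noted above comes from.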

Overview
• Causal Graphical Models
• Current Approaches to Evaluation
• Evaluation with Statistical Distance
• Comparative Experiments

Synthetic Domains
• Logistic: Binary data; each node is a logistic function of its parents
• Linear-Gaussian: Real-valued data; values for each node are normally distributed around a linear combination of parent values
• Dirichlet: Discrete data; the CPD for each node is sampled from a Dirichlet distribution determined by parent values
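As one concrete case, a linear-Gaussian domain can be generated by sampling nodes in topological order. The sketch below is only an illustration: the weight matrix, unit noise scale, and function name are assumptions, since the slides do not specify how edge weights or noise were chosen.

```python
import numpy as np

def sample_linear_gaussian(weights, n, rng, noise_sd=1.0):
    """Sample n rows from a linear-Gaussian DAG.

    weights[i, j] is the coefficient of parent i in child j. The matrix is
    assumed strictly upper triangular, so column order is a topological order.
    """
    d = weights.shape[0]
    data = np.zeros((n, d))
    for j in range(d):
        # Mean of node j is a linear combination of the already-sampled parents.
        mean = data[:, :j] @ weights[:j, j]
        data[:, j] = rng.normal(mean, noise_sd)
    return data

rng = np.random.default_rng(0)
weights = np.array([[0.0, 2.0],
                    [0.0, 0.0]])   # hypothetical two-node DAG: node 0 -> node 1 with weight 2
data = sample_linear_gaussian(weights, 50_000, rng)
slope = np.polyfit(data[:, 0], data[:, 1], 1)[0]  # regression slope recovers roughly 2
```

The logistic and Dirichlet domains follow the same pattern, with the per-node sampling step replaced by a Bernoulli draw or a draw from a sampled conditional probability table.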

Software Domains
• We instrumented and performed factorial experiments on three software domains:
  • Postgres
  • Java Development Kit
  • Web platforms
• Then, a biased sampling routine is used to transform experimental data into observational data
• Ground-truth interventional distributions are computed on experimental data and compared to the distributions estimated from a learned model structure

Software Domains
[Figure: evaluation pipeline. Interventional Data → Observational Sampling → Observational Data → Structure Learning & Parameterization → Parameterized DAG (nodes C, T, O) → Estimate Interventional Distribution → Evaluation. In parallel: Interventional Data → Compute Interventional Distribution → Evaluation.]

Interventional Data
ID  T  O    C
1   1  5.7  L
1   0  3.2  L
2   1  4.5  H
2   0  4.3  H
3   1  6.2  H
3   0  1.5  H
4   1  5.3  L
4   0  4.6  L
…

Observational Data
ID  T  O    C
1   0  3.2  L
2   1  4.5  H
3   1  6.2  H
4   1  5.3  L
…
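One way such a biased sampling routine might look: for each unit, pick a single treatment with a probability that depends on the covariate C, and keep only the outcome recorded under that treatment. The rows below are the ones shown in the interventional table; the function name and the treatment probabilities are hypothetical, not the authors' actual routine.

```python
import random

# (ID, T, O, C) rows from the interventional (factorial) data shown on the slide.
interventional = [
    (1, 1, 5.7, "L"), (1, 0, 3.2, "L"),
    (2, 1, 4.5, "H"), (2, 0, 4.3, "H"),
    (3, 1, 6.2, "H"), (3, 0, 1.5, "H"),
    (4, 1, 5.3, "L"), (4, 0, 4.6, "L"),
]

def biased_sample(rows, p_treat_given_c, rng):
    """Keep one row per unit, choosing T = 1 with a probability that depends on C."""
    by_id = {}
    for unit, t, o, c in rows:
        by_id.setdefault(unit, {"c": c, "outcomes": {}})["outcomes"][t] = o
    observational = []
    for unit, rec in sorted(by_id.items()):
        # Treatment assignment depends on C, which makes C a confounder of T and O.
        t = 1 if rng.random() < p_treat_given_c[rec["c"]] else 0
        observational.append((unit, t, rec["outcomes"][t], rec["c"]))
    return observational

# Hypothetical assignment probabilities: units with C = H are treated more often.
observational = biased_sample(interventional, {"L": 0.3, "H": 0.9}, random.Random(0))
```

Because every unit's outcome is known under both treatments, the ground-truth interventional distribution stays available from the full table while the learner sees only the confounded subsample.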

Over-specification and Under-specification
• We created DAG models derived from the true structure of our real software domains:
  • Over-specified: The parent set of each outcome is a strict superset of the true parent set
  • Under-specified: The parent set of each outcome is a strict subset of the true parent set
• Then, we evaluated these models against the ground-truth structure and interventional distribution

Relative Performance of Algorithms
[Figure: panels labeled SID, SHD, and TV.]

Revisiting Synthetic Data Generation

Conclusions
• Existing approaches to evaluation are strictly structural and do not characterize the full causal inference pipeline
• Statistical distances can be used to evaluate interventional distribution quality
• Evaluation with statistical distance can lead to different conclusions about algorithmic performance
